Another topic we covered in the statistics class was the Gini index.
The Gini index (also called the Gini ratio or Gini coefficient) measures how concentrated a transferable quantity, such as income or stocks, is among the units that hold it.
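For grouped data, one way to write this down (it is the form the R code further down follows; the symbol names are my own) uses the cumulative share of units p_i and the cumulative share of the total amount q_i, with p_0 = q_0 = 0, k classes and N units in total:

```latex
A = \sum_{i=1}^{k} \frac{1}{2}\,(p_i - p_{i-1})\,(q_{i-1} + q_i),
\qquad
R = \frac{\tfrac{1}{2} - A}{\dfrac{N-1}{2N}}
```

Here A is the area under the concentration curve. R comes out as 0 when every unit holds the same amount and as 1 when a single unit holds everything.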
For example, say you are evaluating a company and you’d like to know how its shares are divided among the shareholders. You could use the Gini index for that!
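To make the shareholder example concrete, here is a minimal sketch in R. The holdings are made up for illustration, and the variable names (shares, p, q, area, gini) are my own choices rather than anything standard.

```r
# Hypothetical data: number of shares held by each of five shareholders
shares <- c(10, 10, 20, 60, 100)

# Sort ascending, then build the cumulative share of holders (p)
# and the cumulative share of stock (q), both starting at 0
shares <- sort(shares)
n <- length(shares)
p <- c(0, seq_len(n) / n)
q <- c(0, cumsum(shares) / sum(shares))

# Area under the concentration curve, using the trapezoid rule
area <- sum(diff(p) * (head(q, -1) + tail(q, -1)) / 2)

# Normalise by the largest possible area, (n - 1) / (2 * n)
gini <- (0.5 - area) / ((n - 1) / (2 * n))
gini
```

With these made-up holdings the index comes out at 0.575, which says the stock is fairly concentrated in a couple of hands.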
I’ve calculated the index using R and some random data you can download here. If you’d like to know more about the Gini index, check here.
Here is my simple R implementation of the index.
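# Load data: column 1 is used as the value, column 2 as its frequency
tb <- read.table("C:\\b.txt",header=TRUE,sep=",")
# Add 5 new columns for analysis purposes
# (the result has to be stored back in tb, otherwise nothing is added)
for(i in 1:5)
{
  tb[[paste0("V", ncol(tb) + 1)]] <- 0
}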
# Store the number of rows and columns
nRow <- nrow(tb)
nCol <- ncol(tb)
# Cumulative frequencies: column 3 is absolute, column 4 is relative
i <- 1
totalF <- sum(tb[,2])
while(i <= nRow)
{
  if(i==1)
  {
    tb[1,3] <- tb[1,2]
  }else{
    tb[i,3] <- tb[i-1,3]+tb[i,2]
  }
  # Divide by the actual total rather than a hard-coded 1000
  tb[i,4] <- tb[i,3]/totalF
  i <- i + 1
}
# Column 5: amount per class (value * frequency); column 6: its running total
i <- 1
while(i<=nRow)
{
  tb[i,5] <- tb[i,1]*tb[i,2]
  if(i==1)
  {
    tb[i,6] <- tb[i,5]
  }else{
    tb[i,6] <- tb[i-1,6]+tb[i,5]
  }
  i <- i + 1
}
# Column 7: cumulative amount as a share of the total amount
i <- 1
while(i <= nRow)
{
  tb[i,7] <- tb[i,6]/sum(tb[,5])
  i <- i + 1
}
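# Show and plot the data
tb
# Green diagonal: the line of no concentration
a <- c(0,1)
b <- c(0,1)
# Red line: the concentration curve, starting from the origin
x <- c(0,tb[,4])
y <- c(0,tb[,7])
# Axis labels belong in plot(); lines() does not use them
plot(a,b,main="Concentration",type="l",col="green",lwd=2,xlab="Relative freq",ylab="Relative freq")
lines(x,y,type="b",col="red",lwd=2)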
# Calculate Gini's R concentration index
# mat: the table built above (column 2 = frequency, 4 and 7 = cumulative shares)
getR <- function(mat)
{
  n <- nrow(mat)
  R <- 0.5
  # Area under the concentration curve, by the trapezoid rule
  area <- 0.5*mat[1,4]*mat[1,7]
  i <- 2
  while(i <= n)
  {
    area <- area + 0.5*(mat[i,4]-mat[i-1,4])*(mat[i-1,7]+mat[i,7])
    i <- i + 1
  }
  # Largest possible area between the diagonal and the curve
  acmax <- (sum(mat[,2])-1)/(2*sum(mat[,2]))
  R <- (R - area)/acmax
  return(R)
}
# Print data
paste("Concentration index R is: ",getR(tb)*100,"%")
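The loops above can also be written in a vectorised way with cumsum(). This is only a sketch of the same calculation: gini_grouped is a hypothetical name, the frequency table in the example is made up (it is not the data from this post), and the sorting step is an extra safeguard I added, since the loops assume the classes are already in ascending order.

```r
# Vectorised sketch of the grouped-data calculation.
# value: the value of each class; freq: how many units have that value.
gini_grouped <- function(value, freq) {
  ord <- order(value)                      # put classes in ascending order
  value <- value[ord]
  freq <- freq[ord]
  p <- c(0, cumsum(freq) / sum(freq))      # cumulative share of units
  amount <- value * freq                   # amount held by each class
  q <- c(0, cumsum(amount) / sum(amount))  # cumulative share of the amount
  # Area under the concentration curve, using the trapezoid rule
  area <- sum(diff(p) * (head(q, -1) + tail(q, -1)) / 2)
  n <- sum(freq)
  # Normalise by the largest possible area, (n - 1) / (2 * n)
  (0.5 - area) / ((n - 1) / (2 * n))
}

# Made-up frequency table: two units with value 10, one each with 20, 60, 100
gini_grouped(value = c(10, 20, 60, 100), freq = c(2, 1, 1, 1))
```

For this small table the function returns 0.575, and it returns 0 when every unit has the same value.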
Here are the results:
It looks like the data I used shows a 24% concentration. Cool!