iginindex: Gini index for infinite populations and different estimation methods.

Description

Estimates the Gini index in infinite populations, using different methods.

Usage

iginindex(
  y,
  method = 5L,
  bias.correction = TRUE,
  cum.sums = NULL,
  na.rm = TRUE,
  useRcpp = TRUE
)

Value

A single numeric value between 0 and 1 giving the estimate of the Gini index based on the vector y or the vector cum.sums.

Arguments

y: A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided.
method: An integer between 1 and 10 selecting one of the 10 methods detailed below for estimating the Gini index in infinite populations. The default method is method = 5L.
bias.correction: A `TRUE/FALSE` logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is bias.correction = TRUE.
cum.sums: A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is cum.sums = NULL.
na.rm: A `TRUE/FALSE` logical value indicating whether NA's should be removed before the computation proceeds. The default value is na.rm = TRUE.
useRcpp: A `TRUE/FALSE` logical value indicating whether Rcpp (useRcpp = TRUE) or R (useRcpp = FALSE) is used for computation. The default value is useRcpp = TRUE.

Author

Juan F Munoz jfmunoz@ugr.es

Jose M Pavia pavia@uv.es

Encarnacion Alvarez encarniav@ugr.es

Details

For a sample $S$ of size $n$ drawn from an infinite population, several formulations of the Gini index have been proposed in the literature, but they yield only two distinct values: the estimate without the bias correction and the estimate with it.

This function estimates the Gini index using each of these formulations, and every formulation is implemented in both R and C++ (via Rcpp). This is useful for research purposes and for comparing computation speed. The cumulative sums passed in cum.sums need not be based on the non-decreasing order of the variable y.

The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):

method = 1 $$\widehat{G}_1 = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|;$$ $$\widehat{G}_{1}^{bc} = \displaystyle \frac{1}{2\overline{y}n(n-1)}\sum_{i \in S} \sum_{j \in S} |y_i-y_j|,$$ where $\overline{y} = n^{-1}\sum_{i \in S}y_i$ is the sample mean and the label $bc$ indicates that the bias correction is applied to the estimation of the Gini index.
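
As an illustration of the method 1 formulas, here is a minimal base R sketch. The helper name `gini_m1` is hypothetical and is not part of the package; it evaluates the double sum directly and does not reproduce the package's internal code.

```r
# Hypothetical helper (not part of the package): method 1 in base R.
gini_m1 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Sum of |y_i - y_j| over all ordered pairs (i, j), including i = j.
  abs_diff_sum <- sum(abs(outer(y, y, "-")))
  # n(n - 1) with the bias correction, n^2 without it.
  denom <- if (bias.correction) n * (n - 1) else n^2
  abs_diff_sum / (2 * mean(y) * denom)
}

y <- c(1, 2, 4)
gini_m1(y, bias.correction = FALSE)  # 2/7, about 0.2857
gini_m1(y)                           # 3/7, about 0.4286
```

The double sum needs an n by n matrix, so this direct form is only practical for moderate sample sizes.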

method = 2 $$\widehat{G}_{2} = \displaystyle \frac{n-1}{n}\frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i};$$ $$\widehat{G}_{2}^{bc} = \displaystyle \frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i},$$ where $$ p_i= \displaystyle \frac{i}{n}; \quad q_i= \frac{y_{i}^{+}}{y_{n}^{+}}, $$ and $y_{i}^{+}=\sum_{j=1}^{i}y_{(j)}$, with $i \in \{1,\ldots,n\}$, are the cumulative sums of the ordered values $y_{(i)}$ (in non-decreasing order) of the variable of interest $y$.

method = 3 $$\widehat{G}_{3} = \displaystyle \frac{n-1}{n} - \frac{2}{n}\sum_{i=1}^{n-1}q_i; $$ $$\widehat{G}_{3}^{bc} = 1 - \displaystyle \frac{2}{n-1}\sum_{i=1}^{n-1}q_i. $$

method = 4 $$\widehat{G}_{4} = 1 - \displaystyle \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i); $$ $$\widehat{G}_{4}^{bc} = \displaystyle \frac{n}{n-1}\left[1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i)\right], $$ where $p_0=q_0=0.$
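
Methods 2 to 4 all work from the cumulative shares $p_i$ and $q_i$. The base R sketch below, with the hypothetical helper name `gini_cumsum` (not a package function), computes the three formulations side by side for a sample of size at least 2, which makes it easy to check that they coincide.

```r
# Hypothetical helper (not part of the package): methods 2 to 4 in base R.
# Assumes length(y) >= 2.
gini_cumsum <- function(y) {
  n <- length(y)
  y_plus <- cumsum(sort(y))        # cumulative sums of the ordered values
  p <- seq_len(n) / n              # p_i = i / n
  q <- y_plus / y_plus[n]          # q_i = y_i^+ / y_n^+
  idx <- seq_len(n - 1)            # indices 1, ..., n - 1

  g2_bc <- sum(p[idx] - q[idx]) / sum(p[idx])
  g3    <- (n - 1) / n - 2 / n * sum(q[idx])
  g3_bc <- 1 - 2 / (n - 1) * sum(q[idx])

  # Method 4: prepend p_0 = q_0 = 0 and apply the trapezoidal-type sum.
  p0 <- c(0, p)
  q0 <- c(0, q)
  g4 <- 1 - sum((q0[-1] + q0[-(n + 1)]) * diff(p0))

  rbind(
    plain = c(m2 = (n - 1) / n * g2_bc, m3 = g3, m4 = g4),
    bc    = c(m2 = g2_bc, m3 = g3_bc, m4 = n / (n - 1) * g4)
  )
}

gini_cumsum(c(1, 2, 4))
# Row "plain" holds 2/7 in every column; row "bc" holds 3/7.
```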

method = 5 $$\widehat{G}_{5} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n}; $$ $$\widehat{G}_{5}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}. $$
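
Method 5 only needs the sorted sample, so it avoids the double sum of method 1. A minimal base R sketch follows; `gini_m5` is a hypothetical name used for illustration, not the package's implementation.

```r
# Hypothetical helper (not part of the package): method 5 in base R.
gini_m5 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Sum of i * y_(i), where y_(i) are the values in non-decreasing order.
  ranked_sum <- sum(seq_len(n) * sort(y))
  if (bias.correction) {
    2 * ranked_sum / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
  } else {
    2 * ranked_sum / (mean(y) * n^2) - (n + 1) / n
  }
}

y <- c(4, 1, 2)
gini_m5(y, bias.correction = FALSE)  # 2/7, about 0.2857
gini_m5(y)                           # 3/7, about 0.4286
```

The cost is dominated by the sort, which is O(n log n), compared with the O(n^2) pairwise sum.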

method = 6 $$\widehat{G}_{6} = \displaystyle \frac{2}{\overline{y}n}\mathrm{cov}(i,y_{(i)}); $$ $$\widehat{G}_{6}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\mathrm{cov}(i,y_{(i)}). $$
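
The covariance form can be sketched in base R as below. This is an illustration under one assumption that the text does not state explicitly: the covariance is taken with divisor $n$, which is the choice that makes the result agree with method 5. Because `stats::cov()` divides by $n-1$, the sketch rescales it. The name `gini_m6` is hypothetical.

```r
# Hypothetical helper (not part of the package): method 6 in base R.
gini_m6 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # ASSUMPTION: covariance with divisor n. stats::cov() uses n - 1,
  # so it is rescaled by (n - 1) / n.
  cov_n <- (n - 1) / n * cov(seq_len(n), sort(y))
  denom <- if (bias.correction) n - 1 else n
  2 * cov_n / (mean(y) * denom)
}

y <- c(1, 2, 4)
gini_m6(y, bias.correction = FALSE)  # 2/7, about 0.2857
gini_m6(y)                           # 3/7, about 0.4286
```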

method = 7 $$\widehat{G}_{7} = \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j\in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|; $$ $$\widehat{G}_{7}^{bc} = \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i\in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|, $$ where $$\widehat{F}_{n}^{\ast}(t)= \displaystyle \frac{1}{n}\sum_{i \in S}[\delta(y_i < t) + 0.5\delta(y_i = t)]$$ is the smooth (mid-point) distribution function.
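
To make the mid-point distribution function concrete, the base R sketch below evaluates it at every observation and plugs it into the method 7 formula. The names `midpoint_ecdf` and `gini_m7` are hypothetical, and ties are detected with exact equality, which assumes the data carry no floating-point noise.

```r
# Hypothetical helpers (not part of the package): method 7 in base R.

# Mid-point distribution function evaluated at each observation:
# share of values strictly below t plus half the share equal to t.
midpoint_ecdf <- function(y) {
  n <- length(y)
  vapply(y, function(t) (sum(y < t) + 0.5 * sum(y == t)) / n, numeric(1))
}

gini_m7 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  f_star <- midpoint_ecdf(y)
  # Sum over all pairs of |y_i - y_j| * |F*(y_i) - F*(y_j)|.
  weighted <- sum(abs(outer(y, y, "-")) * abs(outer(f_star, f_star, "-")))
  denom <- if (bias.correction) n * (n - 1) else n^2
  weighted / (mean(y) * denom)
}

y <- c(0, 0, 1)
midpoint_ecdf(y)                     # 1/3, 1/3, 5/6
gini_m7(y, bias.correction = FALSE)  # 2/3
```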

method = 8 $$\widehat{G}_{8} = 1 - \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}\min(y_i,y_j); $$ $$\widehat{G}_{8}^{bc} = 1 - \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{\substack{j \in S\\ j\neq i} }\min(y_i,y_j).$$
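
Note that the bias-corrected version of method 8 excludes the terms with $j = i$, while the uncorrected version keeps them. A base R sketch that makes this difference explicit is given below; `gini_m8` is a hypothetical name, not a package function.

```r
# Hypothetical helper (not part of the package): method 8 in base R.
gini_m8 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Matrix of pairwise minima min(y_i, y_j).
  min_mat <- outer(y, y, pmin)
  if (bias.correction) {
    diag(min_mat) <- 0  # drop the j = i terms
    1 - sum(min_mat) / (mean(y) * n * (n - 1))
  } else {
    1 - sum(min_mat) / (mean(y) * n^2)
  }
}

y <- c(1, 2, 4)
gini_m8(y, bias.correction = FALSE)  # 2/7, about 0.2857
gini_m8(y)                           # 3/7, about 0.4286
```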

method = 9 $$\widehat{G}_{9} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - 1;$$ $$\widehat{G}_{9}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - \frac{n}{n-1}.$$
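
Method 9 can be sketched in base R using ranks: the average rank of an observation minus 0.5, divided by $n$, equals the mid-point distribution function at that observation. The helper name `gini_m9` is hypothetical, and the rank shortcut is a choice made for this illustration rather than the package's documented approach.

```r
# Hypothetical helper (not part of the package): method 9 in base R.
gini_m9 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Average rank = (number below) + (number equal + 1) / 2, so
  # (rank - 0.5) / n = [(number below) + 0.5 * (number equal)] / n.
  f_star <- (rank(y, ties.method = "average") - 0.5) / n
  weighted_sum <- sum(y * f_star)
  if (bias.correction) {
    2 * weighted_sum / (mean(y) * (n - 1)) - n / (n - 1)
  } else {
    2 * weighted_sum / (mean(y) * n) - 1
  }
}

y <- c(0, 0, 1)
gini_m9(y, bias.correction = FALSE)  # 2/3
gini_m9(y)                           # 1
```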

method = 10 $$\widehat{G}_{10} = \displaystyle \frac{n-1}{2\overline{y}n}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|;$$ $$\widehat{G}_{10}^{bc} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|.$$
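
Method 10 averages the absolute differences over all unordered pairs of observations. A minimal base R sketch using `combn()` is shown below; `gini_m10` is a hypothetical name, and the sketch assumes a sample of size at least 2.

```r
# Hypothetical helper (not part of the package): method 10 in base R.
# Assumes length(y) >= 2.
gini_m10 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # |y_{i1} - y_{i2}| for every unordered pair with i1 < i2.
  pair_diffs <- combn(y, 2, FUN = function(pair) abs(pair[1] - pair[2]))
  # Mean absolute difference over the choose(n, 2) pairs.
  mean_abs_diff <- sum(pair_diffs) / choose(n, 2)
  g_bc <- mean_abs_diff / (2 * mean(y))
  if (bias.correction) g_bc else (n - 1) / n * g_bc
}

y <- c(1, 2, 4)
gini_m10(y, bias.correction = FALSE)  # 2/7, about 0.2857
gini_m10(y)                           # 3/7, about 0.4286
```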

References

Giorgi, G. M., and Gigliarano, C. (2017). The Gini concentration index: a review of the inference literature. Journal of Economic Surveys, 31(4), 1130-1148.

Mukhopadhyay, N., and Sengupta, P. P. (Eds.). (2021). Gini inequality index: Methods and applications. CRC Press.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Wang, D., Zhao, Y., and Gilmore, D. W. (2016). Jackknife empirical likelihood confidence interval for the Gini index. Statistics & Probability Letters, 110, 289-295.

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, meanlog = 5)

# Estimation of the Gini index using method = 5 (the default), bias correction, and Rcpp.
iginindex(y)

# Estimation of the Gini index using method = 5 (the default), bias correction, and R.
iginindex(y, useRcpp = FALSE)

# Comparing the computation time of the various estimation methods using R
microbenchmark::microbenchmark(
  iginindex(y, method = 1,  useRcpp = FALSE),
  iginindex(y, method = 2,  useRcpp = FALSE),
  iginindex(y, method = 3,  useRcpp = FALSE),
  iginindex(y, method = 4,  useRcpp = FALSE),
  iginindex(y, method = 5,  useRcpp = FALSE),
  iginindex(y, method = 6,  useRcpp = FALSE),
  iginindex(y, method = 7,  useRcpp = FALSE),
  iginindex(y, method = 8,  useRcpp = FALSE),
  iginindex(y, method = 9,  useRcpp = FALSE),
  iginindex(y, method = 10, useRcpp = FALSE)
)

# Comparing the computation time of the various estimation methods using Rcpp
microbenchmark::microbenchmark(
  iginindex(y, method = 1),
  iginindex(y, method = 2),
  iginindex(y, method = 3),
  iginindex(y, method = 4),
  iginindex(y, method = 5),
  iginindex(y, method = 6),
  iginindex(y, method = 7),
  iginindex(y, method = 8),
  iginindex(y, method = 9),
  iginindex(y, method = 10)
)
