For a sample \(S\), with size \(n\), derived from an infinite population, different formulations of the Gini index have been proposed in the literature, but they only provide two different outputs.
This function estimates the Gini index using the various formulations, and both R
and C++
codes are implemented. This can be useful for research purposes, and speed comparisons can be made. The argument cum.sums
does not require that the cumulative sums are based on the non-decreasing order of the variable y
.
The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):
method = 1
$$\widehat{G}_1 = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|;$$
$$\widehat{G}_{1}^{bc} = \displaystyle \frac{1}{2\overline{y}n(n-1)}\sum_{i \in S} \sum_{j \in S} |y_i-y_j|,$$
where \(\overline{y} = n^{-1}\sum_{i \in S}y_i\) is the sample mean and the label \(bc\) indicates that the bias correction is applied to the estimation of the Gini index.
method = 2
$$\widehat{G}_{2} = \displaystyle \frac{n-1}{n}\frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}pi};$$
$$\widehat{G}_{2}^{bc} = \displaystyle \frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}pi},$$
where
$$ p_i= \displaystyle \frac{i}{n}; \quad q_i= \frac{y_{i}^{+}}{y_{n}^{+}}, $$
and \(y_{i}^{+}=\sum_{j=1}^{i}y_{(j)}\), with \(i=\{1,\ldots,n\}\), are the cumulative sums
of the ordered values \(y_{(i)}\) (in non-decreasing order) of the variable of interest \(y\).
method = 3
$$\widehat{G}_{3} = \displaystyle \frac{n-1}{n} - \frac{2}{n}\sum_{i=1}^{n-1}q_i; $$
$$\widehat{G}_{3}^{bc} = 1 - \displaystyle \frac{2}{n-1}\sum_{i=1}^{n-1}q_i. $$
method = 4
$$\widehat{G}_{4} = 1 - \displaystyle \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i); $$
$$\widehat{G}_{4}^{bc} = \displaystyle \frac{n}{n-1}\left[1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i)\right], $$
where \(p_0=q_0=0.\)
method = 5
$$\widehat{G}_{5} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n}; $$
$$\widehat{G}_{5}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}. $$
method = 6
$$\widehat{G}_{6} = \displaystyle \frac{2}{\overline{y}n}cov(i,y_{(i)}); $$
$$\widehat{G}_{6}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}cov(i,y_{(i)}). $$
method = 7
$$\widehat{G}_{7} = \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j\in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|; $$
$$\widehat{G}_{7}^{bc} = \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i\in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|, $$
where
$$\widehat{F}_{n}^{\ast}(t)= \displaystyle \frac{1}{n}\sum_{i \in S}[\delta(y_i < t) + 0.5\delta(y_i = t)]$$
is the smooth (mid-point) distribution function.
method = 8
$$\widehat{G}_{8} = 1 - \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}min(y_i,y_j); $$
$$\widehat{G}_{8}^{bc} = 1 - \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{\substack{j \in S\\ j\neq i} }min(y_i,y_j).$$
method = 9
$$\widehat{G}_{9} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - 1;$$
$$\widehat{G}_{9}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - \frac{n}{n-1}.$$
method = 10
$$\widehat{G}_{10} = \displaystyle \frac{n-1}{2\overline{y}n}\binom{n}{2}^{-1}\sum_{i \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|;$$
$$\widehat{G}_{10}^{bc} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{i \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|.$$