RRTCS-package: Randomized Response Techniques for Complex Surveys

Description

The aim of this package is to compute point and interval estimates for linear parameters from data obtained in randomized response (RR) surveys. Twenty-one RR methods are implemented for complex surveys:

- Randomized response procedures to estimate parameters of a qualitative stigmatizing characteristic: Christofides model, Devore model, Forced-Response model, Horvitz model, Horvitz model with unknown B, Kuk model, Mangat model, Mangat model with unknown B, Mangat-Singh model, Mangat-Singh-Singh model, Mangat-Singh-Singh model with unknown B, Singh-Joarder model, SoberanisCruz model and Warner model.

- Randomized response procedures to estimate parameters of a quantitative stigmatizing characteristic: BarLev model, Chaudhuri-Christofides model, Diana-Perri-1 model, Diana-Perri-2 model, Eichhorn-Hayre model, Eriksson model and Saha model.

Using the usual notation in survey sampling, we consider a finite population $U=\{1,\ldots,i,\ldots,N\}$, consisting of $N$ different elements. Let $y_i$ be the value of the sensitive aspect under study for the $i$th population element. Our aim is to estimate the finite population total $Y=\sum_{i=1}^N y_i$ of the variable of interest $y$ or the population mean $\bar{Y}=\frac{1}{N}\sum_{i=1}^N y_i$. When the aim is to estimate the proportion of the population presenting a certain stigmatized behaviour $A$, the variable $y_i$ takes the value 1 if $i\in G_A$ (the group with the stigmatized behaviour) and 0 otherwise. Some qualitative models use an innocuous or related attribute $B$ whose population proportion may be known or unknown.

Assume that a sample $s$ is chosen according to a general design $p$ with inclusion probabilities $\pi_i=\sum_{s\ni i}p(s),i\in U$.

In order to include a wide variety of RR procedures, we consider the unified approach given by Arnab (1994). The interviews of individuals in the sample $s$ are conducted in accordance with the RR model. For each $i\in s$ the RR induces a random response $z_i$ (called the scrambled response) so that the revised randomized response $r_i$ (Chaudhuri and Christofides, 2013) is an unbiased estimator of $y_i$. Then, an unbiased estimator for the population total of the sensitive characteristic $y$ is given by $$\widehat{Y}_R=\sum_{i\in s}\frac{r_i}{\pi_i}$$ The variance of this estimator is given by: $$V(\widehat{Y}_R)=\sum_{i\in U}\frac{V_R(r_i)}{\pi_i}+V_{HT}(r)$$ where $V_R(r_i)$ is the variance of $r_i$ under the randomized device and $V_{HT}(r)$ is the design variance of the Horvitz-Thompson estimator of the $r_i$ values.
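As a minimal sketch of the point estimator above, the following Python snippet computes $\widehat{Y}_R$ from revised responses $r_i$ and inclusion probabilities $\pi_i$, together with the randomization part $\sum_{i\in s}\widehat{V}_R(r_i)/\pi_i$ of a variance estimate. The function names `rr_total` and `rr_randomization_variance` are illustrative assumptions and are not part of the package; the design-variance term is not computed here.

```python
from typing import Sequence


def _check_inputs(values: Sequence[float], pi: Sequence[float]) -> None:
    """Validate that both vectors match and that each pi_i lies in (0, 1]."""
    if len(values) != len(pi):
        raise ValueError("values and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")


def rr_total(r: Sequence[float], pi: Sequence[float]) -> float:
    """Estimate the population total as the sum of r_i / pi_i over the sample."""
    _check_inputs(r, pi)
    return sum(r_i / pi_i for r_i, pi_i in zip(r, pi))


def rr_randomization_variance(v_r: Sequence[float], pi: Sequence[float]) -> float:
    """Sum of V_R(r_i) / pi_i over the sample (randomization part only)."""
    _check_inputs(v_r, pi)
    return sum(v_i / pi_i for v_i, pi_i in zip(v_r, pi))


if __name__ == "__main__":
    # Hypothetical sample of three units (illustrative values only).
    r = [1.0, 0.0, 1.0]
    pi = [0.5, 0.25, 0.5]
    print(rr_total(r, pi))
```

Adding a design-variance estimate to the output of `rr_randomization_variance` gives the full variance estimate for the total.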

This variance is estimated by: $$\widehat{V}(\widehat{Y}_R)=\sum_{i\in s}\frac{\widehat{V}_R(r_i)}{\pi_i}+\widehat{V}(r)$$ where $\widehat{V}_R(r_i)$ depends on the RR device, and the estimate of the design variance, $\widehat{V}(r)$, is obtained using Deville's method (Deville, 1993).

The confidence interval at the $100(1-\alpha)\%$ level is given by $$ci=\left(\widehat{Y}_R-z_{1-\frac{\alpha}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)},\widehat{Y}_R+z_{1-\frac{\alpha}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)}\right)$$ where $z_{1-\frac{\alpha}{2}}$ denotes the $\left(1-\frac{\alpha}{2}\right)$ quantile of the standard normal distribution.
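The interval can be sketched in a few lines of Python, assuming a point estimate and a variance estimate are already available. The function name `normal_ci` is illustrative and not part of the package; the normal quantile $z_{1-\frac{\alpha}{2}}$ comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def normal_ci(estimate: float, variance: float, alpha: float = 0.05) -> Tuple[float, float]:
    """Normal-approximation confidence interval: estimate -/+ z * sqrt(variance)."""
    if variance < 0.0:
        raise ValueError("variance estimate must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Quantile of order 1 - alpha/2 of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Hypothetical estimate and variance, for illustration only.
    lower, upper = normal_ci(100.0, 25.0, alpha=0.05)
    print(lower, upper)
```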

Similarly, an unbiased estimator for the population mean $\bar{Y}$ is given by $$\widehat{\bar{Y}}_R= \frac{1}{N}\sum_{i\in s}\frac{r_i}{\pi_i}$$ and an unbiased estimator for its variance is calculated as: $$\widehat{V}(\widehat{\bar{Y}}_R)=\frac{1}{N^2}\left(\sum_{i\in s}\frac{\widehat{V}_R(r_i)}{\pi_i}+\widehat{V}(r)\right)$$ In cases where the population size $N$ is unknown, we consider Hájek-type estimators for the mean: $$\widehat{\bar{Y}}_{RH}=\frac{\sum_{i\in s}\frac{r_i}{\pi_i}}{\sum_{i\in s}\frac{1}{\pi_i}}$$ and Taylor-series linearization variance estimation of the ratio (Wolter, 2007) is used.
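The two mean estimators can be sketched as follows. This is an illustration, not the package's code: the function names are assumptions, and the Hájek-type estimator is written in its standard ratio form, in which both numerator and denominator are weighted by $1/\pi_i$, so that the denominator $\sum_{i\in s}1/\pi_i$ acts as an estimate of $N$.

```python
from typing import Sequence


def rr_mean_known_n(r: Sequence[float], pi: Sequence[float], n_pop: float) -> float:
    """Mean estimator when the population size N is known: (1/N) * sum r_i / pi_i."""
    if len(r) != len(pi):
        raise ValueError("r and pi must have the same length")
    if n_pop <= 0:
        raise ValueError("population size must be positive")
    return sum(r_i / pi_i for r_i, pi_i in zip(r, pi)) / n_pop


def rr_mean_hajek(r: Sequence[float], pi: Sequence[float]) -> float:
    """Hajek-type ratio estimator of the mean, for use when N is unknown."""
    if len(r) != len(pi) or len(r) == 0:
        raise ValueError("r and pi must be non-empty and of equal length")
    weighted_sum = sum(r_i / pi_i for r_i, pi_i in zip(r, pi))
    n_hat = sum(1.0 / pi_i for pi_i in pi)  # design-based estimate of N
    return weighted_sum / n_hat


if __name__ == "__main__":
    # Hypothetical sample, for illustration only.
    r = [1.0, 0.0, 1.0]
    pi = [0.5, 0.25, 0.5]
    print(rr_mean_known_n(r, pi, n_pop=10), rr_mean_hajek(r, pi))
```

When the weights $1/\pi_i$ sum exactly to $N$, the two estimators coincide.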

In qualitative models, the values $r_i$ and $\widehat{V}_R(r_i)$ for $i\in s$ are described in the documentation of each model.

In some quantitative models, the values $r_i$ and $\widehat{V}_R(r_i)$ for $i\in s$ are calculated in a general form (Arcos et al., 2015) as follows:

The randomized response given by person $i$ is $$z_i=\left\{\begin{array}{lccc} y_i & \textrm{with probability } p_1\\ y_iS_1+S_2 & \textrm{with probability } p_2\\ S_3 & \textrm{with probability } p_3 \end{array} \right.$$ with $p_1+p_2+p_3=1$ and where $S_1,S_2$ and $S_3$ are scrambling variables whose distributions are assumed to be known. We denote by $\mu_j$ and $\sigma_j$ respectively the mean and standard deviation of the variable $S_j,(j=1,2,3)$.
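To make the device concrete, the sketch below simulates one scrambled response $z_i$ for a given true value. It is only an illustration: the function name `scrambled_response` and the choice of normal scrambling distributions in the usage example are assumptions, not something the package prescribes.

```python
import random
from typing import Callable, Tuple

Draw = Callable[[random.Random], float]


def scrambled_response(
    y: float,
    p: Tuple[float, float, float],
    draw_s1: Draw,
    draw_s2: Draw,
    draw_s3: Draw,
    rng: random.Random,
) -> float:
    """Return y, y*S1 + S2, or S3 with probabilities p1, p2, p3 respectively."""
    p1, p2, p3 = p
    if min(p) < 0.0 or abs(p1 + p2 + p3 - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    u = rng.random()  # uniform on [0, 1)
    if u < p1:
        return y
    if u < p1 + p2:
        return y * draw_s1(rng) + draw_s2(rng)
    return draw_s3(rng)


if __name__ == "__main__":
    rng = random.Random(2015)
    # Hypothetical scrambling distributions (normal), for illustration only.
    z = scrambled_response(
        10.0,
        (0.5, 0.3, 0.2),
        lambda g: g.gauss(1.0, 0.1),
        lambda g: g.gauss(0.0, 1.0),
        lambda g: g.gauss(5.0, 2.0),
        rng,
    )
    print(z)
```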

The transformed variable is $$r_i=\frac{z_i-p_2\mu_2-p_3\mu_3}{p_1+p_2\mu_1},$$ its variance is $$V_R(r_i)=\frac{1}{(p_1+p_2\mu_1)^2}(y_i^2A+y_iB+C)$$ where $$A=p_1(1-p_1)+\sigma_1^2p_2+\mu_1^2p_2-\mu_1^2p_2^2-2p_1p_2\mu_1$$ $$B=2p_2\mu_1\mu_2-2\mu_1\mu_2p_2^2-2p_1p_2\mu_2-2\mu_3p_1p_3-2\mu_1\mu_3p_2p_3$$ $$C=(\sigma_2^2+\mu_2^2)p_2+(\sigma_3^2+\mu_3^2)p_3-(\mu_2p_2+\mu_3p_3)^2$$ and the estimated variance is $$\widehat{V}_R(r_i)=\frac{1}{(p_1+p_2\mu_1)^2}(r_i^2A+r_iB+C).$$ Some of the quantitative techniques considered can be viewed as particular cases of the above described procedure. Other models are described in the respective function.
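The formulas for $r_i$, $A$, $B$, $C$ and $\widehat{V}_R(r_i)$ translate directly into code. The following Python sketch is one possible transcription; the function names are illustrative assumptions and do not correspond to functions of the package.

```python
from typing import Tuple

Triple = Tuple[float, float, float]


def abc_coefficients(p: Triple, mu: Triple, sigma: Triple) -> Triple:
    """Coefficients A, B, C of the quadratic form in the variance of r_i."""
    p1, p2, p3 = p
    m1, m2, m3 = mu
    s1, s2, s3 = sigma
    a = p1 * (1 - p1) + s1**2 * p2 + m1**2 * p2 - m1**2 * p2**2 - 2 * p1 * p2 * m1
    b = (2 * p2 * m1 * m2 - 2 * m1 * m2 * p2**2 - 2 * p1 * p2 * m2
         - 2 * m3 * p1 * p3 - 2 * m1 * m3 * p2 * p3)
    c = (s2**2 + m2**2) * p2 + (s3**2 + m3**2) * p3 - (m2 * p2 + m3 * p3) ** 2
    return a, b, c


def revised_response(z: float, p: Triple, mu: Triple) -> float:
    """Transformed variable r_i = (z_i - p2*mu2 - p3*mu3) / (p1 + p2*mu1)."""
    p1, p2, p3 = p
    m1, m2, m3 = mu
    denom = p1 + p2 * m1
    if denom == 0:
        raise ValueError("p1 + p2*mu1 must be non-zero")
    return (z - p2 * m2 - p3 * m3) / denom


def estimated_rr_variance(r: float, p: Triple, mu: Triple, sigma: Triple) -> float:
    """Estimated variance (r^2*A + r*B + C) / (p1 + p2*mu1)^2."""
    a, b, c = abc_coefficients(p, mu, sigma)
    denom = p[0] + p[1] * mu[0]
    if denom == 0:
        raise ValueError("p1 + p2*mu1 must be non-zero")
    return (r * r * a + r * b + c) / denom**2


if __name__ == "__main__":
    # Hypothetical design: true answer with probability 0.5, otherwise S3 with
    # mean 2 and standard deviation 1 (illustrative values only).
    p, mu, sigma = (0.5, 0.0, 0.5), (1.0, 0.0, 2.0), (0.0, 0.0, 1.0)
    r = revised_response(3.0, p, mu)
    print(r, estimated_rr_variance(r, p, mu, sigma))
```

With $p_1=1$ and $p_2=p_3=0$ (direct questioning), $A=B=C=0$ and $r_i=z_i=y_i$, which is a quick sanity check on any implementation.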

Alternatively, the variance can be estimated using certain resampling methods.

References

Arcos, A., Rueda, M., Singh, S. (2015). A generalized approach to randomised response for quantitative variables. Quality and Quantity 49, 1239-1256.

Arnab, R. (1994). Non-negative variance estimator in randomized response surveys. Comm. Stat. Theo. Meth. 23, 1743-1752.

Chaudhuri, A., Christofides, T.C. (2013). Indirect Questioning in Sample Surveys. Springer-Verlag Berlin Heidelberg.

Deville, J.C. (1993). Estimation de la variance pour les enquêtes en deux phases. Manuscript, INSEE, Paris.

Wolter, K.M. (2007). Introduction to Variance Estimation. 2nd Edition. Springer.