variance_est: Variance estimation for sample surveys by the ultimate cluster method

Description

Computes variance estimates using the ultimate cluster method.

Usage

variance_est(Y, H, PSU, w_final, N_h=NULL, fh_zero=FALSE,
                    PSU_level=TRUE, period=NULL, dataset=NULL)

Arguments

Y

Variables of interest. Object convertible to data.table, or variable names as character or column numbers.

H

The unit stratum variable. One-dimensional object convertible to one-column data.table, or variable name as character or column number.

PSU

Primary sampling unit variable. One-dimensional object convertible to one-column data.table, or variable name as character or column number.

w_final

Weight variable. One-dimensional object convertible to one-column data.table, or variable name as character or column number.

N_h

Optional; a matrix whose first column gives the stratum and whose second column gives the population total in that stratum.

fh_zero

By default FALSE; fh is calculated as n_h divided by N_h in each stratum. If TRUE, fh is set to zero in each stratum.

PSU_level

By default TRUE. If PSU_level is TRUE, fh in each stratum is calculated as the ratio of the count of PSUs in the sample (n_h) and the count of PSUs in the frame (N_h). If PSU_level is FALSE, fh in each stratum is calculated as the ratio of the count of units in the sample (n_h) and co

period

Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to data.table, or variable names as character or column numbers.

dataset

Optional; name of the individual dataset, a data.table.

Value

A data.table containing the variance estimates for the totals.

Details

If we assume that $n_h \geq 2$ for all $h$, that is, two or more PSUs are selected from each stratum, then the variance of $\hat{\theta}$ can be estimated from the variation among the estimated PSU totals of the variable $Z$:

$$\hat{V} \left(\hat{\theta} \right)=\sum\limits_{h=1}^{H} \left(1-f_h \right) \frac{n_h}{n_{h}-1} \sum\limits_{i=1}^{n_h} \left( z_{hi\bullet}-\bar{z}_{h\bullet\bullet}\right)^2,$$

where

$\bullet$ $z_{hi\bullet}=\sum\limits_{j=1}^{m_{hi}} w_{hij} z_{hij}$
$\bullet$ $\bar{z}_{h\bullet\bullet}=\frac{1}{n_h} \sum\limits_{i=1}^{n_h} z_{hi\bullet}$
$\bullet$ $f_h$ is the sampling fraction of PSUs within stratum $h$
$\bullet$ $h$ is the stratum number, with a total of $H$ strata
$\bullet$ $i$ is the primary sampling unit (PSU) number within stratum $h$, with a total of $n_h$ PSUs
$\bullet$ $j$ is the household number within cluster $i$ of stratum $h$, with a total of $m_{hi}$ households
$\bullet$ $w_{hij}$ is the sampling weight for household $j$ in PSU $i$ of stratum $h$
$\bullet$ $z_{hij}$ denotes the observed value of the analysis variable $z$ for household $j$ in PSU $i$ of stratum $h$
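To make the estimator above concrete, here is a minimal base R sketch that evaluates the same formula by hand. The helper uc_variance() and its argument names are hypothetical and are not part of the package. The sketch takes a single sampling fraction f_h shared by all strata (default 0, i.e. no finite population correction) and makes no claim to reproduce the internals or the output layout of variance_est.

```r
# Hypothetical helper: evaluates the ultimate cluster formula directly.
# z, w, psu, stratum are equal-length vectors; f_h is one sampling fraction
# applied to every stratum (an assumption made to keep the sketch short).
uc_variance <- function(z, w, psu, stratum, f_h = 0) {
  # Weighted PSU totals z_{hi.}: sum of w * z within each (stratum, PSU) pair
  psu_totals <- aggregate(list(z_hi = w * z),
                          by = list(stratum = stratum, psu = psu),
                          FUN = sum)
  # Per-stratum term: n_h / (n_h - 1) * sum of squared deviations of PSU totals
  contrib <- tapply(psu_totals$z_hi, psu_totals$stratum, function(t) {
    n_h <- length(t)
    stopifnot(n_h >= 2)  # the formula needs at least two PSUs per stratum
    n_h / (n_h - 1) * sum((t - mean(t))^2)
  })
  # Apply the factor (1 - f_h) and sum over strata
  sum((1 - f_h) * contrib)
}

# Toy data: one stratum, three PSUs of two households each, all weights 2
z <- c(1, 2, 3, 4, 5, 6)
w <- rep(2, 6)
psu <- c(1, 1, 2, 2, 3, 3)
stratum <- rep("A", 6)

# PSU totals are 6, 14, 22 (mean 14), so the result is 3/2 * 128 = 192
v <- uc_variance(z, w, psu, stratum)
```

Note that within a stratum the term n_h / (n_h - 1) * sum((t - mean(t))^2) equals n_h * var(t), which offers a quick cross-check of the PSU totals.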

References

Morris H. Hansen, William N. Hurwitz, William G. Madow (1953). Sample survey methods and theory, Volume I: Methods and applications, 257-258, Wiley.

Guillaume Osier and Emilio Di Meglio (2012). The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second onwards?

Eurostat Methodologies and Working papers (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en

Eurostat Methodologies and Working papers (2013). Handbook on precision requirements and variance estimation for ESS household surveys, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

Examples

# Simulated data: 10 units, each its own PSU, all in a single stratum
Ys <- rchisq(10, 3)
w <- rep(2, 10)
PSU <- seq_along(Ys)
H <- rep("Strata_1", 10)

# default call: fh_zero is FALSE, so fh is calculated as n_h / N_h
variance_est(Y=Ys, H=H, PSU=PSU, w_final=w)

# same as the default, with fh_zero=FALSE given explicitly
variance_est(Y=Ys, H=H, PSU=PSU, w_final=w, fh_zero=FALSE)

# fh_zero=TRUE: fh is set to zero in each stratum
variance_est(Y=Ys, H=H, PSU=PSU, w_final=w, fh_zero=TRUE)