vardpoor (version 0.2.0.8.1)

variance_othstr: Variance estimation for sample surveys by the new stratification

Description

Computes the variance estimate for sample surveys under a new stratification.

Usage

variance_othstr(Y, H, H2, w_final, N_h=NULL, N_h2, s2g=FALSE, period=NULL, dataset=NULL)

Arguments

Y
either a numeric data.frame, matrix, data.table with column names giving the variables of interest, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column
H
either 1 column data.frame, matrix, data.table with column name giving elements indicating the unit stratum, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset
H2
either 1 column data.frame, matrix, data.table with column name giving elements indicating the unit new stratum, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dat
w_final
either a numeric vector, 1 column data.frame, matrix, data.table giving the final weights, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column count) sp
N_h
optional (default NULL); a matrix whose first column gives the stratum and whose second column gives the population total in each stratum.
N_h2
a matrix whose first column gives the new stratum and whose second column gives the population total in each new stratum.
s2g
logical; FALSE by default, in which case the variance is calculated. If s2g is TRUE, the $S_g^2$ value is taken as the variance estimate.
period
optional; either a data.frame, matrix, data.table with column names giving different periods, or (if dataset is not NULL) character strings, integers or a logical vectors (length is the same as 'dataset' column coun
dataset
optional; the name of the individual dataset (a data.frame).

Value

  • a data.table containing the variance estimates of the totals.

Details

The population size $M_g$ can be computed from the sampling frame. The variance of the $g$-th stratum is

$$S_g^2 =\frac{1}{M_g-1} \sum\limits_{k=1}^{M_g} \left(y_{gk}-\bar{Y}_g \right)^2= \frac{1}{M_g-1} \sum\limits_{k=1}^{M_g} y_{gk}^2 - \frac{M_g}{M_g-1}\bar{Y}_g^2$$

Both $\sum\limits_{k=1}^{M_g} y_{gk}^2$ and $\bar{Y}_g^2$ have to be estimated in order to estimate $S_g^2$.

The estimate of $\sum\limits_{k=1}^{M_g} y_{gk}^2$ is $\sum\limits_{h=1}^{H} \frac{N_h}{n_h} \sum\limits_{i=1}^{n_h} y_{hi}^2 z_{hi}$, where $z_{hi} = \left\{ \begin{array}{ll} 1, & h_i \in \theta_g \\ 0, & h_i \notin \theta_g \end{array} \right.$ and $\theta_g$ is the index group of successfully surveyed units belonging to the $g$-th stratum.

The estimate of $\bar{Y}_g^2$ is

$$\hat{\bar{Y}}_g^2=\left( \hat{\bar{Y}}_g \right)^2-\hat{Var} \left(\hat{\bar{Y}}_g \right)$$

where

$$\hat{\bar{Y}}_g =\frac{\hat{Y}_g}{M_g}= \frac{1}{M_g} \sum\limits_{h=1}^{H} \frac{N_h}{n_h} \sum\limits_{i=1}^{n_h} y_{hi} z_{hi}$$

$$\hat{Var} \left(\hat{\bar{Y}}_g \right) =\frac{1}{M_g^2} \sum\limits_{h=1}^{H} N_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) \sigma_h^2$$

$$\sigma_h^2 =\frac{1}{n_h-1} \sum\limits_{i=1}^{n_h} \left(y_{hi} z_{hi} - \frac{1}{n_h} \sum\limits_{t=1}^{n_h} y_{ht} z_{ht} \right)^2$$

So the estimate of $S_g^2$ is

$$\begin{aligned} s_g^2 ={} & \frac{1}{M_g-1} \sum\limits_{h=1}^{H} \frac{N_h}{n_h} \sum\limits_{i=1}^{n_h} y_{hi}^2 z_{hi} \\ & - \frac{M_g}{M_g-1} \left( \left( \frac{1}{M_g} \sum\limits_{h=1}^{H} \frac{N_h}{n_h} \sum\limits_{i=1}^{n_h} y_{hi} z_{hi} \right)^2 - \frac{1}{M_g^2} \sum\limits_{h=1}^{H} N_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) \frac{1}{n_h-1} \sum\limits_{i=1}^{n_h} \left(y_{hi} z_{hi} - \frac{1}{n_h} \sum\limits_{t=1}^{n_h} y_{ht} z_{ht} \right)^2 \right) \end{aligned}$$

Two conditions must hold to estimate $S_g^2$: $n_h>1, \forall h$ and $\theta_g \ne \emptyset, \forall g$.

The variance of $\hat{Y}$ is

$$Var\left( \hat{Y} \right) = \sum\limits_{g=1}^{G} M_g^2 \left( \frac{1}{m_g} - \frac{1}{M_g} \right) S_g^2$$

The estimate of $Var\left( \hat{Y} \right)$ is

$$\hat{Var}\left( \hat{Y} \right) = \sum\limits_{g=1}^{G} M_g^2 \left( \frac{1}{m_g} - \frac{1}{M_g} \right)s_g^2$$
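
The formulas in this section can be traced by hand on a toy sample. The following base R sketch is only an illustration of the arithmetic, not the package's implementation: the data, the helper name `s2_g`, and the reading of $m_g$ as the number of sampled units in the new stratum $g$ are all assumptions made for the example, and $z_{hi}$ is taken to be 1 for units that belong to the new stratum $g$.

```r
# Toy sample (all values are made up for illustration)
y <- c(2, 4, 6, 8, 1, 3, 5, 7)          # variable of interest
h <- c(1, 1, 1, 1, 2, 2, 2, 2)          # original stratum of each unit
g <- c(1, 2, 1, 2, 1, 1, 2, 2)          # new stratum of each unit
N_h <- c("1" = 20, "2" = 40)            # population sizes of the original strata
M_g <- c("1" = 25, "2" = 35)            # population sizes of the new strata

# Hypothetical helper: estimate s_g^2 for one new stratum gg
s2_g <- function(gg) {
  z   <- as.numeric(g == gg)                          # indicator z_hi
  M   <- M_g[[as.character(gg)]]
  n   <- as.numeric(tapply(y, h, length))             # n_h
  N   <- as.numeric(N_h[as.character(sort(unique(h)))])
  # Estimate of the sum of y^2 over the new stratum
  sum_y2 <- sum(N / n * as.numeric(tapply(y^2 * z, h, sum)))
  # Estimate of the mean of the new stratum
  y_bar  <- sum(N / n * as.numeric(tapply(y * z, h, sum))) / M
  # sigma_h^2: sample variance of y * z within each original stratum
  sigma2 <- as.numeric(tapply(y * z, h, var))
  var_y_bar <- sum(N^2 * (1 / n - 1 / N) * sigma2) / M^2
  sum_y2 / (M - 1) - M / (M - 1) * (y_bar^2 - var_y_bar)
}

s2 <- vapply(c(1, 2), s2_g, numeric(1))

# Assumption: m_g is the number of sampled units in each new stratum
m_g <- as.numeric(table(g))
M   <- as.numeric(M_g)
var_hat <- sum(M^2 * (1 / m_g - 1 / M) * s2)

print(s2)
print(var_hat)
```

Both conditions from the text hold in this toy sample: every original stratum has more than one sampled unit, and every new stratum contains at least one sampled unit.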

References

M. Liberts. (2004) Non-response Analysis and Bias Estimation in a Survey on Transportation of Goods by Road.

See Also

domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, residual_est, vardom, vardom_othstr, vardomh, varpoord

Examples

library(data.table)
library(vardpoor)

period <- NULL
dataset <- NULL
Y <- data.table(matrix(runif(50)*5, ncol=5))

H <- data.table(H=trunc(5*runif(10)))
H2 <- data.table(H2=trunc(3*runif(10)))

N_h <- data.table(matrix(0:4,5,1))
setnames(N_h, names(N_h), "H")
N_h[, sk:=10]

N_h2 <- data.table(matrix(0:2,3,1))
setnames(N_h2, names(N_h2), "H2")
N_h2[, sk2:=4]

w_final <- rep(2, 10)
variance_othstr(Y, H, H2, w_final, N_h=N_h, N_h2=N_h2, period=NULL, dataset=NULL)
