Var_approx: Approximate the Variance of the Horvitz-Thompson estimator

Description

Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.

Usage

Var_approx(y, pik, n, method, ...)

Value

a scalar, the approximated variance.

Arguments

y: numeric vector containing the values of the variable of interest for all population units
pik: numeric vector of first-order inclusion probabilities, of length equal to population size
n: a scalar indicating the sample size
method: string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint".
...: two optional parameters control the iterative procedure used when method="FixedPoint": maxIter sets the maximum number of iterations and eps sets the convergence tolerance

Details

The variance approximations available in this function are described below. The notation follows Matei and Tillé (2005).

Hájek variance approximation (method="Hajek1"):

$$ \tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2}(y_i - y_i^*)^2 $$ where $$y_i^* = \pi_i \frac{ \sum_{j\in U} b_j y_j/\pi_j }{ \sum_{j \in U} b_j } $$ and $$ b_i = \frac{ \pi_i(1-\pi_i)N }{ N-1 } $$
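
The formula above can be transcribed directly into base R. The following is a minimal sketch, not the package's implementation; the helper name `hajek1_sketch` is hypothetical, and the simulated data mirror the Examples section.

```r
# Hypothetical helper: direct transcription of the "Hajek1" formula.
# It is NOT the package's internal code.
hajek1_sketch <- function(y, pik) {
  N <- length(pik)
  b <- pik * (1 - pik) * N / (N - 1)          # b_i
  y_star <- pik * sum(b * y / pik) / sum(b)   # y_i^*
  sum(b / pik^2 * (y - y_star)^2)
}

# Simulated population, as in the Examples section
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n = N, scale = 10, shape = 5)
y <- abs(2 * x + 3.7 * sqrt(x) * rnorm(N))
pik <- n * x / sum(x)

v_hajek1 <- hajek1_sketch(y, pik)
v_hajek1
```

When y is exactly proportional to pik, every residual $y_i - y_i^*$ vanishes and the sketch returns zero, which is a quick sanity check on the transcription.
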
Starting from Hájek (1964), Brewer (2002) defined the following approximation (method="Hajek2"):

$$\tilde{Var} = \sum_{i \in U} \pi_i(1-\pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2 $$ where $\tilde{Y} = \sum_{i \in U} a_i y_i$ and $a_i = n(1-\pi_i)/\sum_{j \in U} \pi_j(1-\pi_j) $
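
As an illustration, the expression for method="Hajek2" can be written in a few lines of base R. This is a sketch under the definitions given above; the name `hajek2_sketch` is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: direct transcription of the "Hajek2" formula.
hajek2_sketch <- function(y, pik, n) {
  d <- pik * (1 - pik)              # pi_i (1 - pi_i)
  a <- n * (1 - pik) / sum(d)       # a_i
  Y_tilde <- sum(a * y)             # \tilde{Y}
  sum(d * (y / pik - Y_tilde / n)^2)
}

# Simulated population, as in the Examples section
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n = N, scale = 10, shape = 5)
y <- abs(2 * x + 3.7 * sqrt(x) * rnorm(N))
pik <- n * x / sum(x)

v_hajek2 <- hajek2_sketch(y, pik, n)
v_hajek2
```
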
Hartley and Rao (1962) variance approximation (method="HartleyRao1"):

$$ \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2$$

$$\qquad - \frac{n-1}{n^2} \sum_{i \in U} \Biggl( 2\pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Biggr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2$$

$$\quad \qquad + \frac{2(n-1)}{n^3} \Biggl( \sum_{i \in U}\pi_i y_i - \frac{Y}{n}\sum_{i\in U} \pi_i^2 \Biggr)^2 $$ where $Y = \sum_{i \in U} y_i$ is the population total.
Hartley and Rao (1962) provide a simplified version of the variance above (method="HartleyRao2"):

$$ \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2 $$
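
The two Hartley-Rao expressions can be sketched side by side in base R, which makes it visible that the simplified version is the first term of the full one. The function names are hypothetical, and $Y$ is assumed to be the population total of y; this is an illustration, not the package's code.

```r
# Hypothetical helpers transcribing the two Hartley-Rao formulas.
hartley_rao2_sketch <- function(y, pik, n) {
  Y  <- sum(y)                      # assumed: population total of y
  e2 <- (y / pik - Y / n)^2
  sum(pik * (1 - (n - 1) / n * pik) * e2)
}

hartley_rao1_sketch <- function(y, pik, n) {
  Y  <- sum(y)
  e2 <- (y / pik - Y / n)^2
  term1 <- sum(pik * (1 - (n - 1) / n * pik) * e2)
  term2 <- (n - 1) / n^2 *
    sum((2 * pik^3 - pik^2 / 2 * sum(pik^2)) * e2)
  term3 <- 2 * (n - 1) / n^3 *
    (sum(pik * y) - Y / n * sum(pik^2))^2
  term1 - term2 + term3
}

# Simulated population, as in the Examples section
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n = N, scale = 10, shape = 5)
y <- abs(2 * x + 3.7 * sqrt(x) * rnorm(N))
pik <- n * x / sum(x)

v_hr1 <- hartley_rao1_sketch(y, pik, n)
v_hr2 <- hartley_rao2_sketch(y, pik, n)
c(HartleyRao1 = v_hr1, HartleyRao2 = v_hr2)
```
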
method="FixedPoint" computes the Fixed-Point variance approximation proposed by Deville and Tillé (2005). The variance can be expressed in the same form as in method="Hajek1", and the coefficients $b_i$ are computed iteratively by the algorithm:
1. $$b_i^{(0)} = \pi_i (1-\pi_i) \frac{N}{N-1}, \,\, \forall i \in U $$
2. $$ b_i^{(k)} = \frac{(b_i^{(k-1)})^2 }{\sum_{j\in U} b_j^{(k-1)} } + \pi_i(1-\pi_i) $$
A necessary condition for convergence is checked; if it is not satisfied, the function returns an alternative solution that uses only one iteration:

$$b_i = \pi_i(1-\pi_i)\Biggl( \frac{N\pi_i(1-\pi_i)}{ (N-1)\sum_{j\in U}\pi_j(1-\pi_j) } + 1 \Biggr) $$
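
The iteration and the one-iteration alternative can be sketched in base R as follows. All function names are hypothetical. The default values of `maxIter` and `eps`, and the stopping rule (largest absolute change in $b_i$ below `eps`), are assumptions made for illustration. The convergence check performed by the package is not reproduced here because its exact form is not stated above.

```r
# Hypothetical helper: fixed-point iteration for the coefficients b_i.
# Defaults and stopping rule are assumptions, not the package's.
fixed_point_b <- function(pik, maxIter = 1000, eps = 1e-6) {
  N <- length(pik)
  d <- pik * (1 - pik)
  b <- d * N / (N - 1)                # step 1: b^(0)
  for (k in seq_len(maxIter)) {
    b_new <- b^2 / sum(b) + d         # step 2: b^(k)
    converged <- max(abs(b_new - b)) < eps
    b <- b_new
    if (converged) break
  }
  b
}

# Hypothetical helper: closed-form one-iteration alternative.
one_step_b <- function(pik) {
  N <- length(pik)
  d <- pik * (1 - pik)
  d * (N * d / ((N - 1) * sum(d)) + 1)
}

# Variance in the same form as method="Hajek1", for given b.
var_from_b <- function(y, pik, b) {
  y_star <- pik * sum(b * y / pik) / sum(b)
  sum(b / pik^2 * (y - y_star)^2)
}

# Simulated population, as in the Examples section
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n = N, scale = 10, shape = 5)
y <- abs(2 * x + 3.7 * sqrt(x) * rnorm(N))
pik <- n * x / sum(x)

b_fp <- fixed_point_b(pik)
v_fp <- var_from_b(y, pik, b_fp)
v_fp
```

The closed-form alternative coincides with a single application of step 2 to $b_i^{(0)}$, which is why it is described as using only one iteration.
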

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Examples



N <- 500; n <- 50

set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)
pikl <- outer(pik, pik, '*'); diag(pikl) <- pik

### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")
