jip_approx: Approximate Joint-Inclusion Probabilities

Description

Approximates joint-inclusion probabilities using only the first-order inclusion probabilities.

Usage

jip_approx(pik, method)

Value

A symmetric matrix of joint-inclusion probabilities, whose diagonal is the vector of first-order inclusion probabilities.

Arguments

pik: numeric vector of first-order inclusion probabilities for all population units.
method: string representing one of the available approximation methods.

Details

Available methods are "Hajek", "HartleyRao", "Tille", "Brewer1", "Brewer2", "Brewer3", and "Brewer4". These methods were derived for high-entropy sampling designs, so they may perform poorly under other designs.

The Hájek (1964) approximation [method="Hajek"] is derived under the maximum entropy sampling design and is given by

$$\tilde{\pi}_{ij} = \pi_i\pi_j \biggl[ 1 - \frac{(1-\pi_i)(1-\pi_j)}{d} \biggr] $$ where $d = \sum_{k\in U} \pi_k(1-\pi_k)$.
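As a rough illustration, the Hájek approximation can be written out by hand in the bracketed form $\pi_i\pi_j[1 - (1-\pi_i)(1-\pi_j)/d]$. The function `hajek_sketch` below is a hypothetical helper, not part of the package, and is not claimed to match how `jip_approx` is implemented internally. Following the Value section, the diagonal is set to the first-order probabilities.

```r
# Hypothetical helper (not part of the package): Hajek approximation by hand.
hajek_sketch <- function(pik) {
  d   <- sum(pik * (1 - pik))                         # d = sum_k pi_k (1 - pi_k)
  out <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / d)
  diag(out) <- pik                                    # diagonal = first-order probabilities
  out
}

# Equal probabilities with N = 20 and n = 5, chosen only for illustration
pik_equal <- rep(5 / 20, 20)
hajek_equal <- hajek_sketch(pik_equal)
hajek_equal[1, 2]   # 0.053125
```

For comparison, simple random sampling without replacement with these sizes has exact joint-inclusion probability $n(n-1)/\{N(N-1)\} = 20/380 \approx 0.0526$.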

Hartley and Rao (1962) proposed the following approximation under randomised systematic sampling [method="HartleyRao"], where $n = \sum_{k\in U}\pi_k$ is the sample size:

$$\begin{aligned}
\tilde{\pi}_{ij} ={}& \frac{n-1}{n} \pi_i\pi_j + \frac{n-1}{n^2} (\pi_i^2 \pi_j + \pi_i \pi_j^2) - \frac{n-1}{n^3}\pi_i\pi_j \sum_{k\in U} \pi_k^2 \\
& + \frac{2(n-1)}{n^3} (\pi_i^3 \pi_j + \pi_i\pi_j^3 + \pi_i^2 \pi_j^2) - \frac{3(n-1)}{n^4} (\pi_i^2 \pi_j + \pi_i\pi_j^2) \sum_{k \in U}\pi_k^2 \\
& + \frac{3(n-1)}{n^5} \pi_i\pi_j \biggl( \sum_{k\in U} \pi_k^2 \biggr)^2 - \frac{2(n-1)}{n^4} \pi_i\pi_j \sum_{k \in U} \pi_k^3
\end{aligned}$$
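The seven terms of the Hartley-Rao expansion can be evaluated term by term with `outer()`. The sketch below is a hypothetical helper, not the package's own code; it assumes a fixed-size design so that the sample size can be taken as `sum(pik)`, and it treats the sums in the formula as population totals of $\pi_k^2$ and $\pi_k^3$.

```r
# Hypothetical helper (not part of the package): Hartley-Rao approximation by hand.
hartley_rao_sketch <- function(pik) {
  n  <- sum(pik)      # assumption: fixed sample size, n = sum of pik
  s2 <- sum(pik^2)    # population total of pi_k^2
  s3 <- sum(pik^3)    # population total of pi_k^3
  p1 <- outer(pik, pik)                                      # pi_i pi_j
  p2 <- outer(pik^2, pik) + outer(pik, pik^2)                # pi_i^2 pi_j + pi_i pi_j^2
  p3 <- outer(pik^3, pik) + outer(pik, pik^3) + outer(pik^2, pik^2)
  out <- (n - 1) / n * p1 +
    (n - 1) / n^2 * p2 -
    (n - 1) / n^3 * p1 * s2 +
    2 * (n - 1) / n^3 * p3 -
    3 * (n - 1) / n^4 * p2 * s2 +
    3 * (n - 1) / n^5 * p1 * s2^2 -
    2 * (n - 1) / n^4 * p1 * s3
  diag(out) <- pik    # diagonal = first-order probabilities
  out
}

# Equal probabilities with N = 20 and n = 5, chosen only for illustration
pik_equal <- rep(5 / 20, 20)
hr_equal <- hartley_rao_sketch(pik_equal)
hr_equal[1, 2]
```

With equal probabilities the result should lie close to the simple random sampling value $20/380$, which gives a quick sanity check on the signs and powers of $n$.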

Tillé (1996) proposed the approximation $\tilde{\pi}_{ij} = \beta_i\beta_j$, where the coefficients $\beta_i$ are computed iteratively through the following procedure [method="Tille"]:

$\beta_i^{(0)} = \pi_i, \,\, \forall i\in U$
$\beta_i^{(2k-1)} = \frac{(n-1)\pi_i}{\beta^{(2k-2)} - \beta_i^{(2k-2)}}$
$\beta_i^{(2k)} = \beta_i^{(2k-1)} \Biggl( \frac{n(n-1)}{(\beta^{(2k-1)})^2 - \sum_{i\in U} (\beta_i^{(2k-1)})^2 } \Biggr)^{1/2}$

for $k=1,2,3, \dots$, where $\beta^{(m)} = \sum_{i\in U} \beta_i^{(m)}$ denotes the sum of the coefficients at step $m$.
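The iteration alternates an odd step, which rescales each coefficient by the sum of the others, and an even step, which renormalises so that the off-diagonal products sum to $n(n-1)$. A minimal sketch follows; `tille_sketch`, the stopping rule, and the arguments `tol` and `max_iter` are assumptions made for illustration, since the text above does not say when the iterations stop.

```r
# Hypothetical helper (not part of the package): Tille's iterative coefficients.
tille_sketch <- function(pik, tol = 1e-10, max_iter = 1000) {
  n    <- sum(pik)    # assumption: fixed sample size, n = sum of pik
  beta <- pik         # beta^(0) = pik
  for (k in seq_len(max_iter)) {
    beta_old <- beta
    # odd step: beta_i <- (n - 1) pi_i / (sum(beta) - beta_i)
    beta <- (n - 1) * pik / (sum(beta) - beta)
    # even step: rescale so that sum_{i != j} beta_i beta_j = n (n - 1)
    beta <- beta * sqrt(n * (n - 1) / (sum(beta)^2 - sum(beta^2)))
    # assumed stopping rule: largest change over a full odd/even cycle
    if (max(abs(beta - beta_old)) < tol) break
  }
  out <- outer(beta, beta)
  diag(out) <- pik    # diagonal = first-order probabilities
  out
}

# Equal probabilities with N = 20 and n = 5, chosen only for illustration
pik_equal <- rep(5 / 20, 20)
tille_equal <- tille_sketch(pik_equal)
tille_equal[1, 2]
```

Because the loop always ends on an even step, the off-diagonal entries of the returned matrix sum to $n(n-1)$ up to rounding, as they would for the exact joint-inclusion probabilities of a fixed-size design.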

Finally, Brewer (2002) and Brewer and Donadio (2003) proposed four approximations, which are defined by the general form

$$\tilde{\pi}_{ij} = \pi_i\pi_j (c_i + c_j)/2 $$

where the $c_i$ determine the approximation used:

Equation (9) [method="Brewer1"]: $$c_i = (n-1) / (n-\pi_i)$$
Equation (10) [method="Brewer2"]: $$c_i = (n-1) / \Bigl(n- n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)$$
Equation (11) [method="Brewer3"]: $$c_i = (n-1) / \Bigl(n - 2\pi_i + n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)$$
Equation (18) [method="Brewer4"]: $$c_i = (n-1) / \Bigl(n - (2n-1)(n-1)^{-1}\pi_i + (n-1)^{-1}\sum_{k\in U}\pi_k^2 \Bigr)$$
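Since the four Brewer variants differ only in the coefficients $c_i$, they can share one code path. The sketch below uses a hypothetical helper `brewer_sketch` with a hypothetical `version` argument (1 to 4); neither is part of the package, and the sample size is assumed to equal `sum(pik)`.

```r
# Hypothetical helper (not part of the package): Brewer-type approximations by hand.
brewer_sketch <- function(pik, version = 1) {
  n  <- sum(pik)     # assumption: fixed sample size, n = sum of pik
  s2 <- sum(pik^2)   # population total of pi_k^2
  ci <- switch(as.character(version),
    "1" = (n - 1) / (n - pik),
    "2" = rep((n - 1) / (n - s2 / n), length(pik)),
    "3" = (n - 1) / (n - 2 * pik + s2 / n),
    "4" = (n - 1) / (n - (2 * n - 1) / (n - 1) * pik + s2 / (n - 1)),
    stop("version must be 1, 2, 3 or 4")
  )
  # general form: pi_i pi_j (c_i + c_j) / 2
  out <- outer(pik, pik) * outer(ci, ci, "+") / 2
  diag(out) <- pik   # diagonal = first-order probabilities
  out
}

# Equal probabilities with N = 20 and n = 5, chosen only for illustration
pik_equal <- rep(5 / 20, 20)
brewer_equal <- sapply(1:4, function(v) brewer_sketch(pik_equal, v)[1, 2])
brewer_equal
```

With equal probabilities all four choices of $c_i$ reduce to $(n-1)/(n - n/N)$, so each variant returns the simple random sampling value $n(n-1)/\{N(N-1)\}$; the variants only differ when the $\pi_i$ are unequal.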

References

Hartley, H.O.; Rao, J.N.K., 1962. Sampling With Unequal Probability and Without Replacement. The Annals of Mathematical Statistics 33 (2), 350-374.

Hájek, J., 1964. Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population. The Annals of Mathematical Statistics 35 (4), 1491-1523.

Tillé, Y., 1996. Some Remarks on Unequal Probability Sampling Designs Without Replacement. Annals of Economics and Statistics 44, 177-189.

Brewer, K.R.W.; Donadio, M.E., 2003. The High Entropy Variance of the Horvitz-Thompson Estimator. Survey Methodology 29 (2), 189-196.

Examples

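### Load the package that provides jip_approx() (assumed here to be jipApprox) ---
library(jipApprox)

### Generate population data ---
N <- 20; n <- 5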

set.seed(0)
x <- rgamma(N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)

### Approximate joint-inclusion probabilities ---
pikl <- jip_approx(pik, method='Hajek')
pikl <- jip_approx(pik, method='HartleyRao')
pikl <- jip_approx(pik, method='Tille')
pikl <- jip_approx(pik, method='Brewer1')
pikl <- jip_approx(pik, method='Brewer2')
pikl <- jip_approx(pik, method='Brewer3')
pikl <- jip_approx(pik, method='Brewer4')
