calib_el: An internal function for calibration of weights using empirical likelihood method

Description

calib_el performs calibration using empirical likelihood (EL) method. The function is taken from Wu (2005), if algorithm has problem with convergence codes from Zhang, Han and Wu (2022) using constrOptim is used.

In (pseudo) EL the following (pseudo) EL function is maximized

_i r d_i(p_i),

under the following constraint

_i r p_i = 1,

with constraints on quantiles (with notation as in Harms and Duchesne (2006))

_i r p_i(a_i - /N) = 0,

where a_i is created using joint_calib_create_matrix function, and possibly means

_i r p_i(x_i - _x) = 0,

where _x is known population mean of X. For simplicity of notation we assume only one quantile and one mean is known. This can be generalized to multiple quantiles and means.

Usage

calib_el(X, d, totals, maxit = 50, tol = 1e-08, eps = .Machine$double.eps, ...)

Value

Returns a vector of empirical likelihood g-weights

Arguments

X: matrix of variables for calibration of quantiles and totals (first column should be intercept),
d: initial d-weights for calibration (e.g. design-weights),
totals: vector of totals (where 1 element is the population size),
maxit: a numeric value giving the maximum number of iterations,
tol: the desired accuracy for the iterative procedure,
eps: the desired accuracy for computing the Moore-Penrose generalized inverse (see MASS::ginv()),
...: arguments passed to stats::optim via stats::constrOptim.

Author

Maciej Beręsewicz based on Wu (2005) and Zhang, Han and Wu (2022)

References

Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239 (code is taken from https://sas.uwaterloo.ca/~cbwu/Rcodes/LagrangeM2.txt).

Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference. International Statistical Review, 91: 165–192. https://doi.org/10.1111/insr.12518 (code is taken from Supplementary Materials).

Examples

Run this code

## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
totals_known <- c(N=N, x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
res <- calib_el(X = model.matrix(~x, df_resp),
                d = df_resp$d,
                totals = totals_known)
data.frame(known = totals_known, estimated=colSums(res*df_resp$d*model.matrix(~x, df_resp)))

Run the code above in your browser using DataLab