RCAL (version 2.0)

mn.regu.cv: Model-assisted inference for population means based on cross validation

Description

This function implements model-assisted inference for population means with missing data, using regularized calibrated estimation based on cross validation.

Usage

mn.regu.cv(fold, nrho = NULL, rho.seq = NULL, y, tr, x, ploss = "cal",
  yloss = "gaus", off = 0, ...)

Arguments

fold

A vector of length 2 giving the numbers of folds for cross validation in propensity score estimation and outcome regression respectively.

nrho

A vector of length 2 giving the numbers of tuning parameters searched in cross validation.

rho.seq

A list of two vectors giving the tuning parameters in propensity score estimation (first vector) and outcome regression (second vector).

y

An \(n\) x \(1\) vector of outcomes with missing data.

tr

An \(n\) x \(1\) vector of non-missing indicators (=1 if y is observed or 0 if y is missing).

x

An \(n\) x \(p\) matrix of covariates, used in both propensity score and outcome regression models.

ploss

A loss function used in propensity score estimation (either "ml" or "cal").

yloss

A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes).

off

An offset value (e.g., the true value in simulations) used to calculate the z-statistic from augmented IPW estimation.

...

Additional arguments to glm.regu.cv.
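
The shapes described above can be illustrated with a small sketch in base R. The toy data and the specific values (10 observations, 3 covariates, 5 folds, 11 tuning parameters) are assumptions chosen for illustration only; nothing here calls RCAL itself.

```r
# Hypothetical toy inputs illustrating the expected shapes of the arguments.
set.seed(1)
n <- 10
p <- 3

x <- scale(matrix(rnorm(n * p), n, p))    # n x p covariate matrix, standardized
tr <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)     # 1 if y is observed, 0 if y is missing
y <- rnorm(n)
y[tr == 0] <- NA                          # outcomes are missing where tr == 0

# One entry per model: propensity score first, outcome regression second.
fold <- c(5, 5)                           # numbers of folds
nrho <- c(11, 11)                         # numbers of tuning parameters searched
rho.seq <- NULL                           # or a list of two numeric vectors
```

With inputs of this form, the call would take the shape `mn.regu.cv(fold, nrho, rho.seq, y, tr, x, ploss = "cal", yloss = "gaus")`, as in the Usage section.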

Value

ps

A list containing the results from fitting the propensity score model by glm.regu.cv.

fp

The \(n\) x \(1\) vector of fitted propensity scores.

or

A list containing the results from fitting the outcome regression model by glm.regu.cv.

fo

The \(n\) x \(1\) vector of fitted values from outcome regression.

est

A list containing the results from augmented IPW estimation by mn.aipw.

Details

This function involves two steps: first, fitting the propensity score and outcome regression models, and second, applying the augmented IPW estimator for a population mean. For ploss="cal", regularized calibrated estimation is performed with cross validation as described in Tan (2020a, 2020b). This method leads to model-assisted inference, in which confidence intervals are valid with high-dimensional data if the propensity score model is correctly specified, even if the outcome regression model is misspecified. With linear outcome models, the inference is also doubly robust. For ploss="ml", regularized maximum likelihood estimation is used (Belloni et al. 2014; Farrell 2015). In this case, standard errors are only shown to be valid if both the propensity score model and the outcome regression model are correctly specified.
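
The two-step structure can be sketched in base R as follows. This is a minimal illustration, not the RCAL implementation: unregularized `glm()` and `lm()` stand in for the regularized, cross-validated fits from glm.regu.cv, the simulated data are hypothetical, and the standard error is a simple plug-in based on the sample variance of the augmented IPW terms, which may differ in detail from what mn.aipw computes.

```r
# Base-R sketch of the two steps; glm()/lm() replace glm.regu.cv (assumption).
set.seed(1)
n <- 500
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
dat$tr <- rbinom(n, 1, plogis(0.5 + dat$X1))   # 1 = observed, 0 = missing
dat$y <- 1 + dat$X1 + rnorm(n)                 # true population mean is 1
dat$y[dat$tr == 0] <- NA                       # outcomes missing where tr == 0

# Step 1: fit the propensity score and outcome regression models.
ps.fit <- glm(tr ~ X1 + X2 + X3, data = dat, family = binomial())
fp <- fitted(ps.fit)                           # fitted propensity scores
or.fit <- lm(y ~ X1 + X2 + X3, data = dat, subset = tr == 1)
fo <- predict(or.fit, newdata = dat)           # fitted outcomes for all n units

# Step 2: augmented IPW estimator of the population mean.
y0 <- ifelse(dat$tr == 1, dat$y, 0)            # placeholder so NA never enters
phi <- fo + dat$tr * (y0 - fo) / fp            # augmented IPW terms
est <- mean(phi)                               # point estimate
se <- sqrt(var(phi) / n)                       # plug-in standard error
off <- 0                                       # offset, as in the off argument
z <- (est - off) / se                          # z-statistic relative to off

c(est = est, se = se, z = z)
```

Setting `off` to the true mean (1 in this simulation) gives a z-statistic centered near zero, which is how the offset is typically used in simulation studies.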

References

Belloni, A., Chernozhukov, V., and Hansen, C. (2014) Inference on treatment effects after selection among high-dimensional controls, Review of Economic Studies, 81, 608-650.

Farrell, M.H. (2015) Robust inference on average treatment effects with possibly more covariates than observations, Journal of Econometrics, 189, 1-23.

Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158.

Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811-837.

Examples

# NOT RUN {
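library(RCAL)

data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

# column 1: outcome; column 2: non-missing indicator; remaining p columns: covariates
y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# missing data
y[tr==0] <- NA

# 5 folds and 11 tuning parameters for each of the two models
mn.cv.rcal <- mn.regu.cv(fold=5*c(1,1), nrho=(1+10)*c(1,1), rho.seq=NULL, y, tr, x,
                         ploss="cal", yloss="gaus")
unlist(mn.cv.rcal$est)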
# }