std_comp: Generate standardized complete cases

Description

Standardizes covariates for propensity score estimation before the distribution balancing weighting dbw with the regularization.

Usage

std_comp(
  formula_y,
  formula_ps,
  estimand = "ATE",
  method_y = "wls",
  data,
  std = TRUE,
  weights = NULL
)

Value

data: the complete-case data frame containing the outcome variable, the response (treatment) variable, and the standardized covariates for propensity score estimation.
weights: the initially-supplied weights for the complete cases, a vector of 1s if none were.
formula_ps: the automatically-created formula for propensity score estimation, which includes expanded factors and interactions.
mean_x: the weighted mean of each covariates.
sd_x: the weighted standard deviation of each covariates.

Arguments

formula_y: an object of class formula (or one that can be coerced to that class): a symbolic description of the potential outcome model to be fitted. When you want to use non-DR type estimators, only include "1" in the right hand side of the formula for the Hajek estimator and only include "0" for the Horvitz-Thompson estimator. See example of dbw for more details.
formula_ps: an object of class formula (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. For the entropy balancing weighting (method = "eb"), variables in the right hand side of the formula will be mean balanced.
estimand: a character string specifying a parameter of interest. Choose "ATT" for the average treatment effects on the treated estimation, "ATE" for the average treatment effects estimation, "ATC" for the average outcomes estimation in missing outcome cases. You can choose "ATEcombined" for the combined estimation for the average treatment effects estimation when using the covariate balancing weighting (method = "cb").
method_y: a character string specifying a method for potential outcome prediction. Choose "wls" for the linear model, "logit" for the logistic regression, "gam" for the generalized additive model for the continuous outcome, and "gambinom" for the generalized additive model for the binary outcome.
data: a data frame (or one that can be coerced to that class) containing the outcomes and the variables in the model.
std: a logical parameter indicating whether to standardize the covariates for propensity score estimation.
weights: an optional vector of ‘prior weights’ (e.g. sampling weights) to be used in the fitting process. Should be NULL or a numeric vector.

Author

Hiroto Katsumata

Details

std_comp first extracts complete cases for both propensity score estimation and outcome model estimation. Then it standardizes covariates for propensity score estimation by takeing the provided weights into account. The returned data frame is the "design matrix", which contains a set of dummy variables (depending on the contrasts) for factors and similarly expanded interaction terms.

For the AO estimation, NA values for the outcome variable for missing cases (the response variable taking "0") are not deleted. For this processing, the outcome variable name must not contain spaces.

Examples

Run this code

# Simulation from Kang and Shafer (2007) and Imai and Ratkovic (2014)
# ATE estimation
# True ATE is 10
tau <- 10
set.seed(12345)
n <- 1000
X <- matrix(stats::rnorm(n * 4, mean = 0, sd = 1), nrow = n, ncol = 4)
prop <- 1 / (1 + exp(X[, 1] - 0.5 * X[, 2] + 
                     0.25 * X[, 3] + 0.1 * X[, 4]))
treat <- rbinom(n, 1, prop)
y <- 210 + 
     27.4 * X[, 1] + 13.7 * X[, 2] + 13.7 * X[, 3] + 13.7 * X[, 4] + 
     tau * treat + stats::rnorm(n = n, mean = 0, sd = 1)
ybinom <- (y > 210) + 0
df0 <- data.frame(X, treat, y, ybinom)
colnames(df0) <- c("x1", "x2", "x3", "x4", "treat", "y", "ybinom")

# Variables for a misspecified model
Xmis <- data.frame(x1mis = exp(X[, 1] / 2), 
                   x2mis = X[, 2] * (1 + exp(X[, 1]))^(-1) + 10,
                   x3mis = (X[, 1] * X[, 3] / 25 + 0.6)^3, 
                   x4mis = (X[, 2] + X[, 4] + 20)^2)

# Data frame and formulas for propensity score estimation
df <- data.frame(df0, Xmis)
formula_ps_c <- stats::as.formula(treat ~ x1 + x2 + x3 + x4)
formula_ps_m <- stats::as.formula(treat ~ x1mis + x2mis + 
                                          x3mis + x4mis)

# Formula for a misspecified outcome model
formula_y <- stats::as.formula(y ~ x1mis + x2mis + x3mis + x4mis)


# Distribution balancing weighting with regularization
# Standardization
res_std_comp <- std_comp(formula_y = formula_y, 
                         formula_ps = formula_ps_m, 
                         estimand = "ATE", method_y = "wls", 
                         data = df, std = TRUE,
                         weights = NULL)
fitdbwmr <- dbw(formula_y = formula_y, 
                formula_ps = res_std_comp$formula_ps, 
                estimand = "ATE", method = "dbw", method_y = "wls",
                data = res_std_comp$data, vcov = TRUE, 
                lambda = 0.01, weights = res_std_comp$weights, 
                clevel = 0.95)
summary(fitdbwmr)

Run the code above in your browser using DataLab