hbal: Hierarchically Regularized Entropy Balancing

Description

hbal performs hierarchically regularized entropy balancing so that the covariate distributions of the control group match those of the treatment group. hbal can automatically expand the covariate space to include higher-order terms and can use cross-validation to select the penalties on the balancing conditions.
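
For intuition, the core reweighting step can be sketched as plain entropy balancing solved through its convex dual, with one optional ridge penalty alpha on the dual coefficients. This is a minimal base-R sketch under those assumptions, not hbal's implementation. The function eb_sketch and its arguments are hypothetical, and hbal's series expansion, residualization, and group-specific penalties are left out.

```r
# Minimal sketch of entropy balancing via its convex dual (hypothetical helper,
# not hbal's internal code). A single ridge penalty `alpha` on the dual
# coefficients stands in for hbal's group-specific penalties.
eb_sketch <- function(X0, target, q = rep(1, nrow(X0)), alpha = 0,
                      maxit = 200) {
  Z <- sweep(as.matrix(X0), 2, target)  # center control covariates at the target means
  q <- q / sum(q)                        # normalized base weights
  # Dual objective: log-sum-exp of Z %*% lambda plus a ridge term
  dual <- function(lambda) {
    eta <- drop(Z %*% lambda)
    m <- max(eta)                        # subtract the max for numerical stability
    m + log(sum(q * exp(eta - m))) + alpha / 2 * sum(lambda^2)
  }
  # Gradient: weighted covariate imbalance plus the ridge term
  grad <- function(lambda) {
    eta <- drop(Z %*% lambda)
    w <- q * exp(eta - max(eta))
    drop(crossprod(Z, w / sum(w))) + alpha * lambda
  }
  fit <- optim(rep(0, ncol(Z)), dual, grad, method = "BFGS",
               control = list(maxit = maxit, reltol = 1e-12))
  eta <- drop(Z %*% fit$par)
  w <- q * exp(eta - max(eta))
  list(weights = w / sum(w), coefs = fit$par,
       converged = fit$convergence == 0)
}

# Usage on simulated controls: reweight them toward hypothetical treated means
set.seed(42)
n0 <- 300
X0 <- cbind(X1 = rnorm(n0), X2 = rbinom(n0, 1, 0.5))
target <- c(X1 = 0.3, X2 = 0.6)
fit <- eb_sketch(X0, target)
colSums(fit$weights * X0)  # weighted control means, close to `target`
```

With alpha = 0 the weighted control means match the target up to the optimizer's tolerance. With alpha > 0 the first-order condition becomes "weighted imbalance = -alpha * lambda", so exact balance is traded for smaller coefficients, which is the trade-off ridge penalties control.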

Usage

hbal(data, Treat, X, Y = NULL, w = NULL,
     X.expand = NULL, X.keep = NULL,
     expand.degree = 1, coefs = NULL,
     max.iterations = 200, cv = FALSE, folds = 4,
     ds = FALSE, group.exact = NULL,
     group.alpha = NULL, term.alpha = NULL,
     constraint.tolerance = 1e-3, print.level = 0,
     grouping = NULL, group.labs = NULL,
     shuffle.treat = TRUE, exclude = NULL,
     force = FALSE, seed = NULL)

Value

A list object of class hbal with the following elements:

converged: a logical flag indicating whether the algorithm has converged.
weights: a vector that contains the treatment and control group weights assigned by hbal. The treatment group weights are taken from the base weights.
weights.co: a vector that contains only the control group weights assigned by hbal.
coefs: a vector that contains coefficients from the reweighting algorithm.
mat: a matrix of serially expanded covariates if expand.degree > 1. Otherwise, the original covariate matrix is returned.
grouping: a vector indicating different groupings of the covariates.
group.penalty: a vector that stores the ridge penalty for each group.
term.penalty: a vector that stores the ridge penalty for each covariate.
bal.tab: a balance table.
Treat: a vector of treatment status.
base.weights: a vector that saves the base weights.
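
The balance table and the weight vectors are related in a simple way: balance is judged by comparing treated covariate means with weighted control means. The following is a minimal sketch of that comparison. bal_tab_sketch is a hypothetical helper, it assumes the control weights are ordered like the control rows of the data, and hbal's own bal.tab may report different statistics.

```r
# Hypothetical helper (not hbal's bal.tab): compare treated covariate means with
# unweighted and weighted control means.
bal_tab_sketch <- function(X, treat, w_co) {
  X <- as.matrix(X)
  Xt <- X[treat == 1, , drop = FALSE]
  Xc <- X[treat == 0, , drop = FALSE]
  w_co <- w_co / sum(w_co)             # normalize control weights to sum to 1
  data.frame(
    covariate          = colnames(X),
    treated            = colMeans(Xt),
    control_unweighted = colMeans(Xc),
    control_weighted   = colSums(w_co * Xc),  # each row of Xc scaled by its weight
    row.names = NULL, stringsAsFactors = FALSE
  )
}

# Usage with uniform weights, where weighted and unweighted control means coincide
set.seed(3)
X <- cbind(X1 = rnorm(100), X2 = rbinom(100, 1, 0.5))
treat <- rbinom(100, 1, 0.4)
bt <- bal_tab_sketch(X, treat, rep(1, sum(treat == 0)))
bt
```

With an hbal fit, a call such as bal_tab_sketch(dat[, c('X1', 'X2')], out$Treat, out$weights.co) would apply the same comparison, provided the ordering assumption above holds.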

Arguments

data: a data frame that contains the treatment, outcome, and covariates.
Treat: a character string naming the treatment variable.
X: a character vector of covariate names to balance on.
Y: a character string naming the outcome variable.
w: a character string naming the variable that stores base weights.
X.expand: a character vector of covariate names for serial expansion.
X.keep: a character vector of covariate names to keep regardless of whether they are selected in double selection.
expand.degree: degree of series expansion. The default is 1, which means no expansion.
coefs: initial coefficients for the reweighting algorithm (lambdas).
max.iterations: maximum number of iterations. The default is 200.
cv: whether to use cross-validation. The default is FALSE.
folds: number of folds for cross-validation. Only used when cv is TRUE.
ds: whether to perform double selection prior to balancing. Default is FALSE.
group.exact: binary indicator of whether each covariate group should be exactly balanced.
group.alpha: penalty for each covariate group.
term.alpha: a named vector of ridge penalties; each entry takes only the value 0 or 1.
constraint.tolerance: tolerance level for overall imbalance. Default is 1e-3.
print.level: details of printed output: -1 for none, 0 for minimum (default), 1 for detailed.
grouping: a vector indicating different groupings of the covariates.
group.labs: labels for user-supplied groups.
shuffle.treat: whether to use cross-validation on the treated units. The default is TRUE.
exclude: list of covariate name pairs or triplets to be excluded.
force: a logical flag indicating whether to expand covariates when there are too many of them.
seed: random seed to set. Supply a seed when cv = TRUE so that results are reproducible.
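
Setting expand.degree above 1 adds higher-order terms of the covariates in X.expand. To illustrate what a degree-2 expansion can contain, here is a base-R sketch that builds squares and pairwise interactions. It is an assumption-laden illustration, not hbal's expansion code: expand2_sketch is a hypothetical helper, and the exact terms hbal generates (and how exclude and force affect them) are not reproduced.

```r
# Hypothetical illustration of a degree-2 series expansion (not hbal's code):
# original columns, squares of non-binary columns, and pairwise interactions.
expand2_sketch <- function(X) {
  X <- as.matrix(X)
  nm <- colnames(X)
  out <- list(X)
  # Squares; a two-valued column's square is collinear with the column and an
  # intercept, so such columns are skipped
  nonbin <- apply(X, 2, function(v) length(unique(v)) > 2)
  if (any(nonbin)) {
    sq <- X[, nonbin, drop = FALSE]^2
    colnames(sq) <- paste0(nm[nonbin], "^2")
    out <- c(out, list(sq))
  }
  # Pairwise interactions between all distinct columns
  if (ncol(X) > 1) {
    pr <- utils::combn(ncol(X), 2)
    ia <- X[, pr[1, ], drop = FALSE] * X[, pr[2, ], drop = FALSE]
    colnames(ia) <- paste(nm[pr[1, ]], nm[pr[2, ]], sep = ":")
    out <- c(out, list(ia))
  }
  do.call(cbind, out)
}

set.seed(7)
X <- cbind(X1 = rnorm(10), X2 = rbinom(10, 1, 0.5))
E <- expand2_sketch(X)
colnames(E)  # c("X1", "X2", "X1^2", "X1:X2")
```

The number of expanded columns grows quickly with the number of covariates and the degree, which is why an expansion routine needs a guard for very large covariate sets.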

Author

Yiqing Xu <yiqingxu@stanford.edu>, Eddie Yang <z5yang@ucsd.edu>

Details

In the simplest setup, users only need to supply Treat, X, and Y. Setting expand.degree above 1 serially expands X to include higher-order terms, which are then hierarchically residualized, and setting cv = TRUE uses cross-validation to select penalties for different groups of the covariates.
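
One way to read "hierarchically residualize" is that higher-order terms are regressed on the lower-order terms and replaced by the residuals, so that penalties on the higher-order groups act only on information the lower-order terms do not already carry. The sketch below shows one level of that idea (second-order terms on an intercept plus first-order terms). residualize_sketch is a hypothetical helper, not hbal's code.

```r
# Hypothetical illustration (not hbal's code): residualize higher-order terms on
# an intercept and the lower-order terms.
residualize_sketch <- function(high, low) {
  qx <- qr(cbind(1, as.matrix(low)))     # QR of intercept + lower-order terms
  r <- qr.resid(qx, as.matrix(high))     # least-squares residuals, column by column
  colnames(r) <- colnames(high)
  r
}

set.seed(1)
n <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
low <- cbind(X1, X2)
high <- cbind("X1^2" = X1^2, "X2^2" = X2^2, "X1:X2" = X1 * X2)
res <- residualize_sketch(high, low)
# Residuals are orthogonal to the intercept and the first-order terms
round(crossprod(cbind(1, low), res), 10)
```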

References

Xu, Y., & Yang, E. (2022). Hierarchically Regularized Entropy Balancing. Political Analysis, 1-8. doi:10.1017/pan.2022.12

Examples

# Example 1
library(hbal)
set.seed(1984)
N <- 500
X1 <- rnorm(N)
X2 <- rbinom(N, size = 1, prob = .5)
X <- cbind(X1, X2)
treat <- rbinom(N, 1, prob = 0.5) # Treatment indicator
y <- 0.5 * treat + X[,1] + X[,2] + rnorm(N) # Outcome; the true treatment effect is 0.5
dat <- data.frame(treat = treat, X, Y = y)
out <- hbal(data = dat, Treat = 'treat', X = c('X1', 'X2'), Y = 'Y')
att(out)

# Example 2
## Simulation from Kang and Schafer (2007).
library(hbal)
library(MASS)
set.seed(1984)
n <- 500
X <- mvrnorm(n, mu = rep(0, 4), Sigma = diag(4))
prop <- 1 / (1 + exp(X[,1] - 0.5 * X[,2] + 0.25*X[,3] + 0.1 * X[,4]))
# Treatment indicator
treat <- rbinom(n, 1, prop)
# Outcome (does not depend on treat, so the true ATT is 0)
y <- 210 + 27.4*X[,1] + 13.7*X[,2] + 13.7*X[,3] + 13.7*X[,4] + rnorm(n)
# Observed covariates: nonlinear transformations of X
X.mis <- cbind(exp(X[,1]/2), X[,2]*(1+exp(X[,1]))^(-1)+10,
    (X[,1]*X[,3]/25+.6)^3, (X[,2]+X[,4]+20)^2)
colnames(X.mis) <- c('X1', 'X2', 'X3', 'X4')
dat <- data.frame(treat = treat, X.mis, Y = y)
out <- hbal(data = dat, Treat = 'treat', X = c('X1', 'X2', 'X3', 'X4'), Y = 'Y')
att(out)