mvGPS (version 1.2.2)

bal: Construct Covariate Balance Statistics for Models with Multivariate Exposure

Description

Assessing balance between exposure(s) and confounders is key when performing causal analysis using propensity scores. This function generates weights from a list of candidate models for causal inference with multivariate exposures, tests the balancing property of those weights using weighted Pearson correlations, and returns the effective sample size for each method.

Usage

bal(
  model_list,
  D,
  C,
  common = FALSE,
  trim_w = FALSE,
  trim_quantile = 0.99,
  all_uni = TRUE,
  ...
)

Arguments

model_list

character vector identifying which methods to use when constructing weights. See Details for the list of available models.

D

numeric matrix of dimension \(n\) by \(m\) designating values of the exposures

C

either a list of length \(m\) of numeric matrices, where the \(j\)-th matrix has dimension \(n\) by \(p_j\) and holds the confounders for the \(j\)-th exposure, or, if common is TRUE, a single matrix of dimension \(n\) by \(p\) of confounders common to all exposures.

common

logical indicator for whether C is a single matrix of confounders common to all exposures. Default is FALSE, meaning C must be specified as a list of confounder matrices of length \(m\).

trim_w

logical indicator for whether to trim weights. Default is FALSE.

trim_quantile

numeric scalar specifying the upper quantile at which to trim weights, if applicable. Default is 0.99.

all_uni

logical indicator. If TRUE, each univariate model specified in model_list is estimated separately for every exposure. If FALSE, weights are estimated only for the first exposure.

...

additional arguments passed to the weightit function when one of its models is specified in model_list.

Value

  • W: list of weights generated for each model

  • cor_list: list of weighted Pearson correlation coefficients for all confounders specified

  • bal_metrics: data.frame with the Euclidean distance, maximum absolute correlation, and average absolute correlation by method

  • ess: effective sample size for each of the methods used to generate weights

  • models: vector of models used

Details

When using propensity score methods for causal inference it is crucial to check the balancing property of the covariates and exposure(s). To do this in the multivariate case we first use a weight generating method from the available list shown below.

Methods Available

  • "mvGPS": Multivariate generalized propensity score using Gaussian densities

  • "entropy": Estimating weights using an entropy loss function without specifying the propensity score (Tübbicke, 2020)

  • "CBPS": Covariate balancing propensity score for continuous treatments, which adds a balance penalty while solving for the propensity score parameters (Fong, Hazlett, and Imai, 2018)

  • "PS": Generalized propensity score estimated using univariate Gaussian densities

  • "GBM": Gradient boosting to estimate the mean function of the propensity score, while still maintaining Gaussian distributional assumptions (Zhu, Coffman, and Ghosh, 2015)

Note that only the mvGPS method is multivariate; all others are strictly univariate. When all_uni=TRUE, the univariate methods estimate weights for each exposure separately using the weightit function, given the confounders for that exposure in C. To estimate weights for only the first exposure, set all_uni=FALSE.
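For intuition, a weighted Pearson correlation between one exposure and one confounder can be computed directly with stats::cov.wt. This is only an illustrative sketch on simulated data, not the internal implementation of bal:

```r
#sketch: weighted Pearson correlation between an exposure and a confounder
#(bal computes these internally for every exposure-confounder pair)
set.seed(1)
n <- 200
d  <- rnorm(n)                 #exposure
c1 <- 0.5 * d + rnorm(n)       #confounder associated with the exposure
w  <- runif(n, 0.5, 1.5)       #stand-in for estimated weights

#cov.wt normalizes the weights and returns the weighted correlation matrix
wcor <- cov.wt(cbind(d, c1), wt = w, cor = TRUE)$cor[1, 2]
wcor
```

A correlation near zero after weighting indicates that the weights have broken the association between the exposure and that confounder.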

The weights for each method can also be trimmed at the desired quantile by setting trim_w=TRUE and choosing trim_quantile in [0.5, 1]. Trimming is applied at both the upper and lower bounds. See mvGPS for further details on how trimming is performed.
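As a rough sketch of symmetric two-sided trimming at trim_quantile = q (winsorizing; see mvGPS for the exact rule the package uses), weights could be capped as:

```r
#sketch of symmetric weight trimming (winsorizing) at quantile q
#hypothetical helper for illustration only
trim_weights <- function(w, q = 0.99) {
  upper <- quantile(w, q)
  lower <- quantile(w, 1 - q)
  pmin(pmax(w, lower), upper)   #cap weights at both tails
}

w <- rlnorm(100)                #example right-skewed weights
summary(trim_weights(w, q = 0.9))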

Balance Metrics

In this package we include three key balancing metrics to summarize balance across all of the exposures.

  • Euclidean distance

  • Maximum absolute correlation

  • Average absolute correlation

Euclidean distance is calculated using the origin as the reference point, e.g., for m=2 exposures the reference point is [0, 0]. In this way we measure how far the observed set of correlation points is from perfect balance.

Maximum absolute correlation reports the largest single imbalance between the exposures and the set of confounders. It is often a key diagnostic as even a single confounder that is sufficiently out of balance can reduce performance.

Average absolute correlation is the mean of the absolute exposure-confounder correlations. This metric summarizes how well, on average, the entire set of exposures is balanced.
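Given a vector of weighted exposure-confounder correlations, the three summaries above reduce to simple expressions. A sketch with hypothetical correlation values (not output of bal):

```r
#hypothetical weighted exposure-confounder correlations
cors <- c(0.10, -0.05, 0.20, 0.08)

euclidean_dist <- sqrt(sum(cors^2))  #distance from the origin (perfect balance)
max_abs_cor    <- max(abs(cors))     #largest single imbalance
avg_abs_cor    <- mean(abs(cors))    #average imbalance across all pairs

c(euclidean_dist, max_abs_cor, avg_abs_cor)
```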

Effective Sample Size

Effective sample size, ESS, is defined as $$ESS=(\Sigma_i w_i)^{2}/\Sigma_i w_i^2,$$ where \(w_i\) are the estimated weights for a particular method (Kish, 1965). Note that when \(w_i=1\) for all units, \(ESS\) equals the sample size \(n\). \(ESS\) decreases when there are extreme weights or high variability in the weights.
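The definition above translates directly to one line of R. For example, uniform weights recover the full sample size, while a single extreme weight sharply shrinks it:

```r
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 150))         #uniform weights: ESS equals n = 150
ess(c(rep(1, 149), 50))  #one extreme weight greatly reduces ESS
```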

References

Fong, C., Hazlett, C., and Imai, K. (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, 12(1).

Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.

Tübbicke, S. (2020). Entropy balancing for continuous treatments.

Zhu, Y., Coffman, D. L., and Ghosh, D. (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, 3(1).

Examples

# NOT RUN {
#simulating data
sim_dt <- gen_D(method="u", n=150, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2,
                k=3, C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0),
                d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

#generating weights using mvGPS and potential univariate alternatives
require(WeightIt)
bal_sim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
               C=list(C[, 1:2], C[, 2:3]))

#overall summary statistics
bal_sim$bal_metrics

#effective sample sizes
bal_sim$ess

#we can also trim weights for all methods
bal_sim_trim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
                    C=list(C[, 1:2], C[, 2:3]), trim_w=TRUE, trim_quantile=0.9,
                    p.mean=0.5)
#note that in this case we can also pass additional arguments used in the
#WeightIt package for entropy, CBPS, PS, and GBM, such as specifying p.mean

#can check to ensure all the weights have been properly trimmed at the upper
#and lower bounds
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 0.9))),
          unname(unlist(lapply(bal_sim_trim$W, max))))
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 1-0.9))),
          unname(unlist(lapply(bal_sim_trim$W, min))))

# }
