Fits a collection of treatment and response models using the Bayesian Additive Regression Trees (BART) algorithm, producing estimates of treatment effects.
bartc(response, treatment, confounders, parametric, data, subset, weights,
method.rsp = c("bart", "tmle", "p.weight"),
method.trt = c("bart", "glm", "none"),
estimand = c("ate", "att", "atc"),
group.by = NULL,
commonSup.rule = c("none", "sd", "chisq"),
commonSup.cut = c(NA_real_, 1, 0.05),
args.rsp = list(), args.trt = list(),
p.scoreAsCovariate = TRUE, use.ranef = TRUE, group.effects = FALSE,
crossvalidate = FALSE,
keepCall = TRUE, verbose = TRUE,
seed = NA_integer_,
...)

bartc returns an object of class bartcFit. Information about the
object can be derived by using methods summary,
plot_sigma, plot_est, plot_indiv,
and plot_support. Numerical quantities are recovered with the
fitted and
extract generics. Predictions for new
observations are obtained with predict.
Objects of class bartcFit are lists containing items:
method.rsp: character string specifying the method used to fit the response surface
method.trt: character string specifying the method used to fit the treatment assignment mechanism
estimand: character string specifying the targeted causal effect
fit.rsp: object containing the fitted response model
data.rsp: dbartsData object used when fitting the response model
fit.trt: object containing the fitted treatment model
group.by: optional factor vector containing the groups in which treatment effects are estimated
est: matrix or array of posterior samples of the treatment effect estimate
p.score: the vector of propensity scores used as a covariate in the response model, when applicable
samples.p.score: matrix or array of posterior samples of the propensity score, when applicable
mu.hat.obs: samples from the posterior of the expected value for individual responses under the observed treatment regime
mu.hat.cf: samples from the posterior of the expected value for individual responses under the counterfactual treatment
name.trt: character string giving the name of the treatment variable in the data of fit.rsp
trt: vector of treatment assignments
call: how bartc was called
n.chains: number of independent posterior sampler chains in the response model
commonSup.rule: common support rule used for suppressing observations
commonSup.cut: common support parameter used to set the cutoff when suppressing observations
sd.obs: vector of standard deviations of individual posterior predictions for the observed treatment conditions
sd.cf: vector of standard deviations of individual posterior predictions for counterfactuals
commonSup.sub: logical vector indicating which observations are used when estimating treatment effects
use.ranef: logical for whether random effect models were used; only added when true
group.effects: logical for whether group-level estimates are made; only added when true
seed: a random seed for use when drawing from the posterior predictive distribution
response: A vector of the outcome variable, or a reference to such in the data
argument. Can be continuous or binary.
treatment: A vector of the binary treatment variable, or a reference to such in the
data argument.
confounders: A matrix or data frame of covariates to be used in estimating the treatment
and response models. Can also be the right-hand side of a formula (e.g.
x1 + x2 + ...). The data argument will be searched if
supplied.
parametric: The right-hand side of a formula (e.g. x1 + x2 + (1 | g) ...) giving the
equation of a parametric form to be used for estimating the mean structure.
See the details section below.
data: An optional data frame or named list containing the response,
treatment, and confounders.
subset: An optional vector used to subset the data. Can refer to data if
provided.
weights: An optional vector of population weights used in model fitting and in
estimating the treatment effect. Can refer to data if provided.
method.rsp: A character string specifying which method to use when fitting the response
surface and estimating the treatment effect. Options are: "bart" -
fit the response surface with BART and take the average of the individual
treatment effect estimates; "p.weight" - fit the response surface
with BART but compute the treatment effect estimate by using a propensity
score weighted sum of individual effects; and "tmle" - as above, but
further adjust the individual estimates using the Targeted Minimum Loss
based Estimation (TMLE) adjustment.
method.trt: A character string specifying which method to use when fitting the treatment
assignment mechanism, or a vector/matrix of propensity scores. Character
string options are: "bart" - fit BART directly to the treatment
variable; "glm" - fit a generalized linear model with a binomial
response and all confounders added linearly; and "none" - perform no
propensity score estimation. Cannot be "none" if the response model
requires propensity scores. When supplied as a matrix, it should have
dimensions equal to the number of observations by the number of samples
used in the response model.
estimand: A character string specifying which causal effect to target. Options are:
"ate" - average treatment effect, "att" - average treatment
effect on the treated, and "atc" - average treatment effect on the
controls.
group.by: An optional factor that, when present, causes the treatment effect estimate to be calculated within each group.
commonSup.rule: Rule for exclusion of observations lacking in common support. Options are:
"none" - no suppression; "sd" - exclude units whose predicted
counterfactual standard deviation is extreme compared to the maximum
standard deviation under those units' observed treatment condition, where
extreme refers to the distribution of all standard deviations of observed
treatment conditions; and "chisq" - exclude observations according to the
ratio of the variance of the posterior predicted counterfactual to the posterior
variance of the observed condition, which has a chi-squared distribution with
one degree of freedom under the null hypothesis of equal variances.
commonSup.cut: Cutoffs for commonSup.rule; ignored when the rule is "none". When
commonSup.rule is "sd", specifies how many standard deviations
of the distribution of posterior standard deviations for counterfactuals an observation
can be above the maximum of posterior standard deviations for that treatment
condition. When commonSup.rule is "chisq", it is the \(p\)
value used for rejection of the hypothesis of equal variances.
p.scoreAsCovariate: A logical such that when TRUE, the propensity score is added to the
response model as a covariate. When used, this is equivalent to the
'ps-BART' method described by Hahn, Murray, and Carvalho.
use.ranef: Logical specifying whether the group.by variable - when present - should be
included as a "random" or "fixed" effect. If true,
rbart will be used for BART models. Using random
effects for treatment assignment mechanisms of type "glm" requires
that the lme4 package be available.
group.effects: Logical specifying whether effects should be calculated within groups when the
group.by variable is provided. Response methods "tmle" and
"p.weight" are such that if group effects are calculated, then the
population effect is not provided.
keepCall: A logical such that when FALSE, the call to bartc is not kept.
This can reduce the amount of information printed by
summary when passing in data as literals.
crossvalidate: One of TRUE, FALSE, "trt", or "rsp". Enables
code that attempts to estimate the optimal end-node sensitivity parameter. This
uses a rudimentary Bayesian optimization routine and can be extremely slow.
verbose: A logical that when TRUE prints information as the model is fit.
seed: Optional integer specifying the desired pRNG seed. It is not needed
when running single-threaded, where set.seed will suffice; it can be used to
obtain reproducible results when running multi-threaded. See the Reproducibility section of
bart2.
..., args.rsp, args.trt: Further arguments to the treatment and response model fitting algorithms.
Arguments passed to the main function as ... will be used in both models.
args.rsp and args.trt can be used to set parameters for a
single fit and will override other values. See glm and
bart2 for reference.
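As a sketch of how these pass-through arguments combine (simulated data; the fit settings are deliberately small and illustrative, and the call is guarded since it requires the bartCause package):

```r
# n.trees is forwarded to both BART models via ..., while args.trt
# overrides it for the treatment model only.
if (requireNamespace("bartCause", quietly = TRUE)) {
  set.seed(1)
  n <- 60L
  x <- matrix(rnorm(2L * n), n, 2L)
  z <- rbinom(n, 1L, pnorm(x[, 1L]))
  y <- x[, 1L] - x[, 2L] + 0.5 * z + rnorm(n)
  fit <- bartCause::bartc(y, z, x,
                          n.samples = 40L, n.burn = 10L, n.chains = 1L,
                          n.trees = 50L,                   # both models
                          args.trt = list(n.trees = 25L),  # treatment model only
                          verbose = FALSE)
}
```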
Vincent Dorie: vdorie@gmail.com.
bartc represents a collection of methods that primarily use the
Bayesian Additive Regression Trees (BART) algorithm to estimate causal
treatment effects with binary treatment variables and continuous or binary
outcomes. This requires models to be fit for the response surface (the distribution
of the response as a function of treatment and confounders,
\(p(Y(1), Y(0) | X)\)) and optionally for the treatment assignment mechanism
(the probability of receiving treatment, i.e. the propensity score,
\(Pr(Z = 1 | X)\)). The response surface model is used to impute
counterfactuals, which may then be adjusted together with the propensity score
to produce estimates of effects.
Similar to lm, models can be specified symbolically. When the
data term is present, it will be added to the search path for the
response, treatment, and confounders variables. The
confounders must be specified devoid of any "left hand side", as they appear
in both of the models.
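A minimal sketch of the symbolic interface (simulated data; the call is guarded since it requires the bartCause package):

```r
# Variables are looked up by name in `data`; the confounders are given
# as a bare right-hand side with no left-hand side.
set.seed(2)
df <- data.frame(y  = rnorm(100L),
                 z  = rbinom(100L, 1L, 0.5),
                 x1 = rnorm(100L),
                 x2 = rnorm(100L))
if (requireNamespace("bartCause", quietly = TRUE)) {
  fit <- bartCause::bartc(y, z, x1 + x2, data = df,
                          n.samples = 40L, n.burn = 10L, n.chains = 1L,
                          verbose = FALSE)
}
```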
Response Surface
The response surface methods included are:
"bart" - use BART to fit the response surface and produce
individual estimates \(\hat{Y}(1)_i\) and
\(\hat{Y}(0)_i\). Treatment effect estimates are
obtained by averaging the difference of these across the population of
interest.
"p.weight" - individual effects are estimated as in
"bart", but treatment effect estimates are obtained by using a
propensity score weighted average. For the average treatment effect on
the treated, these weights are \(p(z_i | x_i) / (\sum z / n)\). For
ATC, replace \(z\) with \(1 - z\). For ATE, "p.weight" is
equal to "bart".
"tmle" - individual effects are estimated as in "bart"
and a weighted average is taken as in "p.weight", however the
response surface estimates and propensity scores are corrected by
using the Targeted Minimum Loss based Estimation method.
Treatment Assignment
The treatment assignment models are:
"bart" - fit a binary BART directly to the treatment using all
the confounders.
"none" - no modeling is done. Only applies when using response
method "bart" and p.scoreAsCovariate is FALSE.
"glm" - fit a binomial generalized linear model with logistic
link and confounders included as linear terms.
Finally, a vector or matrix of propensity scores can be supplied. Propensity score matrices should have a number of rows equal to the number of observations in the data and a number of columns equal to the number of posterior samples.
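For instance, scores from an externally fit logistic regression can be passed directly as method.trt (simulated data; the bartc call is guarded since it requires the bartCause package):

```r
set.seed(42)
n <- 200L
x <- matrix(rnorm(2L * n), n, 2L)
z <- rbinom(n, 1L, pnorm(x[, 1L]))
y <- x[, 1L] + z + rnorm(n)

# externally estimated propensity scores, one per observation
p.score <- fitted(glm(z ~ x, family = binomial))

if (requireNamespace("bartCause", quietly = TRUE))
  fit <- bartCause::bartc(y, z, x, method.trt = p.score,
                          n.samples = 40L, n.burn = 10L, n.chains = 1L,
                          verbose = FALSE)
```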
Parametrics
bartc uses the stan4bart package, when available, to fit semiparametric
surfaces. Equations can be given as to lm. Grouping
structures are also permitted, and use the syntax of lmer.
Generics
For a fitted model, the easiest way to analyze the resulting fit is to use the
generics fitted, extract, and
predict to analyze specific quantities and
summary to aggregate those values into
targets (e.g. ATE, ATT, or ATC).
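A hypothetical end-to-end use of these generics (the `type` value shown follows this package's conventions but may vary by version; the example is guarded since it requires the bartCause package):

```r
if (requireNamespace("bartCause", quietly = TRUE)) {
  set.seed(7)
  n <- 60L
  x <- matrix(rnorm(2L * n), n, 2L)
  z <- rbinom(n, 1L, pnorm(x[, 1L]))
  y <- x[, 1L] + 0.5 * z + rnorm(n)
  fit <- bartCause::bartc(y, z, x, n.samples = 40L, n.burn = 10L,
                          n.chains = 1L, verbose = FALSE)
  summary(fit)                          # aggregated estimand (ATE here)
  icates <- fitted(fit, type = "icate") # posterior-mean individual effects
  draws  <- bartCause::extract(fit)     # posterior samples of the estimand
}
```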
Common Support Rules
Common support, or that the probability of receiving all treatment conditions
is non-zero within every area of the covariate space
(\(P(Z = 1 | X = x) > 0\) for all \(x\) in the inferential sample), can be
enforced by excluding observations with high posterior uncertainty.
bartc supports two common support rules through commonSup.rule
argument:
"sd" - observations are cut from the inferential sample if:
\(s_i^{f(1-z)} > m_z + a \times sd(s_j^{f(z)})\),
where \(s_i^{f(1-z)}\) is the posterior standard
deviation of the predicted counterfactual for observation \(i\),
\(s_j^{f(z)}\) is the posterior standard deviation of the prediction
for the observed treatment condition of observation \(j\),
\(sd(s_j^{f(z)})\) is the empirical standard deviation
of those quantities, and
\(m_z = \max_j \{s_j^{f(z)}\}\) for all \(j\)
in the same treatment group, i.e. \(Z_j = z\). \(a\) is a constant
passed in using commonSup.cut; its default is 1.
"chisq" - observations are cut from the inferential sample if:
\((s_i^{f(1-z)} / s_i^{f(z)})^2 > q_\alpha\),
where \(s_i\) are as above and \(q_\alpha\) is the upper
\(\alpha\) percentile of a \(\chi^2\) distribution with one degree
of freedom, corresponding to a null hypothesis of equal variance. The
default for \(\alpha\) is 0.05, and it is specified using the
commonSup.cut parameter.
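Both rules can be sketched in base R from hypothetical posterior standard deviations (bartc computes these internally from the fit; the sd.obs, sd.cf, and z values below are made up, and grouping the "sd" rule by treatment arm is an illustrative reading of the formula above):

```r
sd.obs <- c(0.7, 0.9, 1.0, 0.8)  # posterior sds, observed condition
sd.cf  <- c(1.9, 0.8, 1.1, 2.5)  # posterior sds, counterfactual
z      <- c(1L, 0L, 1L, 0L)

# "sd" rule: flag units whose counterfactual sd exceeds the maximum
# observed-condition sd in their treatment group by more than a
# (default 1) empirical sds of those observed-condition sds
a <- 1
excl.sd <- logical(length(z))
for (g in 0:1) {
  in.g <- z == g
  excl.sd[in.g] <- sd.cf[in.g] > max(sd.obs[in.g]) + a * sd(sd.obs[in.g])
}

# "chisq" rule: flag units whose variance ratio exceeds the upper
# alpha quantile of a chi-squared distribution with 1 df
alpha <- 0.05
excl.chisq <- (sd.cf / sd.obs)^2 > qchisq(1 - alpha, df = 1)
```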
Special Arguments
Some default arguments are unconventional or are passed in a unique fashion.
If n.chains is missing, a default of 10 is used, unlike in bart2.
For method.rsp == "tmle", a special arg.trt of
posteriorOfTMLE determines if the TMLE correction should be
applied to each posterior sample (TRUE), or just the posterior
mean (FALSE).
Missing Data
Missingness is allowed only in the response. If some response values are
NA, the BART models will be trained only where data are available,
and those fits will be used to make predictions for the missing
observations. Missing observations are not used when calculating statistics
for assessing common support, although they may still be excluded on those
grounds. Further, missing observations may not be compatible with response
method "tmle".
Chipman, H., George, E. and McCulloch, R. (2010) BART: Bayesian Additive Regression Trees. The Annals of Applied Statistics 4(1), 266--298. The Institute of Mathematical Statistics. doi:10.1214/09-AOAS285.
Hill, J. L. (2011) Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics 20(1), 217--240. Taylor & Francis. doi:10.1198/jcgs.2010.08162.
Hill, J. L. and Su, Y. S. (2013) Assessing Lack of Common Support in Causal Inference Using Bayesian Nonparametrics: Implications for Evaluating the Effect of Breastfeeding on Children's Cognitive Outcomes. The Annals of Applied Statistics 7(3), 1386--1420. The Institute of Mathematical Statistics. doi:10.1214/13-AOAS630.
Carnegie, N. B. (2019) Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data. Statistical Science 34(1), 90--93. The Institute of Mathematical Statistics. doi:10.1214/18-STS682.
Hahn, P. R., Murray, J. S. and Carvalho, C. M. (2020) Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion). Bayesian Analysis 15(3), 965--1056. International Society for Bayesian Analysis. doi:10.1214/19-BA1195.
bart2
## fit a simple linear model
n <- 100L
beta.z <- c(.75, -0.5, 0.25)
beta.y <- c(.5, 1.0, -1.5)
sigma <- 2
set.seed(725)
x <- matrix(rnorm(3 * n), n, 3)
tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)
p.score <- pnorm(x %*% beta.z)
z <- rbinom(n, 1, p.score)
mu.0 <- x %*% beta.y
mu.1 <- x %*% beta.y + tau
y <- mu.0 * (1 - z) + mu.1 * z + rnorm(n, 0, sigma)
# small numbers of samples, burn-in iterations, and chains, only to keep the example fast
fit <- bartc(y, z, x, n.samples = 100L, n.burn = 15L, n.chains = 2L)
summary(fit)
## example to show refitting under the common support rule
fit2 <- refit(fit, commonSup.rule = "sd")
fit3 <- bartc(y, z, x, subset = fit2$commonSup.sub,
n.samples = 100L, n.burn = 15L, n.chains = 2L)