- formula
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see model.frame).
- data
an optional data frame that includes the variables
named in the formula.
- weights
optional case weights.
- treatment
a vector that indicates the treatment status of
each observation. 1 represents treated and 0 represents control.
Only binary treatment is supported in this version.
- subset
optional expression saying that only a subset of the
rows of the data should be used in the fit.
- na.action
the default action deletes all observations for which y is missing,
but keeps those in which one or more predictors are missing.
- split.Rule
splitting rule, one of the four rules implemented in causalTree:
"TOT", "CT", "fit", or "tstats". Note that the "tstats" rule has no
associated cross-validation method in cv.option; see Athey and
Imbens (2016) for a discussion. Note further that split.Rule and
cv.option can be mixed and matched.
- split.Honest
boolean option, TRUE or FALSE, used when split.Rule is "CT" or "fit".
If TRUE, honest splitting is performed, with split.alpha defaulting
to 0.5; if FALSE, adaptive splitting is performed with
split.alpha = 1. A user-supplied split.alpha is ignored when
split.Honest is FALSE but respected when it is TRUE. For
split.Rule = "TOT", there is no honest splitting option and
split.alpha has no effect. For split.Rule = "tstats", a value of
TRUE enables the use of split.alpha in calculating the risk function,
which determines the order of pruning in cross-validation. Note also
that the causalTree function returns estimates from the training data
regardless of the value of split.Honest; to obtain honest estimates,
the tree must be re-estimated using estimate.causalTree. The wrapper
function honest.causalTree performs honest estimation in one step and
returns a tree.
- HonestSampleSize
number of observations anticipated to be used
in honest re-estimation after building the tree. This enters the
risk function used in both splitting and cross-validation.
- split.Bucket
boolean option, TRUE or FALSE, specifying whether to apply the
discrete (bucket) method when splitting the tree. If TRUE, when a
node is split, its observations are first ordered and then
partitioned into buckets, each containing bucketNum treated and
bucketNum control units. Splitting then takes place by bucket.
- bucketNum
number of observations in each bucket when split.Bucket = TRUE.
However, the code overrides this choice to guarantee that there are
at least minsize and at most bucketMax buckets.
- bucketMax
maximum number of buckets to use in splitting when
split.Bucket = TRUE. Because of this cap, the effective bucketNum
may change depending on the choice of bucketMax.
- cv.option
cross-validation method, one of the four methods implemented in
causalTree: "TOT", "matching", "CT", or "fit". There is no cv.option
corresponding to split.Rule = "tstats"; see Athey and Imbens (2016)
for a discussion.
- cv.Honest
boolean option, TRUE or FALSE, used only when cv.option is "CT" or
"fit", specifying whether to apply the honest risk evaluation function
in cross-validation. If TRUE, the honest risk function is used;
otherwise, the adaptive risk function is used. If FALSE, cv.alpha is
set to 1 regardless of the user's choice. If TRUE, cv.alpha defaults
to 0.5, but a user-supplied cv.alpha is respected. Note that honest
cross-validation estimates within-leaf variances and may perform
better with larger leaf sizes and/or a small number of
cross-validation sets.
- minsize
in order to split, each leaf must have at least minsize treated
cases and minsize control cases. The default value is 2.
- x
keep a copy of the x matrix in the result.
- y
keep a copy of the dependent variable in the result. If missing
and model is supplied, this defaults to FALSE.
- propensity
propensity score used in "TOT" splitting and in the "TOT" and honest
"CT" cross-validation methods. The default value is the proportion of
treated cases among all observations. In this implementation, the
propensity score is a constant for the whole dataset. Unit-specific
propensity scores are not supported; however, the user may supply
inverse propensity scores as case weights if desired.
- control
a list of options that control details of the rpart algorithm.
See rpart.control.
- split.alpha
scale parameter between 0 and 1, used in the splitting risk
evaluation function for "CT". When split.Honest = FALSE, split.alpha
is set to 1. For split.Rule = "tstats", if split.Honest = TRUE,
split.alpha is used in calculating the risk function, which
determines the order of pruning in cross-validation.
- cv.alpha
scale parameter between 0 and 1, used in the cross-validation risk
evaluation function for "CT" and "fit". When cv.Honest = FALSE,
cv.alpha is set to 1.
- cv.gamma, split.gamma
optional parameters used in evaluating policies.
- cost
a vector of non-negative costs, one for each variable in
the model. Defaults to one for all variables. These are scalings to
be applied when considering splits, so the improvement on splitting
on a variable is divided by its cost in deciding which split to
choose.
- ...
arguments to rpart.control may also be specified in the call to
causalTree. They are checked against the list of valid arguments. A
commonly set example is xval, which sets the number of
cross-validation samples. Note that minsize is implemented
differently in causalTree than in rpart: causalTree requires a
minimum of minsize treated observations and a minimum of minsize
control observations in each leaf.
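The rules for split.alpha and cv.alpha described under split.Honest and cv.Honest can be summarized in a small Python sketch. The function name resolve_alpha is hypothetical, and rejecting values outside [0, 1] is an assumption; the sketch only mirrors the documented defaults and does not reproduce the package's internal code.

```python
def resolve_alpha(honest, user_alpha=None, default=0.5):
    """Effective split.alpha / cv.alpha under the documented rules (sketch).

    honest=False -> adaptive: alpha is forced to 1 and any user choice is ignored.
    honest=True  -> honest: a user-supplied alpha is respected, else 0.5 is used.
    """
    if not honest:
        return 1.0
    if user_alpha is None:
        return default
    # Assumption: out-of-range values are rejected rather than silently clipped.
    if not 0.0 <= user_alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return user_alpha


print(resolve_alpha(False, 0.3))  # 1.0: adaptive mode ignores the user's 0.3
print(resolve_alpha(True))        # 0.5: honest mode default
print(resolve_alpha(True, 0.3))   # 0.3: honest mode respects the user's choice
```

The same function applies to both parameters: pass split.Honest and split.alpha for splitting, or cv.Honest and cv.alpha for cross-validation. For split.Rule = "TOT" the result is irrelevant, since alpha has no effect there.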
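To make the roles of HonestSampleSize and split.alpha concrete, here is a sketch of an alpha-weighted honest "CT" splitting criterion. The unweighted pieces follow the honest criterion of Athey and Imbens (2016): a fit term, (1/N_tr) times the sum of squared leaf effects over training units, minus (1/N_tr + 1/N_est) times the within-leaf variance terms. Weighting the fit term by alpha and the variance term by 1 - alpha is an assumption about how split.alpha enters; it is not a transcription of the causalTree source. Here n_est stands in for HonestSampleSize, and the function name is hypothetical.

```python
from statistics import mean, variance


def honest_ct_criterion(leaves, p, n_est, alpha=0.5):
    """Alpha-weighted honest CT criterion (assumed form); larger is better.

    leaves: list of (y, w) pairs, one per leaf, with outcomes y and
            0/1 treatment indicators w from the training sample.
    p:      constant propensity score (share of treated units).
    n_est:  anticipated honest re-estimation sample size.
    """
    n_tr = sum(len(y) for y, _ in leaves)
    fit = 0.0      # accumulates sum over training units of tau_hat(X_i)^2
    penalty = 0.0  # accumulates within-leaf variance terms over leaves
    for y, w in leaves:
        treated = [yi for yi, wi in zip(y, w) if wi == 1]
        control = [yi for yi, wi in zip(y, w) if wi == 0]
        tau = mean(treated) - mean(control)  # leaf treatment-effect estimate
        fit += len(y) * tau ** 2
        # Sample variances need at least two treated and two control units.
        penalty += variance(treated) / p + variance(control) / (1 - p)
    fit /= n_tr
    penalty *= 1 / n_tr + 1 / n_est
    return alpha * fit - (1 - alpha) * penalty


leaves = [([1.0, 2.0, 5.0, 6.0], [0, 0, 1, 1]),
          ([0.0, 1.0, 1.0, 2.0], [0, 0, 1, 1])]
print(honest_ct_criterion(leaves, p=0.5, n_est=8, alpha=1.0))  # 8.5 (adaptive)
print(honest_ct_criterion(leaves, p=0.5, n_est=8, alpha=0.5))  # 3.75 (honest)
```

With alpha = 1 only the fit term remains, which matches the adaptive setting. A larger n_est shrinks the variance penalty, which is one way HonestSampleSize can influence both splitting and cross-validation. The default minsize = 2 guarantees the two-per-group requirement noted in the comment.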
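The interplay of split.Bucket, bucketNum, and bucketMax can be sketched as follows. The clamp max(minsize, min(natural, bucketMax)) is an assumption based only on the documented guarantee of at least minsize and at most bucketMax buckets; the package's exact rounding may differ, and both function names are hypothetical.

```python
def number_of_buckets(n_treated, n_control, bucket_num, minsize, bucket_max):
    """Hypothetical bucket count following the documented bounds."""
    # How many buckets could hold bucket_num treated AND bucket_num control units.
    natural = min(n_treated, n_control) // bucket_num
    # Cap at bucket_max, then enforce the minsize floor (floor wins on conflict).
    return max(minsize, min(natural, bucket_max))


def partition(sorted_units, n_buckets):
    """Split an already-ordered list into n_buckets contiguous, near-equal chunks."""
    size, extra = divmod(len(sorted_units), n_buckets)
    chunks, start = [], 0
    for b in range(n_buckets):
        # The first `extra` chunks receive one additional unit.
        end = start + size + (1 if b < extra else 0)
        chunks.append(sorted_units[start:end])
        start = end
    return chunks


print(number_of_buckets(100, 60, bucket_num=5, minsize=2, bucket_max=10))  # 10
print(partition(list(range(7)), 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```

Under this sketch's assumptions, partition would be applied separately to the ordered treated units and the ordered control units, so that bucket b pairs the b-th treated chunk with the b-th control chunk, and candidate splits fall only on bucket boundaries. The first call also shows bucketMax overriding bucketNum: 60 // 5 = 12 buckets are possible, but only 10 are used.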
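The causalTree-specific minsize rule, which counts treated and control units separately rather than counting all observations as rpart does, amounts to the check below. can_split is a hypothetical helper written for illustration.

```python
def can_split(left_w, right_w, minsize=2):
    """True if both candidate children keep >= minsize treated and >= minsize control units.

    left_w / right_w are lists of 0/1 treatment indicators for the two children.
    """
    for w in (left_w, right_w):
        n_treated = sum(1 for t in w if t == 1)
        n_control = sum(1 for t in w if t == 0)
        if n_treated < minsize or n_control < minsize:
            return False
    return True


print(can_split([1, 1, 0, 0], [1, 1, 0, 0]))  # True
print(can_split([1, 1, 1, 0], [1, 0, 0, 0]))  # False: each child has only one unit of one group
```

Note that the second split would pass a plain rpart-style size check (four observations per child) but fails here because of the per-group requirement.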
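The default propensity score, the transformed outcome behind the "TOT" method, and the inverse-propensity case weights mentioned under propensity can be sketched as follows. The transformed outcome Y* = Y (W - p) / (p (1 - p)) is the standard one discussed by Athey and Imbens (2016); the helper names and the weighting convention (1/e for treated, 1/(1 - e) for control units) are assumptions for illustration.

```python
def default_propensity(w):
    """Share of treated units, the documented default for `propensity`."""
    return sum(w) / len(w)


def transformed_outcome(y, w, p):
    """Transformed outcome Y* = Y * (W - p) / (p * (1 - p)) with constant p."""
    return [yi * (wi - p) / (p * (1 - p)) for yi, wi in zip(y, w)]


def inverse_propensity_weights(w, e):
    """Case weights 1/e for treated and 1/(1 - e) for control units.

    e holds unit-specific propensity scores estimated outside causalTree (assumed input).
    """
    return [1 / ei if wi == 1 else 1 / (1 - ei) for wi, ei in zip(w, e)]


w = [1, 0, 1, 0]
y = [3.0, 1.0, 5.0, 2.0]
p = default_propensity(w)           # 0.5
y_star = transformed_outcome(y, w, p)
print(y_star)                       # [6.0, -2.0, 10.0, -4.0]
print(sum(y_star) / len(y_star))    # 2.5, equal to mean(treated) - mean(control)
```

With p set to the sample share of treated units, the average transformed outcome equals the simple difference between treated and control means, which makes a quick sanity check. Passing the output of inverse_propensity_weights through the weights argument is one way to approximate unit-specific propensities, as the propensity entry suggests.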
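The effect of cost on split selection, where each variable's improvement is divided by its cost before splits are compared, can be sketched as below. best_split is hypothetical, and the sketch assumes strictly positive costs because a cost of zero would make the division undefined.

```python
def best_split(candidates, cost):
    """Pick the candidate split with the largest cost-scaled improvement.

    candidates: list of (variable, improvement) pairs.
    cost:       dict from variable name to cost; missing variables default to 1.
    """
    def scaled(item):
        var, improvement = item
        c = cost.get(var, 1.0)
        if c <= 0:
            raise ValueError("this sketch assumes strictly positive costs")
        return improvement / c

    return max(candidates, key=scaled)


splits = [("age", 10.0), ("income", 12.0)]
print(best_split(splits, {}))                # ('income', 12.0): raw improvement wins
print(best_split(splits, {"income": 2.0}))   # ('age', 10.0): income scaled down to 6.0
```

Raising a variable's cost therefore makes it less likely to be chosen without excluding it entirely.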