Performs generic ML inference for a single learning technique and a given split of the data. Can be seen as a single iteration of Algorithm 1 in the paper.
GenericML_single(
Z,
D,
Y,
learner,
propensity_scores,
M_set,
A_set = setdiff(1:length(Y), M_set),
Z_CLAN = NULL,
HT = FALSE,
quantile_cutoffs = c(0.25, 0.5, 0.75),
X1_BLP = setup_X1(),
X1_GATES = setup_X1(),
diff_GATES = setup_diff(),
diff_CLAN = setup_diff(),
vcov_BLP = setup_vcov(),
vcov_GATES = setup_vcov(),
equal_variances_CLAN = FALSE,
significance_level = 0.05,
min_variation = 1e-05
)
A numeric design matrix that holds the covariates in its columns.
A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group.
A numeric vector containing the response variable.
A character specifying the machine learner to be used for estimating the baseline conditional average (BCA) and conditional average treatment effect (CATE). Either 'lasso', 'random_forest', 'tree', or a custom learner specified with mlr3 syntax. In the latter case, do not state in the mlr3 specification whether the learner is a regression learner or a classification learner. Example: 'mlr3::lrn("ranger", num.trees = 100)' for a random forest learner with 100 trees. Note that this is a string and that the classif. and regr. keywords are absent. See https://mlr3learners.mlr-org.com for a list of mlr3 learners.
A numeric vector of propensity score estimates.
A numerical vector of indices of observations in the main sample.
A numerical vector of indices of observations in the auxiliary sample. Default is the complement of M_set.
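The relation between the main and auxiliary samples can be illustrated with a minimal base R sketch. The sample size and the outcome values below are placeholders chosen for illustration; only the setdiff() expression is taken from the function signature.

```r
set.seed(1)
n <- 10                                # hypothetical sample size
Y <- runif(n)                          # placeholder outcome variable
M_set <- sample(1:n, size = n / 2)     # indices of the main sample

# default of A_set in the signature: all indices that are not in M_set
A_set <- setdiff(1:length(Y), M_set)

# the two index sets are disjoint and jointly cover every observation
length(intersect(M_set, A_set)) == 0
setequal(union(M_set, A_set), 1:n)
```

Passing A_set explicitly is only needed if the auxiliary sample should be something other than this complement.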
A numeric matrix holding variables on which classification analysis (CLAN) shall be performed. CLAN will be performed on each column of the matrix. If NULL (default), then Z_CLAN = Z, i.e. CLAN is performed for all variables in Z.
Logical. If TRUE, a Horvitz-Thompson (HT) transformation is applied in the BLP and GATES regressions. Default is FALSE.
The cutoff points of the quantiles that shall be used for GATES grouping. Default is c(0.25, 0.5, 0.75), which corresponds to the quartiles and therefore yields four groups.
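As a rough illustration of how three cutoff points produce four groups, the sketch below bins a vector of hypothetical proxy predictions at its empirical quartiles. The variable cate_proxy and the use of cut() are assumptions made for this example and are not necessarily how the package forms the GATES groups internally.

```r
set.seed(1)
cate_proxy <- rnorm(200)                  # hypothetical proxy predictions
quantile_cutoffs <- c(0.25, 0.5, 0.75)    # default cutoffs from the signature

# empirical quantiles at the cutoffs, padded with the minimum and maximum
breaks <- quantile(cate_proxy, probs = c(0, quantile_cutoffs, 1))

# assign each observation to one of the resulting groups (1 = lowest)
groups <- cut(cate_proxy, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(groups)   # number of observations per group
```

In general, k cutoff points yield k + 1 groups, so a finer grid such as c(0.2, 0.4, 0.6, 0.8) would give five groups.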
Same as X1_BLP, just for the GATES regression.
Specifies the generic targets of GATES. Must be an object of class "setup_diff". See the documentation of setup_diff() for details.
Same as diff_GATES, just for the CLAN generic targets.
Specifies the covariance matrix estimator in the BLP regression. Must be an object of class "setup_vcov". See the documentation of setup_vcov() for details.
Same as vcov_BLP, just for the GATES regression.
Logical. If TRUE, then all within-group variances of the CLAN groups are assumed to be equal. Default is FALSE. This specification is required for heteroskedasticity-robust variance estimation on the difference of two CLAN generic targets (i.e. variance of the difference of two means). If TRUE (corresponds to homoskedasticity assumption), the pooled variance is used. If FALSE (heteroskedasticity), the variance of Welch's t-test is used.
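The two variance estimators for a difference of two means can be compared in a short base R sketch. The two samples are simulated placeholders standing in for a variable in two CLAN groups, and the textbook formulas below are not taken from the package source. The cross-check relies on the stderr component returned by stats::t.test(), which is available in R 3.6.0 and later.

```r
set.seed(1)
x <- rnorm(40, mean = 0, sd = 1)   # hypothetical values in one CLAN group
y <- rnorm(60, mean = 0, sd = 3)   # hypothetical values in another CLAN group
n1 <- length(x)
n2 <- length(y)

# heteroskedastic case (FALSE): Welch variance, each group keeps its own variance
var_welch <- var(x) / n1 + var(y) / n2

# homoskedastic case (TRUE): pooled variance shared by both groups
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
var_pooled <- s2_pooled * (1 / n1 + 1 / n2)

# cross-check against the standard errors reported by t.test()
all.equal(sqrt(var_welch),  t.test(x, y, var.equal = FALSE)$stderr)
all.equal(sqrt(var_pooled), t.test(x, y, var.equal = TRUE)$stderr)
```

When group sizes and variances differ as much as in this simulation, the two estimates can diverge noticeably, which is why the default makes no equal-variance assumption.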
Significance level for VEIN. Default is 0.05.
Specifies a threshold for the minimum variation of the BCA/CATE predictions. If the variation of a BCA/CATE prediction falls below this threshold, random noise with distribution \(N(0, var(Y)/20)\) is added to it. Default is 1e-05.
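A minimal sketch of the noise rule described above, under two assumptions that the text does not confirm: that "variation" is measured by the sample variance of the predictions, and that the second parameter of \(N(0, var(Y)/20)\) is a variance, so the standard deviation is sqrt(var(Y)/20). The vector proxy is a hypothetical, deliberately constant prediction.

```r
set.seed(1)
Y <- runif(100)                  # placeholder outcome variable
proxy <- rep(0.3, length(Y))     # hypothetical prediction with zero variation
min_variation <- 1e-05           # default threshold from the signature

# assumption: variation is measured by the sample variance of the predictions
if (var(proxy) < min_variation) {
  # assumption: var(Y)/20 is the variance of the noise, hence the sqrt() for sd
  proxy <- proxy + rnorm(length(proxy), mean = 0, sd = sqrt(var(Y) / 20))
}

var(proxy)   # variation of the perturbed predictions
```

A constant prediction would otherwise give a regressor with no variation in the downstream regressions, which is presumably what the threshold guards against.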
A list with the following components:
BLP
An object of class "BLP".
GATES
An object of class "GATES".
CLAN
An object of class "CLAN".
proxy_BCA
An object of class "proxy_BCA".
proxy_CATE
An object of class "proxy_CATE".
best
Estimates of the \(\Lambda\) parameters for finding the best learner. Returned by lambda_parameters().
The specifications "lasso", "random_forest", and "tree" in learner correspond to the following mlr3 specifications (we omit the keywords classif. and regr.). "lasso" is a cross-validated Lasso estimator, which corresponds to 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'. "random_forest" is a random forest with 500 trees, which corresponds to 'mlr3::lrn("ranger", num.trees = 500)'. "tree" is a tree learner, which corresponds to 'mlr3::lrn("rpart")'.
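The correspondence between the three shorthand names and their mlr3 strings can be written as a small lookup. The helper resolve_learner() below is hypothetical and exists only for this illustration; it is not a function of the package and does not claim to mirror its internals. It only manipulates strings, so mlr3 does not need to be installed to run it.

```r
# hypothetical helper: maps a shorthand to the mlr3 string listed above and
# returns any other specification unchanged
resolve_learner <- function(learner) {
  shortcuts <- c(
    lasso         = 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)',
    random_forest = 'mlr3::lrn("ranger", num.trees = 500)',
    tree          = 'mlr3::lrn("rpart")'
  )
  if (learner %in% names(shortcuts)) unname(shortcuts[learner]) else learner
}

resolve_learner("random_forest")                         # shorthand is expanded
resolve_learner('mlr3::lrn("ranger", num.trees = 100)')  # custom string is kept
```

Note that none of the strings carries a classif. or regr. prefix, in line with the description of the learner argument.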
Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.
Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi:10.21105/joss.01903.
if(require("ranger")){
  ## generate data
  set.seed(1)
  n <- 150                              # number of observations
  p <- 5                                # number of covariates
  Z <- matrix(runif(n*p), n, p)         # design matrix
  D <- rbinom(n, 1, 0.5)                # random treatment assignment
  Y <- runif(n)                         # outcome variable
  propensity_scores <- rep(0.5, n)      # propensity scores
  M_set <- sample(1:n, size = n/2)      # main set

  ## specify learner
  learner <- "mlr3::lrn('ranger', num.trees = 10)"

  ## run single GenericML iteration
  GenericML_single(Z, D, Y, learner, propensity_scores, M_set)
}