roseRF_pliv: ROSE random forest estimator for the partially linear instrumental variable model

Description

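Estimates the parameter \(\theta_0\) of the partially linear instrumental variable model with ROSE random forests, using \(K\)-fold cross-fitting for the nuisance regressions specified by the formula arguments.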
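As a rough illustration of the kind of quantity being estimated, the sketch below implements the generic cross-fitted partialling-out estimator for a partially linear instrumental variable model. It assumes the usual formulation \(Y = \theta_0 X + g(Z) + \varepsilon\) with \(\mathbb{E}[\varepsilon \mid V, Z] = 0\) for a single instrument \(V\), which this page does not spell out. It is not the ROSE estimator: `lm()` stands in for the ROSE random forests, the ROSE weighting is omitted, and `pliv_sketch` is a hypothetical helper that is not part of the package.

```r
# Generic cross-fitted partialling-out estimator (NOT the ROSE estimator) for
#   Y = theta0 * X + g(Z) + eps,  E[eps | V, Z] = 0,  single instrument V.
# Assumption: lm() stands in for ROSE random forests; no ROSE weighting is applied.
pliv_sketch <- function(Y, X, V, Z, K = 5) {
  n <- length(Y)
  d <- data.frame(Y = Y, X = X, V = V, Z = Z)
  folds <- sample(rep_len(seq_len(K), n))  # balanced random fold labels 1..K
  rY <- rX <- rV <- numeric(n)             # out-of-fold residuals
  for (k in seq_len(K)) {
    in_fold <- folds == k
    train <- d[!in_fold, ]                 # fit nuisances on the other K - 1 folds
    new <- d[in_fold, "Z", drop = FALSE]   # predict on the held-out fold
    rY[in_fold] <- Y[in_fold] - predict(lm(Y ~ Z, data = train), newdata = new)
    rX[in_fold] <- X[in_fold] - predict(lm(X ~ Z, data = train), newdata = new)
    rV[in_fold] <- V[in_fold] - predict(lm(V ~ Z, data = train), newdata = new)
  }
  # Solve the empirical moment equation mean((rY - theta * rX) * rV) = 0
  theta <- sum(rY * rV) / sum(rX * rV)
  # Sandwich (Huber) standard error for this moment equation
  psi <- (rY - theta * rX) * rV
  stderror <- sqrt(mean(psi^2)) / abs(mean(rX * rV)) / sqrt(n)
  list(theta = theta, stderror = stderror)
}

# Simulated data with an unobserved confounder U, so plain OLS of Y on X is biased
set.seed(1)
n <- 5000
Z <- rnorm(n)
V <- rnorm(n) + 0.5 * Z            # instrument, independent of U
U <- rnorm(n)
X <- V + Z + U + rnorm(n)
Y <- 2 * X + Z + U + rnorm(n)      # true theta0 = 2
fit <- pliv_sketch(Y, X, V, Z)
fit$theta                          # close to 2
```

Using `lm()` keeps the sketch dependency-free; in `roseRF_pliv` the regression methods are chosen through `y_learner`, `x_learner` and the `IVj_learner` arguments instead.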

Usage

roseRF_pliv(
  y_formula,
  y_learner,
  y_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  IV1_formula = NA,
  IV1_learner = NA,
  IV1_pars = list(),
  IV2_formula = NA,
  IV2_learner = NA,
  IV2_pars = list(),
  IV3_formula = NA,
  IV3_learner = NA,
  IV3_pars = list(),
  IV4_formula = NA,
  IV4_learner = NA,
  IV4_pars = list(),
  IV5_formula = NA,
  IV5_learner = NA,
  IV5_pars = list(),
  data,
  K = 5,
  S = 1,
  max.depth = 10,
  num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE,
  sample.fraction = 0.8
)
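The default for `min.node.size` grows with the number of rows, which we read as roughly 1% of a training fold (a training fold holds \((K-1)/K\) of the rows under \(K\)-fold cross-fitting), with a floor of 10. The helper below is hypothetical and simply evaluates the same expression for a given number of rows `n`:

```r
# Evaluate the default min.node.size expression from the usage block for n rows.
# default_min_node_size is a hypothetical helper, not part of the package.
default_min_node_size <- function(n, K = 5) {
  max(10, ceiling(0.01 * (K - 1) / K * n))
}

default_min_node_size(500)    # 10: 1% of 400 training rows is 4, so the floor applies
default_min_node_size(12345)  # 99: ceiling(98.76)
```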

Value

A list containing:

theta: The estimate of \(\theta_0\).
stderror: Huber robust estimate of the standard error of the \(\theta_0\)-estimator.
coefficients: Table containing the \(\theta_0\) estimate, its standard error, z-value and p-value.
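The z-value and p-value columns can be reproduced from `theta` and `stderror`, assuming the usual two-sided normal-approximation test of \(\theta_0 = 0\); the page does not state the reference distribution. `coef_table` and its column labels are hypothetical:

```r
# Build a one-row coefficient table from an estimate and its standard error,
# assuming a two-sided normal-approximation test of theta_0 = 0.
coef_table <- function(theta, stderror) {
  z <- theta / stderror
  p <- 2 * pnorm(-abs(z))
  cbind(Estimate = theta, `Std. Error` = stderror, `z value` = z, `Pr(>|z|)` = p)
}

coef_table(theta = 0.5, stderror = 0.2)  # z value 2.5, Pr(>|z|) about 0.0124
```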

Arguments

y_formula: a two-sided formula object describing the regression model for \(\mathbb{E}[Y|Z]\).
y_learner: a string specifying the regression method to fit the regression of \(Y\) on \(Z\) as given by y_formula (e.g. randomforest, xgboost, neuralnet, gam).
y_pars: a list containing hyperparameters for the y_learner chosen. Default is an empty list, which performs hyperparameter tuning.
x_formula: a two-sided formula object describing the regression model for \(\mathbb{E}[X|Z]\).
x_learner: a string specifying the regression method to fit the regression of \(X\) on \(Z\) as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).
x_pars: a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.
IV1_formula: a two-sided formula object for the model \(\mathbb{E}[V_1(X)|Z]\).
IV1_learner: a string specifying the regression method for \(\mathbb{E}[V_1(X)|Z]\) estimation.
IV1_pars: a list containing hyperparameters for the IV1_learner chosen.
IV2_formula: a two-sided formula object for the model \(\mathbb{E}[V_2(X)|Z]\). Default is no formula / regression (i.e. \(J=1\)).
IV2_learner: a string specifying the regression method for \(\mathbb{E}[V_2(X)|Z]\) estimation.
IV2_pars: a list containing hyperparameters for the IV2_learner chosen.
IV3_formula: a two-sided formula object for the model \(\mathbb{E}[V_3(X)|Z]\). Default is no formula / regression (i.e. \(J \le 2\)).
IV3_learner: a string specifying the regression method for \(\mathbb{E}[V_3(X)|Z]\) estimation.
IV3_pars: a list containing hyperparameters for the IV3_learner chosen.
IV4_formula: a two-sided formula object for the model \(\mathbb{E}[V_4(X)|Z]\). Default is no formula / regression (i.e. \(J \le 3\)).
IV4_learner: a string specifying the regression method for \(\mathbb{E}[V_4(X)|Z]\) estimation.
IV4_pars: a list containing hyperparameters for the IV4_learner chosen.
IV5_formula: a two-sided formula object for the model \(\mathbb{E}[V_5(X)|Z]\). Default is no formula / regression (i.e. \(J \le 4\)).
IV5_learner: a string specifying the regression method for \(\mathbb{E}[V_5(X)|Z]\) estimation.
IV5_pars: a list containing hyperparameters for the IV5_learner chosen.
data: a data frame containing the variables for the partially linear instrumental variable model.
K: the number of folds used for \(K\)-fold cross-fitting. Default is 5.
S: the number of repeats of the \(K\)-fold sample splitting, used to mitigate the randomness of the estimator due to the choice of sample splits. Default is 1.
max.depth: Maximum depth parameter used for ROSE random forests. Default is 10.
num.trees: Number of trees used for a single ROSE random forest. Default is 500.
min.node.size: Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))).
replace: Whether the observations for each random tree are sampled with replacement (bootstrap) or without replacement. Default is TRUE (i.e. bootstrap).
sample.fraction: Proportion of data used for each random tree. Default is 0.8.
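The cross-fitting arguments `K` and `S` and the per-tree sampling arguments `replace` and `sample.fraction` can be pictured with base R alone. In the sketch below, `make_splits` and `tree_sample` are hypothetical helpers. How `roseRF_pliv` actually draws its folds, rounds the per-tree sample size, and combines the `S` repeated estimates is not stated on this page, so balanced random folds and `round()` are assumptions.

```r
# S independent K-fold partitions of n rows: in each repeat every row gets
# exactly one fold label in 1..K, with fold sizes differing by at most one.
make_splits <- function(n, K = 5, S = 1) {
  lapply(seq_len(S), function(s) sample(rep_len(seq_len(K), n)))
}

# Row indices drawn for one tree: a bootstrap sample if replace = TRUE,
# otherwise a subsample without replacement. Rounding the size is an assumption.
tree_sample <- function(n, replace = TRUE, sample.fraction = 0.8) {
  sample.int(n, size = round(sample.fraction * n), replace = replace)
}

set.seed(42)
splits <- make_splits(n = 103, K = 5, S = 3)
table(splits[[1]])                        # fold sizes: 21 21 21 20 20
idx <- tree_sample(100, replace = FALSE)
length(idx)                               # 80 distinct rows
```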