bootstrap: Bootstrap Lasso Implementation (in development)

Description

This function performs plugin lasso PPML estimation on bootreps samples drawn with replacement from the original data and reports the regressors selected in at least a given fraction of the bootstrap repetitions.

Usage

bootstrap(
  data,
  dep,
  indep = NULL,
  cluster_id = NULL,
  fixed = NULL,
  selectobs = NULL,
  bootreps = 250,
  boot_threshold = 0.01,
  colcheck_x = FALSE,
  colcheck_x_fes = FALSE,
  post = FALSE,
  gamma_val = NULL,
  verbose = FALSE,
  tol = 1e-06,
  hdfetol = 0.01,
  penweights = NULL,
  maxiter = 1000,
  phipost = TRUE
)

Value

A matrix with coefficient estimates for all independent variables (regressors).

Arguments

data: A data frame containing all relevant variables.
dep: A string with the name of the dependent variable or its column number.
indep: A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.
cluster_id: A string with the name of the cluster identifier variable used to perform the cluster bootstrap.
fixed: A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in data.
selectobs: Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).
bootreps: Number of bootstrap repetitions.
boot_threshold: Minimum selection frequency. Variables selected in at least this fraction of the bootstrap repetitions are reported at the end of the iterations.
colcheck_x: Logical. If TRUE, this checks collinearity between the independent variables and drops the collinear variables.
colcheck_x_fes: Logical. If TRUE, this checks whether the independent variables are perfectly explained by the fixed effects and drops those that are.
post: Logical. If TRUE, estimates a post-penalty regression with the selected variables.
gamma_val: Numeric value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen, and Kozbur (2016). The default (NULL) sets it to 0.1/log(n).
verbose: Logical. If TRUE, prints progress information to the screen during estimation.
tol: Tolerance parameter for convergence of the IRLS algorithm.
hdfetol: Tolerance parameter for the within-transformation step, passed on to collapse::fhdwithin.
penweights: Optional. A vector of coefficient-specific penalties to use in the plugin lasso.
maxiter: Maximum number of iterations (a number).
phipost: Logical. If TRUE, the plugin coefficient-specific penalty weights are iteratively calculated using estimates from a post-penalty regression. Otherwise, these are calculated using estimates from a penalty regression.
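Supplying cluster_id implies a cluster (block) bootstrap: whole clusters, rather than individual rows, are resampled with replacement. The base-R sketch below illustrates one such draw, assuming the cluster identifier is a column of data. The helper draw_cluster_sample is hypothetical and not part of the package API; the package's internal resampling may differ in detail.

```r
# Hypothetical helper (not part of the package): one cluster-bootstrap draw.
# Clusters are sampled with replacement and every row of each drawn cluster
# is kept, so a cluster drawn twice contributes all of its rows twice.
draw_cluster_sample <- function(data, cluster_id) {
  clusters <- unique(data[[cluster_id]])
  # sample.int avoids sample()'s special case for length-one numeric input
  drawn <- clusters[sample.int(length(clusters), length(clusters), replace = TRUE)]
  # Row indices of each drawn cluster, concatenated in draw order
  rows <- unlist(lapply(drawn, function(cl) which(data[[cluster_id]] == cl)))
  data[rows, , drop = FALSE]
}

# Toy data: three clusters of unequal size
set.seed(1)
toy <- data.frame(clus = c(1, 1, 2, 3, 3, 3), y = 1:6)
boot_sample <- draw_cluster_sample(toy, "clus")
table(boot_sample$clus)
```

Resampling at the cluster level preserves the within-cluster dependence of the original data in every bootstrap sample, which row-level resampling would break.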

Details

This function enables users to implement the "bootstrap" step in the procedure described in Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). To do this, the plugin lasso is run bootreps times, each time on a new resample of the data (a cluster resample when cluster_id is supplied). The function can also perform post-selection estimation.
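The bookkeeping of the bootstrap step can be sketched as follows: draw bootreps resamples, record which regressors are selected in each, and keep those whose selection frequency is at least boot_threshold. In this base-R sketch the plugin lasso PPML fit is replaced by a hypothetical stand-in, select_stub, which keeps regressors whose absolute correlation with the outcome exceeds a cutoff. It only illustrates the frequency counting and is not the package's estimator; bootstrap_selection is likewise a hypothetical name.

```r
# Hypothetical stand-in for a plugin lasso PPML fit: returns a named logical
# vector marking which columns of X are "selected". NOT the package's estimator.
select_stub <- function(y, X, cutoff = 0.3) {
  abs(cor(X, y))[, 1] > cutoff
}

# Selection frequency = fraction of resamples in which each regressor is
# selected; regressors at or above boot_threshold are reported.
bootstrap_selection <- function(y, X, bootreps = 250, boot_threshold = 0.01) {
  n <- nrow(X)
  selected <- matrix(FALSE, nrow = bootreps, ncol = ncol(X),
                     dimnames = list(NULL, colnames(X)))
  for (b in seq_len(bootreps)) {
    idx <- sample.int(n, n, replace = TRUE)  # row-level resample with replacement
    selected[b, ] <- select_stub(y[idx], X[idx, , drop = FALSE])
  }
  freq <- colMeans(selected)
  list(frequency = freq, kept = names(freq)[freq >= boot_threshold])
}

# Simulated Poisson outcome that depends only on x1
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- rpois(n, exp(0.5 + 0.8 * X[, "x1"]))
res <- bootstrap_selection(y, X, bootreps = 50, boot_threshold = 0.5)
res$kept
```

A low boot_threshold (such as the 0.01 default) keeps any regressor that is selected even occasionally, while a higher value keeps only regressors that are selected consistently across resamples.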

References

Breinlich, H., V. Corradi, N. Rocha, M. Ruta, J.M.C. Santos Silva and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper No. 9629, World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples


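if (FALSE) {
  # Fixed effects: exporter-time, importer-time and exporter-importer pairs
  bs1 <- bootstrap(data = trade3, dep = "export",
                   cluster_id = "clus",
                   fixed = list(c("exp", "time"),
                                c("imp", "time"),
                                c("exp", "imp")),
                   indep = 7:22,  # columns 7 to 22 are the candidate regressors
                   bootreps = 10, boot_threshold = 0.01,
                   colcheck_x = TRUE, colcheck_x_fes = TRUE,
                   post = TRUE, gamma_val = 0.01, verbose = FALSE)
}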
