mlfitppml_int is the internal wrapper called by mlfitppml for penalized PPML estimation.
It in turn calls penhdfeppml_int, penhdfeppml_cluster_int and hdfeppml_int as needed.
It takes a vector with the dependent variable, a regressor matrix and a set of fixed
effects (in list form: each element in the list should be a separate HDFE). This flexible tool
lets users select:
Penalty type: either lasso or ridge.
Penalty parameter: users can provide a single global value for lambda (a single regression is estimated), a vector of lambda values (the function estimates the regression using each of them, sequentially) or even coefficient-specific penalty weights.
Method: plugin lasso estimates can be obtained directly from this function too.
Cross-validation: if this option is enabled, the function uses IDs provided by the user to perform k-fold cross-validation and reports the resulting RMSE for all lambda values.
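The inputs described above can be sketched in base R as follows. This is a minimal illustration on synthetic data: the variable names (exp_id, imp_id, year, prov1 to prov3) and all values are assumptions made for the example, not part of the package. The final call is left commented out so that the sketch runs without the package that provides mlfitppml_int.

```r
# Synthetic panel standing in for real data (all names and values are illustrative).
set.seed(1)
n      <- 200
exp_id <- sample(c("A", "B", "C", "D"), n, replace = TRUE)  # hypothetical exporter ID
imp_id <- sample(c("A", "B", "C", "D"), n, replace = TRUE)  # hypothetical importer ID
year   <- sample(2000:2004, n, replace = TRUE)              # hypothetical time period

# Dependent variable: a plain numeric vector of non-negative values.
y <- rpois(n, lambda = 5)

# Regressor matrix: one column per candidate variable.
x <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(NULL, c("prov1", "prov2", "prov3")))

# Fixed effects: a list with one element per HDFE, each of length n.
fes <- list(
  exp_time = interaction(exp_id, year),
  imp_time = interaction(imp_id, year),
  pair     = interaction(exp_id, imp_id)
)

# Penalty parameters: a single value, or a vector evaluated sequentially.
lambdas <- c(0.1, 0.01)

# With the package loaded, the call would then take this form:
# reg <- mlfitppml_int(y = y, x = x, fes = fes, lambdas = lambdas)
```

Passing a vector of lambdas, as here, requests one regression per value; a single number requests a single regression.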
mlfitppml_int(
  y,
  x,
  fes,
  lambdas,
  penalty = "lasso",
  tol = 1e-08,
  hdfetol = 1e-04,
  colcheck = TRUE,
  colcheck_x = colcheck,
  colcheck_x_fes = colcheck,
  post = TRUE,
  cluster = NULL,
  method = "bic",
  IDs = 1:n,
  verbose = FALSE,
  xval = FALSE,
  standardize = TRUE,
  vcv = TRUE,
  phipost = TRUE,
  penweights = NULL,
  K = 15,
  gamma_val = NULL,
  mu = NULL
)
A list with the following elements:
beta: if post = FALSE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions. If post = TRUE, this is the matrix of coefficients from the post-penalty regressions.
beta_pre: if post = TRUE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions.
bic: Bayesian Information Criterion.
lambdas: vector of penalty parameters.
ses: standard errors of the coefficients of the post-penalty regression. Note that these are only provided when post = TRUE.
rmse: if xval = TRUE, a matrix with the root mean squared error (RMSE - column 2) for each value of lambda (column 1), obtained by cross-validation.
phi: coefficient-specific penalty weights (only if method == "plugin").
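As a rough illustration of how the rmse element might be used, the sketch below picks the lambda with the smallest cross-validated RMSE. The matrix is a hand-written placeholder in the documented layout (lambda in column 1, RMSE in column 2); its numbers are invented for the example and are not output from the function.

```r
# Placeholder matrix in the documented layout: column 1 = lambda, column 2 = RMSE.
# The numbers are made up for illustration and are not estimation results.
rmse <- cbind(lambda = c(0.1, 0.05, 0.01),
              rmse   = c(2.4, 1.9, 2.1))

# Row with the smallest cross-validated RMSE.
best_row <- which.min(rmse[, 2])

# Corresponding penalty parameter.
best_lambda <- rmse[best_row, 1]
```

With a real result, the same two lines would be applied to the rmse element of the returned list.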
y: Dependent variable (a vector).
x: Regressor matrix.
fes: List of fixed effects.
lambdas: Vector of penalty parameters.
penalty: A string indicating the penalty type. Currently supported: "lasso" and "ridge".
tol: Tolerance parameter for convergence of the IRLS algorithm.
hdfetol: Tolerance parameter for the within-transformation step, passed on to collapse::fhdwithin.
colcheck: Logical. If TRUE, performs both checks in colcheck_x and colcheck_x_fes. If the user specifies colcheck_x and colcheck_x_fes individually, this option is overwritten.
colcheck_x: Logical. If TRUE, this checks collinearity between the independent variables and drops the collinear variables.
colcheck_x_fes: Logical. If TRUE, this checks whether the independent variables are perfectly explained by the fixed effects and drops those that are perfectly explained.
post: Logical. If TRUE, estimates a post-penalty regression with the selected variables.
cluster: Optional: a vector classifying observations into clusters (to use when calculating SEs).
method: The user can set this equal to "plugin" to perform the plugin algorithm with coefficient-specific penalty weights (see details). Otherwise, a single global penalty is used.
IDs: A vector of fold IDs for k-fold cross-validation. If left unspecified, each observation is assigned to a different fold (warning: this is likely to be very resource-intensive).
verbose: Logical. If TRUE, it prints information to the screen while evaluating.
xval: Logical. If TRUE, it carries out cross-validation.
standardize: Logical. If TRUE, x variables are standardized before estimation.
vcv: Logical. If TRUE (the default), the post-estimation model includes standard errors.
phipost: Logical. If TRUE, the plugin coefficient-specific penalty weights are iteratively calculated using estimates from a post-penalty regression when method == "plugin". Otherwise, these are calculated using estimates from a penalty regression.
penweights: Optional: a vector of coefficient-specific penalties to use in plugin lasso when method == "plugin".
K: Maximum number of iterations for the plugin algorithm to converge.
gamma_val: Numerical value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen, and Kozbur (2016). The default, NULL, sets the parameter to 0.1/log(n).
mu: Optional: a vector of initial values for mu.
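Two of the arguments above lend themselves to a short base R sketch: building a vector of fold IDs for k-fold cross-validation, and computing the value that gamma_val is documented to take when left as NULL. The sample size n and the number of folds k are arbitrary choices for the example, and random assignment to folds is only one possible scheme.

```r
set.seed(42)
n <- 100   # number of observations (illustrative)
k <- 5     # number of folds (illustrative)

# Assign each observation at random to one of k folds of equal size.
IDs <- sample(rep(seq_len(k), length.out = n))

# Leaving IDs unspecified instead puts every observation in its own fold,
# which is why the documentation warns that it is resource-intensive.
fold_sizes <- table(IDs)

# Value documented for gamma_val when it is left as NULL.
gamma_default <- 0.1 / log(n)

# The fold IDs would then be passed along with xval = TRUE, for example:
# reg <- mlfitppml_int(y = y, x = x, fes = fes, lambdas = lambdas,
#                      xval = TRUE, IDs = IDs)
```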
For technical details on the algorithms used, see hdfeppml_int (post-lasso regression), penhdfeppml_int (standard penalized regression), penhdfeppml_cluster_int (plugin lasso), and xvalidate (cross-validation).
Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.
Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.
Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.
Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.
Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.