Fit the rare feature selection model proposed in Yan and Bien (2018):
$$min_{\beta, \gamma} 0.5 * ||y - X\beta - \beta_01_n||_2^2 +
\lambda * (\alpha * ||\gamma_{-root}||_1 + (1-\alpha) * ||\beta||_1)$$
using an alternating direction method of multipliers (ADMM) algorithm
described in Algorithm 1 of the same paper.
The regularization path is computed over a two-dimensional grid of
regularization parameters: lambda and alpha. Of the two,
lambda controls the overall amount of regularization, and alpha
controls the tradeoff between sparsity and fusion of \(\beta\) (larger alpha
induces more fusion in \(\beta\)).
rarefit(y, X, A = NULL, Q = NULL, hc, intercept = T, lambda = NULL,
alpha = NULL, nlam = 50, lam.min.ratio = 1e-04, nalpha = 10,
rho = 0.01, eps1 = 1e-06, eps2 = 1e-05, maxite = 1e+06)Returns regression coefficients for beta and gamma and
intercept beta0. We use a matrix-nested-within-list structure to store the coefficients: each list
item corresponds to an alpha value; matrix (or vector) in that list item stores
coefficients at various lambda values by columns (or entries).
Length-nalpha list with each item storing
intercept across various lambda in a vector: beta0[[j]][i]
is intercept fitted at (lambda[i], alpha[j]).
If intercept = FALSE, beta0 is NULL.
Length-nalpha list with each item storing
beta coefficient at various lambda in columns of a nvars-by-nlam matrix:
beta[[j]][, i] is beta coeffcient fitted at (lambda[i], alpha[j]).
Length-nalpha list with each item storing
gamma coefficient at various lambda in columns of a nnodes-by-nlam matrix:
gamma[[j]][, i] is gamma coeffcient vector fitted at (lambda[i], alpha[j]).
If alpha[j] = 0, the problem becomes the lasso on beta and is solved
with glmnet on beta, in which case gamma[[j]] = NA.
Sequence of lambda values used in model fit.
Sequence of alpha values used in model fit.
Binary matrix encoding ancestor-descendant relationship between leaves and nodes in the tree.
Matrix with columns forming an orthonormal basis for the null space of \([I_nvars:-A]\).
Whether an intercept is included in model fit.
Length-nobs response variable.
nobs-by-nvars input matrix:
each row is an observation vector and each column stores a count covariate.
nvars-by-nnodes binary matrix encoding ancestor-descendant relationships
between leaves and tree nodes, where nnodes is the total number of tree nodes.
A[i,j] is 1 if the ith leaf is a descendant of the jth
node in the tree, and 0 otherwise. A should be in sparse matrix format
(inherit from class sparseMatrix as in package Matrix).
When A is NULL, the function will learn A from hc.
(nvars+nnodes)-by-nnodes matrix with columns forming an orthonormal
basis for the null space of \([I_nvars:-A]\). When Q is NULL, the function will learn
Q using the singular value decomposition.
An hclust tree of nvars leaves where each leaf corresponds
to a covariate. If the tree is not an hclust object, user needs to provide the matrix A instead.
Whether intercept be fitted (default = TRUE) or set to zero (FALSE).
A user-supplied lambda sequence. Typical usage is to
have the program compute its own lambda sequence based on
nlam and lam.min.ratio.
A user-supplied alpha sequence. If letting the program
compute its own alpha sequence, a length-nalpha sequence of
equally-spaced alpha values between 0 and 1 will be used. In practice,
user may want to provide a more fine alpha sequence to tune
the model to its best performance (e.g., alpha = c(1-exp(seq(0, log(1e-2), len = nalpha - 1)), 1)).
Number of lambda values (default = 50).
Smallest value for lambda, as a fraction of
lambda.max (i.e., the smallest value for which all coefficients are
zero). The default value is 1e-4.
Number of alpha values (default = 10).
Penalty parameter for the quadratic penalty in the ADMM algorithm.
The default value is 1e-2.
Convergence threshold in terms of the absolute tolerance level
for the ADMMM algorithm. The default value is 1e-6.
Convergence threshold in terms of the relative tolerance level
for the ADMM algorithm. The default value is 1e-5.
Maximum number of passes over the data for every pair of
(lambda, alpha). The default value is 1e6.
The function splits model fitting path by alpha. At each alpha value,
the model is fit on the entire sequence of lambda with warm start. We recommend
including an intercept (by setting intercept=T) unless the input data have been
centered.
Yan, X. and Bien, J. (2018) Rare Feature Selection in High Dimensions, https://arxiv.org/abs/1803.06675.
rarefit.cv, rarefit.predict
if (FALSE) {
# See vignette for more details.
set.seed(100)
ts <- sample(1:length(data.rating), 400) # Train set indices
# Fit the model on train set
ourfit <- rarefit(y = data.rating[ts], X = data.dtm[ts, ], hc = data.hc, lam.min.ratio = 1e-6,
nlam = 20, nalpha = 10, rho = 0.01, eps1 = 1e-5, eps2 = 1e-5, maxite = 1e4)
}
Run the code above in your browser using DataLab