Fit the rare feature selection model proposed in Yan and Bien (2018):
$$\min_{\beta, \gamma} \frac{1}{2} \|y - X\beta - \beta_0 1_n\|_2^2 +
\lambda \left(\alpha \|\gamma_{-\text{root}}\|_1 + (1 - \alpha) \|\beta\|_1\right)$$
using an alternating direction method of multipliers (ADMM) algorithm
described in Algorithm 1 of the same paper.
The regularization path is computed over a two-dimensional grid of
regularization parameters: lambda and alpha. Of the two, lambda controls the
overall amount of regularization, and alpha controls the tradeoff between
sparsity and fusion of \(\beta\) (larger alpha induces more fusion in \(\beta\)).
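At the two extremes of alpha the penalty simplifies (this reads directly off the
objective above, not an additional result from the paper):
$$\alpha = 1:\ \lambda \|\gamma_{-\text{root}}\|_1, \qquad \alpha = 0:\ \lambda \|\beta\|_1,$$
so alpha = 0 recovers a plain lasso on \(\beta\) (see the gamma component of the
return value below), while alpha = 1 puts all of the penalty weight on the
tree-node coefficients \(\gamma\).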
rarefit(y, X, A = NULL, Q = NULL, hc, intercept = T, lambda = NULL,
alpha = NULL, nlam = 50, lam.min.ratio = 1e-04, nalpha = 10,
rho = 0.01, eps1 = 1e-06, eps2 = 1e-05, maxite = 1e+06)
y: Length-nobs response variable.
X: nobs-by-nvars input matrix: each row is an observation vector and each
column stores a count covariate.
A: nvars-by-nnodes binary matrix encoding ancestor-descendant relationships
between leaves and tree nodes, where nnodes is the total number of tree nodes.
A[i,j] is 1 if the ith leaf is a descendant of the jth node in the tree, and 0
otherwise. A should be in sparse matrix format (inheriting from class
sparseMatrix as in package Matrix). When A is NULL, the function will learn A
from hc. (A toy construction of A is sketched after the argument descriptions.)
Q: (nvars+nnodes)-by-nnodes matrix with columns forming an orthonormal basis
for the null space of \([I_{nvars} : -A]\). When Q is NULL, the function will
learn Q using the singular value decomposition. (See the sketch after the
argument descriptions.)
hc: An hclust tree with nvars leaves, where each leaf corresponds to a
covariate. If the tree is not an hclust object, the user needs to provide the
matrix A instead.
intercept: Whether an intercept should be fitted (default = TRUE) or set to
zero (FALSE).
lambda: A user-supplied lambda sequence. Typical usage is to have the program
compute its own lambda sequence based on nlam and lam.min.ratio.
alpha: A user-supplied alpha sequence. If the program computes its own alpha
sequence, a length-nalpha sequence of equally spaced alpha values between 0
and 1 is used. In practice, the user may want to provide a finer alpha
sequence to tune the model to its best performance (e.g.,
alpha = c(1-exp(seq(0, log(1e-2), len = nalpha - 1)), 1)).
nlam: Number of lambda values (default = 50).
lam.min.ratio: Smallest value for lambda, as a fraction of lambda.max (i.e.,
the smallest value for which all coefficients are zero). The default value is
1e-4.
nalpha: Number of alpha values (default = 10).
rho: Penalty parameter for the quadratic penalty in the ADMM algorithm. The
default value is 1e-2.
eps1: Convergence threshold in terms of the absolute tolerance level for the
ADMM algorithm. The default value is 1e-6.
eps2: Convergence threshold in terms of the relative tolerance level for the
ADMM algorithm. The default value is 1e-5.
maxite: Maximum number of passes over the data for every pair of (lambda,
alpha). The default value is 1e6.
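As a concrete illustration of A and Q, here is a minimal sketch for an assumed
toy tree with nvars = 3 leaves and nnodes = 5 nodes (the three leaves, the
internal node {1, 2}, and the root). The node ordering and all object names are
choices made for this sketch, not the package's internal code.
library(Matrix)
# Toy A: rows are leaves, columns are nodes; columns 1-3 are the leaves
# themselves, column 4 is the internal node {1, 2}, column 5 is the root.
A.toy <- sparseMatrix(i = c(1, 2, 3, 1, 2, 1, 2, 3),
                      j = c(1, 2, 3, 4, 4, 5, 5, 5),
                      x = 1, dims = c(3, 5))
# One way to obtain Q: [I_nvars : -A] has full row rank nvars (identity block),
# so the right singular vectors with zero singular values form an orthonormal
# basis for its null space.
M <- cbind(Diagonal(3), -A.toy)            # 3-by-(3 + 5)
sv <- svd(as.matrix(M), nv = ncol(M))      # request all right singular vectors
Q.toy <- sv$v[, (nrow(M) + 1):ncol(M)]     # last nnodes = 5 columns
dim(Q.toy)                                 # 8-by-5: (nvars + nnodes)-by-nnodes
max(abs(as.matrix(M) %*% Q.toy))           # ~0: columns lie in the null space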
Returns regression coefficients for beta and gamma and intercept beta0. We use
a matrix-nested-within-list structure to store the coefficients: each list item
corresponds to an alpha value; the matrix (or vector) in that list item stores
coefficients at the various lambda values by columns (or entries). (See the
indexing sketch after this list of return components.)
beta0: Length-nalpha list with each item storing the intercept across various
lambda values in a vector: beta0[[j]][i] is the intercept fitted at (lambda[i],
alpha[j]). If intercept = FALSE, beta0 is NULL.
beta: Length-nalpha list with each item storing the beta coefficients at
various lambda values in the columns of an nvars-by-nlam matrix:
beta[[j]][, i] is the beta coefficient vector fitted at (lambda[i], alpha[j]).
gamma: Length-nalpha list with each item storing the gamma coefficients at
various lambda values in the columns of an nnodes-by-nlam matrix:
gamma[[j]][, i] is the gamma coefficient vector fitted at (lambda[i],
alpha[j]). If alpha[j] = 0, the problem reduces to the lasso on beta and is
solved with glmnet, in which case gamma[[j]] = NA.
lambda: Sequence of lambda values used in the model fit.
alpha: Sequence of alpha values used in the model fit.
A: Binary matrix encoding ancestor-descendant relationships between leaves and
nodes in the tree.
Q: Matrix with columns forming an orthonormal basis for the null space of
\([I_{nvars} : -A]\).
intercept: Whether an intercept is included in the model fit.
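A short sketch of indexing this structure, assuming the returned object is
stored in a variable named fit and that the indices i and j are valid for the
fitted grid (all names here are hypothetical):
j <- 2; i <- 5                        # an (alpha[j], lambda[i]) pair, chosen arbitrarily
length(fit$beta)                      # nalpha: one list item per alpha value
dim(fit$beta[[j]])                    # nvars-by-nlam matrix of beta coefficients
beta.ij  <- fit$beta[[j]][, i]        # beta fitted at (lambda[i], alpha[j])
beta0.ij <- fit$beta0[[j]][i]         # intercept at (lambda[i], alpha[j]); beta0 is NULL if intercept = FALSE
if (!identical(fit$gamma[[j]], NA)) { # gamma[[j]] is NA when alpha[j] = 0 (lasso case)
  gamma.ij <- fit$gamma[[j]][, i]     # gamma fitted at (lambda[i], alpha[j])
}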
The function splits the model fitting path by alpha. At each alpha value, the
model is fit over the entire lambda sequence with warm starts. We recommend
including an intercept (by setting intercept = T) unless the input data have
been centered.
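If you prefer intercept = FALSE, a minimal sketch of pre-centering, assuming y
and X are as described in the arguments above (the call at the end is
hypothetical and commented out):
y.c <- y - mean(y)                                        # center the response
X.c <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # center columns (note: densifies a sparse X)
# fit.nc <- rarefit(y = y.c, X = X.c, hc = hc, intercept = FALSE)  # hypothetical call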
Yan, X. and Bien, J. (2018) Rare Feature Selection in High Dimensions, https://arxiv.org/abs/1803.06675.
# See vignette for more details.
set.seed(100)
ts <- sample(1:length(data.rating), 400) # Train set indices
# Fit the model on train set
ourfit <- rarefit(y = data.rating[ts], X = data.dtm[ts, ], hc = data.hc, lam.min.ratio = 1e-6,
nlam = 20, nalpha = 10, rho = 0.01, eps1 = 1e-5, eps2 = 1e-5, maxite = 1e4)
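# A sketch of forming predictions on the held-out rows directly from the stored
# coefficients, mirroring the indexing described in the return value above.
# The indices i and j are arbitrary choices; in practice they would be tuned,
# e.g., by cross-validation.
i <- 10; j <- 5                                    # arbitrary (lambda[i], alpha[j]) pair
beta.hat  <- ourfit$beta[[j]][, i]
beta0.hat <- ourfit$beta0[[j]][i]
yhat <- as.vector(data.dtm[-ts, ] %*% beta.hat) + beta0.hat
mean((data.rating[-ts] - yhat)^2)                  # held-out mean squared error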