Fit a model using the principal components lasso for an entire regularization
path indexed by the parameter lambda
. Fits linear and logistic regression
models.
pcLasso(x, y, w = rep(1, length(y)), family = c("gaussian",
"binomial"), ratio = NULL, theta = NULL, groups = vector("list",
1), lambda.min.ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
nlam = 100, lambda = NULL, standardize = F, SVD_info = NULL,
nv = NULL, propack = T, thr = 1e-04, maxit = 1e+05,
verbose = FALSE)
Input matrix, of dimension nobs
x nvars
; each row is
an observation vector.
Response variable. Quantitative for family = "gaussian"
. For
family="binomial"
, should be a numeric vector consisting of 0s and 1s.
Observation weights. Default is 1 for each observation.
Response type. Either "gaussian"
(default) for linear
regression or "binomial"
for logistic regression.
Ratio of shrinkage between the second and first principal components
in the absence of the \(\ell_1\) penalty. More convenient way to specify the
strength of the quadratic penalty. A value between 0 and 1 (only 1 included).
ratio = 1
corresponds to the lasso, 0.5-0.9 are good values to use.
Default is NULL
. Exactly one of ratio
or theta
must be
specified.
Multiplier for the quadratic penalty: a non-negative real number.
theta = 0
corresponds to the lasso, and larger theta
gives
strong shrinkage toward the top principal components. Default is NULL
.
Exactly one of ratio
or theta
must be specified.
A list describing which features belong in each group. The
length of the list should be equal to the number of groups, with
groups[[k]]
being a vector of feature/column numbers which belong to
group k. Each feature must be assigned to at least one group. Features can
belong to more than one group. By default, all the features belong to a
single group.
Smallest value for lambda
, as a fraction of the
largest lambda
value. The default depends on the sample size nobs
relative to the number of variables nvars
. If nobs >= nvars
,
the default is 0.0001
, close to zero. If nobs < nvars
, the default is
0.01
. This is only used when the user does not specify a lambda
sequence.
Number of lambda
values; default is 100.
A user supplied lambda
sequence. Typical usage is to
have the program compute its own lambda
sequence based on nlam
and lambda.min.ratio
; supplying a value of lambda overrides this.
If TRUE
, the columns of the feature matrices are
standardized before the algorithm is run. Default is FALSE
.
A list containing SVD information. Usually this should not
be specified by the user: the function will compute it on its own by default.
SVD_info
is a list with three elements:
aa
: A row-wise concatenation of the k quadratic penalty matrices.
d
: A list where the kth element is the vector of squared singular
values of the input matrix corresponding to group k.
dd
: A list where the kth element is the vector of
\(d_{k1}^2 - d_{kj}^2\) for the input matrix corresponding to group k.
Since the initial SVD of x
can be the largest part of the overall
computation, we allow the user to compute it once and then re-use. See
example below.
Number of singular vectors to use in the singular value decompositions. If not specified, the full SVD is used.
If TRUE
(default), uses propack.svd
from the
svd
package to perform the singular value decompositions. If not, uses
svd
from base R.
Convergence threhold for the coordinate descent algorithm. Default
is 1e-4
.
Maximum number of passes over the data for all lambda values;
default is 1e5
.
Print out progess along the way? Default is FALSE
.
An object of class "pcLasso"
.
If the groups overlap, a p_1+..._p_K x length(lambda)
matrix of coefficients in the expanded feature space. If not, a nvars
x length(lambda)
matrix of coefficients in the original feature space.
If the groups overlap, a nvars
x
length(lambda)
matrix of coefficients in the original feature space.
NULL
if the groups do not overlap.
Intercept sequence of length length(lambda)
.
The actual sequence of lambda
values used.
If the groups overlap, the number of non-zero coefficients in the
expanded feature space for each value of lambda
. Otherwise, the number
of non-zero coefficients in the original feature space.
If the groups are overlapping, this is the number of
non-zero coefficients in the original feature space of the model for each
lambda
. If groups are not overlapping, it is NULL
.
Error flag for warnings and errors (largely for internal debugging).
Value of theta
used in model fitting.
If the groups
parameter was passed to the
function call, this is a copy of that parameter. Otherwise, it is a list of
length 1, with the first element being a vector of integers from 1 to
nvars
.
If the groups are not overlapping, this has the same
value as groups
. If the groups are overlapping, then
groups[[k]]
is the vector from p_1 + ... + p_{k-1} + 1
to
p_1 + ... p_k
, where p_k
is the number of features in group k.
A list containing SVD information. See param SVD_info
for more information.
If groups overlap, column means of the expanded x
matrix. Otherwise, column means of the original x
matrix.
Column means of the original x
matrix.
If family = "gaussian"
, mean of the responses y
.
Otherwise, it is NA
.
A logical flag indicating if the feature groups were overlapping or not.
Actual number of passes over the data for all lambda values.
Response type.
The call that produced this object.
The objective function for "gaussian"
is
$$1/2 RSS/nobs + \lambda*||\beta||_1 + \theta/2 \sum quadratic
penalty for group k,$$
where the sum is over the feature groups \(1, ..., K\). The objective function
for "binomial"
is
$$-loglik/nobs + \lambda*||\beta||_1 + \theta/2 \sum quadratic
penalty for group k.$$
pcLasso
can handle overlapping groups. In this case, the original
x
matrix is expanded to a nobs x p_1+...+p_K
matrix (where
p_k
is the number of features in group k) such that columns
p_1+...+p_{k-1}+1
to p_1+...+p_k
represent the feature matrix for
group k. pcLasso
returns the model coefficients for both the expanded
feature space and the original feature space.
One needs to specify the strength of the quadratic penalty either by
specifying ratio
, which is the ratio of shrinkage between the second
and first principal components in the absence of the \(\ell_1\) penalty,
or by specifying the multiplier theta
. ratio
is unitless and is
more convenient.
pcLasso
always mean centers the columns of the x
matrix. If
standardize=TRUE
, pcLasso
will also scale the columns to have
standard deviation 1. In all cases, the beta
coefficients returned are
for the original x
values (i.e. uncentered and unscaled).
# NOT RUN {
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
# all features in one group by default
fit1 <- pcLasso(x, y, ratio = 0.8)
# print(fit1) # Not run
# features in groups
groups <- vector("list", 4)
for (k in 1:4) {
groups[[k]] <- 5 * (k-1) + 1:5
}
fit2 <- pcLasso(x, y, groups = groups, ratio = 0.8)
# groups can be overlapping
groups[[1]] <- 1:8
fit3 <- pcLasso(x, y, groups = groups, ratio = 0.8)
# specify ratio or theta, but not both
fit4 <- pcLasso(x, y, groups = groups, theta = 10)
# family = "binomial"
y2 <- sample(0:1, 100, replace = TRUE)
fit5 <- pcLasso(x, y2, ratio = 0.8, family = "binomial")
# example where SVD is computed once, then re-used
fit1 <- pcLasso(x, y, ratio = 0.8)
fit2 <- pcLasso(x, y, ratio = 0.8, SVD_info = fit1$SVD_info)
# }
Run the code above in your browser using DataCamp Workspace