sparsegl object.
Performs k-fold cross-validation for sparsegl().
This function is largely similar to glmnet::cv.glmnet().
cv.sparsegl(
x,
y,
group = NULL,
family = c("gaussian", "binomial"),
lambda = NULL,
pred.loss = c("default", "mse", "deviance", "mae", "misclass"),
nfolds = 10,
foldid = NULL,
weights = NULL,
offset = NULL,
...
)
An object of class cv.sparsegl is returned, which is a
list with components describing the cross-validation error.
The values of lambda used in the fits.
The mean cross-validated error - a vector of
length length(lambda).
Estimate of standard error of cvm.
Upper curve = cvm + cvsd.
Lower curve = cvm - cvsd.
A text string indicating type of measure (for plotting purposes).
The number of non-zero coefficients for each lambda.
The number of active groups for each lambda.
A fitted sparsegl() object for the full data.
The optimal value of lambda that gives
minimum cross-validation error cvm.
The largest value of lambda such that error
is within 1 standard error of the minimum.
The function call.
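To make the two selection rules in the list above concrete, here is a minimal base R sketch of how a minimum-error lambda and a one-standard-error lambda could be picked from a vector of mean errors and their standard errors. The numbers and the variable names (`upper`, `lower`, `lambda_min`, `lambda_1se`) are made up for illustration; this is not the package's internal code.

```r
# Hypothetical cross-validation summary for five lambda values (decreasing).
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm <- c(3.0, 2.0, 1.5, 1.4, 1.45)     # mean cross-validated error
cvsd <- c(0.3, 0.2, 0.15, 0.15, 0.15)  # standard error of cvm

# Upper and lower curves, as described in the list above.
upper <- cvm + cvsd
lower <- cvm - cvsd

# Lambda with the smallest mean error.
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose error is within one standard error of the minimum.
lambda_1se <- max(lambda[cvm <= upper[i_min]])
```

With these made-up numbers the minimum is attained at 0.125, while the one-standard-error rule picks 0.25, a more heavily penalized fit whose estimated error is still within one standard error of the best one.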
x
Double. A matrix of predictors, of dimension \(n \times p\); each row
is a vector of measurements and each column is a feature. Objects of class
Matrix::sparseMatrix are supported.
y
Double/Integer/Factor. The response variable.
Quantitative for family="gaussian" and for other exponential families.
If family="binomial", y should be either a factor with two levels or
a vector of integers taking 2 unique values. For a factor, the last level
in alphabetical order is the target class.
group
Integer. A vector of consecutive integers describing the grouping of the coefficients (see example below).
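As a small illustration of what such a grouping vector can look like, the base R sketch below builds one label per predictor column. The sizes are hypothetical, and the check that labels start at 1 is an assumption taken from the example at the end of this page rather than a documented requirement.

```r
# Hypothetical layout: 20 predictors in 4 groups of 5 adjacent columns.
p <- 20
group <- rep(seq_len(p / 5), each = 5)

# One label per column, and the labels form the run 1, 2, ..., max(group).
stopifnot(length(group) == p)
stopifnot(identical(sort(unique(group)), seq_len(max(group))))

# Groups of unequal size can be built from a vector of group sizes.
sizes <- c(3, 7, 10)
group_uneven <- rep(seq_along(sizes), times = sizes)
table(group_uneven)
```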
family
Character or function. Specifies the generalized linear model to use. Valid options are:
"gaussian" - least squares loss (regression, the default),
"binomial" - logistic loss (classification).
For any other type, a valid stats::family() object may be passed. Note
that these will generally be much slower to estimate than the built-in
options passed as strings. So, for example, family = "gaussian" and
family = gaussian() will produce the same results, but the first
will be much faster.
lambda
A user-supplied lambda sequence. The default, NULL,
results in an automatic computation based on nlambda, the smallest value
of lambda that would give the null model (all coefficient estimates equal
to zero), and lambda.factor. Supplying a value of lambda overrides
this behaviour. It is likely better to supply a
decreasing sequence of lambda values than a single (small) value. If
supplied, the user-defined lambda sequence is automatically sorted in
decreasing order.
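If a custom sequence is wanted, one way to build a decreasing grid is sketched below in base R. The log-spaced layout is a common convention and is assumed here, not taken from the package source; `lambda_max` and `lambda_factor` are hypothetical values chosen only for illustration.

```r
# Hypothetical endpoints: largest penalty and the ratio smallest / largest.
lambda_max <- 1
lambda_factor <- 1e-3
nlambda <- 50

# Decreasing sequence, evenly spaced on the log scale.
lambda <- exp(seq(log(lambda_max), log(lambda_max * lambda_factor),
                  length.out = nlambda))

# The grid could then be supplied as: cv.sparsegl(x, y, group, lambda = lambda)
range(lambda)
```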
pred.loss
Loss to use for cross-validation error. Valid options are:
"default" - the same as deviance (mse for regression and deviance otherwise),
"mse" - mean square error,
"deviance" - the default (mse for Gaussian regression, and negative
log-likelihood otherwise),
"mae" - mean absolute error, can apply to any family,
"misclass" - for classification only, misclassification error.
nfolds
Number of folds - default is 10. Although nfolds can be
as large as the sample size (leave-one-out CV), it is not recommended for
large datasets. Smallest value allowable is nfolds = 3.
foldid
An optional vector of values between 1 and nfolds
identifying which fold each observation is in. If supplied, nfolds can
be missing.
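A fold assignment of this form can be built by hand, for example to reuse the same folds across several models. The base R sketch below shows a balanced random assignment and a contiguous one; the sample size, seed, and the name `foldid_blocked` are illustrative assumptions.

```r
set.seed(1)  # only so the illustration is reproducible
n <- 100
nfolds <- 10

# Balanced random assignment: each fold label appears n / nfolds times.
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)

# Alternative: contiguous blocks of observations (hypothetical use case).
foldid_blocked <- rep(seq_len(nfolds), each = n / nfolds)

# Either vector could then be supplied as:
#   cv.sparsegl(X, y, groups, foldid = foldid)
```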
weights
Double vector. Optional observation weights. These can
only be used with a stats::family() object.
offset
Double vector. Optional offset (constant predictor without a
corresponding coefficient). These can only be used with a
stats::family() object.
...
Additional arguments to sparsegl().
The function runs sparsegl() nfolds + 1 times: the first run obtains
the lambda sequence, and the remaining runs compute the fit
with each of the folds omitted in turn. The average error and its
standard error over the folds are then computed.
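The procedure can be sketched in base R as follows. To keep the sketch self-contained, a ridge regression path (`fit_path`, a hypothetical helper) stands in for sparsegl(), so this is plainly not the sparse group lasso and not the package's code; the data, the lambda grid, and the standard-error formula (standard deviation of the per-fold errors divided by the square root of the number of folds) are likewise assumptions for illustration.

```r
set.seed(1)
n <- 60; p <- 5; nfolds <- 5
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(2, -1, 0, 0, 1) + rnorm(n))
lambda <- c(10, 1, 0.1)  # hypothetical decreasing grid

# Stand-in fitter (ridge regression, NOT sparse group lasso):
# returns a p x length(lambda) matrix of coefficients.
fit_path <- function(X, y, lambda) {
  sapply(lambda, function(l) {
    solve(crossprod(X) + l * diag(ncol(X)), crossprod(X, y))
  })
}

# Pointwise loss; swap in abs(y - pred) for mean absolute error.
loss <- function(y, pred) (y - pred)^2

# Fit 1 of nfolds + 1: the full data, along the whole lambda grid.
full_fit <- fit_path(X, y, lambda)

# Fits 2 to nfolds + 1: omit one fold, predict it, record its mean loss.
foldid <- sample(rep(seq_len(nfolds), length.out = n))
fold_err <- sapply(seq_len(nfolds), function(k) {
  test <- foldid == k
  beta <- fit_path(X[!test, , drop = FALSE], y[!test], lambda)
  pred <- X[test, , drop = FALSE] %*% beta
  colMeans(loss(y[test], pred))
})

# fold_err is length(lambda) x nfolds: summarize across folds.
cvm <- rowMeans(fold_err)
cvsd <- apply(fold_err, 1, sd) / sqrt(nfolds)
```

Each column of `fold_err` comes from a model that never saw the observations it is scored on, which is what makes `cvm` an out-of-sample error estimate for each lambda.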
Liang, X., Cohen, A., Sólon Heinsfeld, A., Pestilli, F., and
McDonald, D.J. 2024.
sparsegl: An R Package for Estimating Sparse Group Lasso.
Journal of Statistical Software, Vol. 110(6): 1–23.
doi:10.18637/jss.v110.i06.
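library(sparsegl)

# Simulate n observations of p predictors arranged in groups of 5.
n <- 100
p <- 20
X <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
# True coefficients: the first three groups are active, the last is all zero.
beta_star <- c(rep(5, 5), c(5, -5, 2, 0, 0), rep(-5, 5), rep(0, (p - 15)))
y <- X %*% beta_star + eps
# Consecutive integer labels: columns 1-5 are group 1, 6-10 are group 2, etc.
groups <- rep(1:(p / 5), each = 5)
# 10-fold cross-validation over the automatically chosen lambda sequence.
cv_fit <- cv.sparsegl(X, y, groups)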