cv.glmnet
Fit a lasso regression with cross-validation and return the selected covariates.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the cv.glmnet function from the package glmnet.
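As a rough illustration of the underlying call, here is a minimal sketch of a cross-validated binomial lasso fitted directly with cv.glmnet. It is not the package's actual implementation: the use of `s = "lambda.min"` to extract coefficients and the name `beta_hat` are assumptions made for this example, and the block only runs when glmnet is installed.

```r
# Sketch only: runs when the glmnet package is available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(15)
  x <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
  colnames(x) <- paste0("drugs", seq_len(ncol(x)))
  y <- rbinom(100, 1, 0.3)

  # Cross-validated lasso for a binary response
  cv_fit <- glmnet::cv.glmnet(x, y, family = "binomial", nfolds = 5)

  # Penalized coefficients at the lambda minimising the CV error
  # (assumed choice of lambda); the intercept row is dropped
  beta_hat <- as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]
}
```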
lasso_cv(x, y, nfolds = 5, foldid = NULL, betaPos = TRUE, ...)
An object with S3 class "log.lasso".
Numeric vector of regression coefficients in the lasso.
In the lasso_cv function, the regression coefficients are PENALIZED.
Length equal to nvars.
Character vector, names of the variable(s) selected with the lasso-cv approach.
If betaPos = TRUE, this set is the covariates with a positive regression coefficient in beta.
Otherwise, this set is the covariates with a nonzero regression coefficient in beta.
Covariates are ordered according to the magnitude of the absolute value of their regression coefficients.
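The selection rule described above can be sketched in base R. The helper `select_from_beta` and the vector `coefs` are hypothetical names used only for illustration; they are not part of the package, and sorting from largest to smallest magnitude is an assumption about the ordering direction.

```r
# Hypothetical helper illustrating the selection rule; not the package's code.
select_from_beta <- function(beta, betaPos = TRUE) {
  # Keep positive coefficients if betaPos is TRUE, otherwise any nonzero one
  keep <- if (betaPos) beta > 0 else beta != 0
  kept <- beta[keep]
  # Order by absolute value, largest first (assumed direction)
  names(kept)[order(abs(kept), decreasing = TRUE)]
}

coefs <- c(drugs1 = 0.8, drugs2 = 0, drugs3 = -1.2, drugs4 = 0.3)
select_from_beta(coefs, betaPos = TRUE)   # "drugs1" "drugs4"
select_from_beta(coefs, betaPos = FALSE)  # "drugs3" "drugs1" "drugs4"
```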
Input matrix, of dimension nobs x nvars. Each row is an observation vector.
Can be in sparse matrix format (inherit from class "sparseMatrix" as in package Matrix).
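For instance, a dense 0/1 matrix can be converted to a sparse format with the Matrix package before being passed as x. This is a minimal sketch; the object name `drugs_sparse` is chosen for this example only.

```r
library(Matrix)

set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", seq_len(ncol(drugs)))

# Convert the dense matrix to a sparse representation
drugs_sparse <- Matrix(drugs, sparse = TRUE)

# Check that the object inherits from class "sparseMatrix"
inherits(drugs_sparse, "sparseMatrix")
dim(drugs_sparse)
```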
Binary response variable, numeric.
Number of folds; the default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.
An optional vector of values between 1 and nfolds identifying what fold each observation is in.
If supplied, nfolds can be missing.
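One common way to build such a vector is to assign observations to folds of equal size at random. This is a base R sketch; the sizes (100 observations, 5 folds) are illustrative values only.

```r
set.seed(15)
nobs <- 100
nfolds <- 5

# Repeat the fold labels 1..nfolds to cover all observations, then shuffle
foldid <- sample(rep(seq_len(nfolds), length.out = nobs))

# Number of observations assigned to each fold
table(foldid)
```

The resulting vector can then be supplied through the foldid argument, for example `lasso_cv(x = drugs, y = ae, foldid = foldid)`, so that the same fold assignment is reused across fits.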
Should the covariates selected by the procedure be positively associated with the outcome? Default is TRUE.
Other arguments that can be passed to cv.glmnet from package glmnet, other than nfolds, foldid, and family.
Emeline Courtois
Maintainer: Emeline Courtois
emeline.courtois@inserm.fr
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lcv <- lasso_cv(x = drugs, y = ae, nfolds = 3)