Fit a cross-validated lasso logistic regression and return the selected covariates.
Can handle very large sparse data matrices.
Intended for a binary response only (the option family = "binomial" is forced).
Depends on the cv.glmnet function from the package glmnet.
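The procedure described above can be sketched directly with cv.glmnet. This is not the lasso_cv source. The function name lasso_cv_sketch, the simulated data and the use of lambda.min as the penalty value are assumptions made for illustration only. The sketch fixes the binomial family and drops the intercept so that the coefficient vector has length nvars.

```r
library(glmnet)

# Hypothetical sketch, NOT the package implementation.
# Assumption: coefficients are read at lambda.min (glmnet's coef() default is lambda.1se).
lasso_cv_sketch <- function(x, y, nfolds = 5, foldid = NULL, ...) {
  # family = "binomial" is fixed, as documented for lasso_cv
  cvfit <- if (is.null(foldid)) {
    cv.glmnet(x, y, family = "binomial", nfolds = nfolds, ...)
  } else {
    cv.glmnet(x, y, family = "binomial", foldid = foldid, ...)
  }
  # coef() returns an (nvars + 1) x 1 sparse matrix; row 1 is the intercept
  cf <- coef(cvfit, s = "lambda.min")
  as.matrix(cf)[-1, 1]  # named numeric vector of length nvars
}

# Simulated binary exposures and outcome (illustrative data only)
set.seed(1)
n <- 200; p <- 20
x <- matrix(rbinom(n * p, 1, 0.1), n, p,
            dimnames = list(NULL, paste0("drug", seq_len(p))))
y <- rbinom(n, 1, plogis(-1 + 2 * x[, 1]))
beta <- lasso_cv_sketch(x, y, nfolds = 5)
```

Any further argument given to lasso_cv_sketch is forwarded to cv.glmnet through `...`, in the same spirit as the `...` argument documented below.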
Value
beta
Numeric vector of regression coefficients in the lasso.
In the lasso_cv function, the regression coefficients are PENALIZED.
Length equal to nvars.
selected_variables
Character vector, names of variable(s) selected with the
lasso-cv approach.
If betaPos = TRUE, this set contains the covariates with a positive regression
coefficient in beta.
Otherwise, this set contains the covariates with a non-zero regression
coefficient in beta.
Covariates are ordered according to the absolute value of their regression
coefficients.
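The selection rule for selected_variables can be illustrated on a named coefficient vector. select_from_beta is a hypothetical helper, not part of the package, and the decreasing order of absolute values is an assumption, since the direction of the ordering is not stated above.

```r
# Hypothetical helper mirroring the documented rule:
#   betaPos = TRUE  -> keep covariates with beta > 0
#   betaPos = FALSE -> keep covariates with beta != 0
# Assumption: ordering is by decreasing absolute value.
select_from_beta <- function(beta, betaPos = TRUE) {
  keep <- if (betaPos) beta > 0 else beta != 0
  sel <- beta[keep]
  names(sel)[order(abs(sel), decreasing = TRUE)]
}

beta <- c(a = 0.2, b = -0.9, c = 0, d = 1.5, e = -0.1)
select_from_beta(beta, betaPos = TRUE)   # "d" "a"
select_from_beta(beta, betaPos = FALSE)  # "d" "b" "a" "e"
```

With betaPos = FALSE, negatively associated covariates such as b and e are kept, and b ranks above a because |-0.9| > |0.2|.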
Arguments
x
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inheriting from class
"sparseMatrix" as in package Matrix).
y
Binary response variable, numeric.
nfolds
Number of folds; default is 5. Although nfolds can be
as large as the sample size (leave-one-out CV), this is not recommended for
large datasets. The smallest allowable value is nfolds = 3.
foldid
An optional vector of values between 1 and nfolds
identifying which fold each observation is in.
If supplied, nfolds can be omitted.
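A foldid vector can be built in base R. make_foldid is a hypothetical helper, and assigning folds separately within each outcome class (stratification on y) is a design choice made here, not something lasso_cv requires.

```r
# Hypothetical helper (not part of the package): fold labels in 1..nfolds,
# assigned within each outcome class so cases and controls are spread
# evenly across folds.
make_foldid <- function(y, nfolds = 5) {
  stopifnot(nfolds >= 3)  # same lower bound as documented for nfolds
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # balanced labels 1, 2, ..., nfolds, 1, 2, ... then shuffled
    labels <- rep(seq_len(nfolds), length.out = length(idx))
    foldid[idx] <- labels[sample.int(length(idx))]
  }
  foldid
}

set.seed(42)
y <- rbinom(100, 1, 0.3)
foldid <- make_foldid(y, nfolds = 5)
table(y, foldid)  # per outcome class, fold counts differ by at most 1
```

Passing a fixed foldid instead of nfolds makes the cross-validation split reproducible and lets several fits share the same folds.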
betaPos
Should the covariates selected by the procedure be positively
associated with the outcome? Default is TRUE.
...
Other arguments that can be passed to cv.glmnet
from package glmnet, except nfolds, foldid
and family.