adapt_cv: fit an adaptive lasso with adaptive weights derived from lasso-cv

Description

Fits a first lasso regression with cross-validation to determine the adaptive weights, then fits an adaptive lasso using these weights as penalty factors. Cross-validation is also used in the adaptive lasso step for variable selection. Can handle very large sparse data matrices. Intended for binary response only (the option family = "binomial" is forced). Depends on the cv.glmnet function from the glmnet package.

Usage

adapt_cv(x, y, gamma = 1, nfolds = 5, foldid = NULL, betaPos = TRUE, ...)

Arguments

x

Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix).

y

Binary response variable, numeric.

gamma

Tuning parameter used to define the penalty weights. See details below. Default is 1.

nfolds

Number of folds - default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.

betaPos

Should the covariates selected by the procedure be positively associated with the outcome? Default is TRUE.

…

Other arguments that can be passed to cv.glmnet from package glmnet other than nfolds, foldid, penalty.factor and family.

Value

An object with S3 class "adaptive".

aws

Numeric vector of penalty weights derived from cross-validation. Length equal to nvars.

criterion

Character, indicates which criterion is used with the adaptive lasso for variable selection. For the adapt_cv function, criterion is "cv".

beta

Numeric vector of regression coefficients in the adaptive lasso. If criterion = "cv", the regression coefficients are PENALIZED; if criterion = "bic", the regression coefficients are UNPENALIZED. Length equal to nvars. Can be NA if all adaptive weights are infinite.

selected_variables

Character vector, names of the variable(s) selected with this adaptive approach. If betaPos = TRUE, this set is the covariates with a positive regression coefficient in beta. Otherwise, this set is the covariates with a nonzero regression coefficient in beta. Covariates are ordered by the absolute value of their regression coefficients in the adaptive lasso.
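The selection rule described for selected_variables can be sketched in base R. This is only an illustration, not the package's code: the helper select_vars and the coefficient values are hypothetical, and decreasing order of absolute value is an assumption, since the ordering direction is not stated here.

```r
# Hypothetical adaptive lasso coefficients, named after the covariates
beta <- c(drugs1 = 0.40, drugs2 = 0, drugs3 = -0.90, drugs4 = 0.15)

# Hypothetical helper mimicking the documented rule:
# keep positive coefficients if betaPos = TRUE, nonzero ones otherwise,
# then order names by absolute coefficient (decreasing order is assumed)
select_vars <- function(beta, betaPos = TRUE) {
  keep <- if (betaPos) beta > 0 else beta != 0
  kept <- beta[keep]
  names(kept)[order(abs(kept), decreasing = TRUE)]
}

sel_pos <- select_vars(beta, betaPos = TRUE)   # "drugs1" "drugs4"
sel_all <- select_vars(beta, betaPos = FALSE)  # "drugs3" "drugs1" "drugs4"
```

With betaPos = TRUE, drugs3 is dropped despite having the largest coefficient in absolute value, because its association with the outcome is negative.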

Details

The adaptive weight for a given covariate $i$ is defined by $$w_i = 1/|\beta^{CV}_i|^\gamma$$ where $\beta^{CV}_i$ is the PENALIZED regression coefficient associated with covariate $i$ obtained with cross-validation.
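As a small illustration of the weight formula, the base R sketch below computes the weights from made-up first-step coefficients. The vector beta_cv and its values are hypothetical; in adapt_cv these coefficients come from the cross-validated lasso.

```r
# Hypothetical penalized coefficients from the first (cross-validated) lasso
beta_cv <- c(drugs1 = 0.8, drugs2 = 0, drugs3 = -0.25)

# Adaptive weights w_i = 1 / |beta_cv_i|^gamma, with the default gamma = 1
gamma <- 1
aws <- 1 / abs(beta_cv)^gamma    # drugs1: 1.25, drugs2: Inf, drugs3: 4

# A larger gamma widens the gap between small and large coefficients
aws_g2 <- 1 / abs(beta_cv)^2     # drugs1: 1.5625, drugs2: Inf, drugs3: 16
```

A covariate with a zero coefficient in the first step gets an infinite weight. If that happens for every covariate, all weights are infinite, which is the case where beta can be NA.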

Examples

# NOT RUN {
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
acv <- adapt_cv(x = drugs, y = ae, nfolds = 5)


# }
