lasso_perm: fit a lasso regression and use standard permutation of the outcome for variable selection

Description

Performs K lasso logistic regressions, each with a different permuted version of the outcome. For each of these regressions, \(\lambda_{max}\) (i.e. the smallest \(\lambda\) such that all penalized regression coefficients are shrunk to zero) is obtained. The median of these K \(\lambda_{max}\) values is then used for variable selection in the lasso regression with the non-permuted outcome. Depends on the glmnet function from the package glmnet.
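The permutation step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: lasso_perm relies on glmnet, whereas the hypothetical helper lambda_max_logistic below uses the closed-form expression for \(\lambda_{max}\) of an L1-penalized logistic model with an unpenalized intercept and columns standardized with the 1/n variance convention. Values returned by glmnet may differ slightly.

```r
# Hypothetical helper (not part of the package): smallest lambda for which
# all lasso coefficients of a logistic model are zero, from the optimality
# condition at beta = 0 with an unpenalized intercept.
lambda_max_logistic <- function(x, y) {
  n <- nrow(x)
  # Centre and scale each column (variance computed with 1/n).
  # Assumption: no column is constant, otherwise the scaling divides by zero.
  xc <- sweep(x, 2, colMeans(x))
  xs <- sweep(xc, 2, sqrt(colSums(xc^2) / n), "/")
  # Largest absolute score of the null model, which predicts mean(y) for everyone
  max(abs(crossprod(xs, y - mean(y)))) / n
}

set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
ae <- rbinom(100, 1, 0.3)

# One lambda_max per permuted version of the outcome
K <- 10
lambda_max_perm <- vapply(seq_len(K), function(k) {
  lambda_max_logistic(drugs, sample(ae))
}, numeric(1))

# Penalty level retained for the fit on the non-permuted outcome
lambda_perm <- median(lambda_max_perm)
```

Permuting the outcome breaks any association with the covariates, so each \(\lambda_{max}\) reflects the penalty needed to shrink pure noise to zero.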

Usage

lasso_perm(x, y, K = 20, keep = NULL, betaPos = TRUE, ncore = 1, ...)

Value

An object with S3 class "log.lasso".

beta: Numeric vector of regression coefficients in the lasso. In the lasso_perm function, the regression coefficients are penalized. Length equal to nvars.
selected_variables: Character vector, names of the variable(s) selected with the lasso-perm approach. If betaPos = TRUE, this set is the covariates with a positive regression coefficient in beta. Otherwise, this set is the covariates with a non-zero regression coefficient in beta. Covariates are ordered according to the absolute value of their regression coefficients.
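As an illustration of how selected_variables relates to beta, here is a minimal sketch on made-up coefficients. The helper select_from_beta is hypothetical and not part of the package, and the decreasing order is an assumption: the documentation only states that covariates are ordered by the absolute value of their coefficients.

```r
# Toy penalized coefficients (made-up values, not an output of lasso_perm)
beta <- c(drugs1 = 0.40, drugs2 = 0, drugs3 = -0.25, drugs4 = 0.10)

# Hypothetical helper mirroring the documented selection rule
select_from_beta <- function(beta, betaPos = TRUE) {
  # betaPos = TRUE keeps positive coefficients, otherwise all non-zero ones
  keep <- if (betaPos) beta > 0 else beta != 0
  kept <- beta[keep]
  # Assumption: largest absolute coefficient first
  names(kept)[order(abs(kept), decreasing = TRUE)]
}

select_from_beta(beta, betaPos = TRUE)   # "drugs1" "drugs4"
select_from_beta(beta, betaPos = FALSE)  # "drugs1" "drugs3" "drugs4"
```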

Arguments

x: Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix).
y: Binary response variable, numeric.
K: Number of permutations of y. Default is 20.
keep: Indices of the covariates of x that have to be permuted in the same way as y, if any. Default is NULL, meaning none. Otherwise, must be a vector of covariate indices. This is a test option.
betaPos: Should the covariates selected by the procedure be positively associated with the outcome? Default is TRUE.
ncore: The number of computing units used for parallel computing. Default is 1, meaning no parallelization.
...: Other arguments, apart from family, that can be passed to glmnet from package glmnet.

Author

Emeline Courtois
Maintainer: Emeline Courtois emeline.courtois@inserm.fr

Details

The \(\lambda\) selected with this approach is defined as the \(\lambda\) closest to the median value of the K \(\lambda_{max}\) obtained with permutation of the outcome.
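This rule can be written compactly as follows. The notation is an assumption introduced here for illustration: \(\Lambda\) denotes the set of candidate penalty values (for instance the grid used for the fit on the non-permuted outcome, which this page does not specify), and \(\lambda_{max}^{(k)}\) is the value obtained with the k-th permutation.

```latex
\[
\lambda^{*} \;=\; \underset{\lambda \in \Lambda}{\arg\min}\;
\left| \lambda - \operatorname{median}\!\left( \lambda_{max}^{(1)}, \dots, \lambda_{max}^{(K)} \right) \right|
\]
```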

References

Sabourin, J. A., Valdar, W., & Nobel, A. B. (2015). "A permutation approach for selecting the penalty parameter in penalized model selection". Biometrics, 71(4), 1185–1194. doi:10.1111/biom.12359

Examples



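# Simulate 100 observations of 20 binary covariates
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Simulate a binary outcome, independent of the covariates
ae <- rbinom(100, 1, 0.3)
# Lasso-perm with K = 10 permutations of the outcome
lp <- lasso_perm(x = drugs, y = ae, K = 10)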
