Computes odds ratios between each covariate of x and y,
then derives adaptive weights to incorporate in an adaptive lasso.
Either BIC or cross-validation can be used with the adaptive lasso for variable selection.
Two options for implementing cross-validation for the adaptive lasso are available through the type_cv
parameter (see below).
Can deal with very large sparse data matrices.
Intended for binary responses only (option family = "binomial"
is forced).
The cross-validation criterion used is deviance.
Depends on the glmnet and relax.glmnet functions from the package glmnet.
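As a rough illustration of the idea described above (not the package's actual implementation), the sketch below computes log odds ratios for simulated binary covariates, turns them into penalty weights, and passes them to cv.glmnet through its penalty.factor argument, with deviance as the cross-validation measure. The simulated data, the variable names, and the handling of undefined odds ratios are assumptions made for this example only.

```r
library(glmnet)

set.seed(42)
n <- 200
p <- 10
x <- matrix(rbinom(n * p, 1, 0.3), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))
y <- rbinom(n, 1, 0.4)

# 2x2 table cells for every covariate at once
n11 <- colSums(x[y == 1, , drop = FALSE])  # exposed, event
n10 <- colSums(x[y == 0, , drop = FALSE])  # exposed, no event
n01 <- sum(y == 1) - n11                   # unexposed, event
n00 <- sum(y == 0) - n10                   # unexposed, no event

# Adaptive weights: w_i = 1 / |log(OR_i)|^gamma
gamma <- 1
w <- 1 / abs(log((n11 * n00) / (n10 * n01)))^gamma
# Assumption for this sketch: an undefined odds ratio (0/0) excludes the covariate
w[is.nan(w)] <- Inf

# Adaptive lasso: the weights enter glmnet as per-variable penalty factors
cvfit <- cv.glmnet(x, y, family = "binomial", penalty.factor = w,
                   nfolds = 5, type.measure = "deviance")

# Penalized coefficients at the lambda minimizing CV deviance, intercept dropped
beta <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]
selected <- names(beta)[beta > 0]  # analogous to betaPos = TRUE
```

Note that glmnet treats an infinite penalty factor as an instruction to exclude the variable, which is why covariates with a log odds ratio of zero drop out in this sketch.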
adapt_univ(
x,
y,
gamma = 1,
criterion = "bic",
maxp = 50,
path = TRUE,
nfolds = 5,
foldid = NULL,
type_cv = "proper",
betaPos = TRUE,
...
)
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
"sparseMatrix"
as in package Matrix
).
Binary response variable, numeric.
Tuning parameter used to define the penalty weights. See details below. Default is 1.
Character, indicates which criterion is used with the adaptive lasso for variable selection. Can be either "bic" or "cv". Default is "bic".
Used only if criterion
= "bic", ignored if criterion
= "cv".
A limit on how many relaxed coefficients are allowed. Default is 50; in glmnet,
the default for this option is 'n-3', where 'n' is the sample size.
Used only if criterion
= "bic", ignored if criterion
= "cv".
Since glmnet
does not do stepsize optimization, the Newton
algorithm can get stuck and not converge, especially with relaxed fits.
With path=TRUE
, each relaxed fit on a particular set of variables
is computed pathwise using the original sequence of lambda values
(with a zero attached to the end). Default is path=TRUE
.
Used only if criterion
= "cv", ignored if criterion
= "bic".
Number of folds - default is 5. Although nfolds
can be
as large as the sample size (leave-one-out CV), it is not recommended for
large datasets. Smallest value allowable is nfolds=3
.
Used only if criterion
= "cv", ignored if criterion
= "bic".
An optional vector of values between 1 and nfolds
identifying what fold each observation is in. If supplied, nfolds
can
be missing.
Used only if criterion
= "cv", ignored if criterion
= "bic".
Character, indicates which implementation of cross-validation is performed for the adaptive lasso: a "naive" one,
where the adaptive weights obtained on the full data are used, or a "proper" one, where the adaptive weights are recalculated on each training set.
Can be either "naive" or "proper".
Default is "proper".
Should the covariates selected by the procedure be
positively associated with the outcome? Default is TRUE.
Other arguments that can be passed to glmnet from package glmnet,
other than family, maxp, standardize, and intercept.
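To make the distinction between the two type_cv settings concrete, here is a small base R sketch of where the weights would come from in each case. It is only an illustration under stated assumptions: univ_weights is a hypothetical helper (not a function of the package), the covariates are simulated and binary, and the lasso fits that would follow are omitted.

```r
# Hypothetical helper: weights 1 / |log(OR)|^gamma for binary covariates
univ_weights <- function(x, y, gamma = 1) {
  n11 <- colSums(x[y == 1, , drop = FALSE])  # exposed, event
  n10 <- colSums(x[y == 0, , drop = FALSE])  # exposed, no event
  n01 <- sum(y == 1) - n11                   # unexposed, event
  n00 <- sum(y == 0) - n10                   # unexposed, no event
  1 / abs(log((n11 * n00) / (n10 * n01)))^gamma
}

set.seed(1)
n <- 200
x <- matrix(rbinom(n * 5, 1, 0.3), nrow = n)
y <- rbinom(n, 1, 0.4)

nfolds <- 5
# Balanced random fold assignment, values between 1 and nfolds
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# "naive": a single weight vector computed once on the full data
w_naive <- univ_weights(x, y)

# "proper": weights recomputed on the training part of each fold
w_proper <- lapply(seq_len(nfolds), function(k) {
  train <- foldid != k
  univ_weights(x[train, , drop = FALSE], y[train])
})
```

In the naive scheme the held-out observations have already contributed to the weights, which is the usual argument for preferring the proper scheme despite its extra computation.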
An object with S3 class "adaptive"
.
Numeric vector of penalty weights derived from odds-ratios. Length equal to nvars.
Character, same as input. Could be either "bic" or "cv".
Numeric vector of regression coefficients in the adaptive lasso.
If criterion
= "cv" the regression coefficients are PENALIZED, if
criterion
= "bic" the regression coefficients are UNPENALIZED.
Length equal to nvars. Could be NA if adaptive weights are all equal to infinity.
Character vector, names of variable(s) selected
with this adaptive approach.
If betaPos = TRUE
, this set is the covariates with a positive regression
coefficient in beta
.
Otherwise, this set is the covariates with a nonzero regression coefficient in beta.
If criterion = "cv", covariates are ordered by the absolute value of their regression
coefficients in the adaptive lasso.
If criterion = "bic", covariates are ordered by their p-values (two-sided if betaPos = FALSE,
one-sided if betaPos = TRUE) in the classical multiple logistic regression
model that minimizes the BIC in the adaptive lasso.
The adaptive weight for a given covariate \(i\) is defined by $$w_i = 1/|\beta^{univ}_i|^\gamma$$ where \(\beta^{univ}_i = \log(OR_i)\) and \(OR_i\) is the odds ratio between covariate \(i\) and the outcome.
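The formula can be checked by hand on a tiny example. The sketch below is an assumption-laden illustration rather than the package's code: adaptive_weight is a hypothetical helper, the covariate is taken to be binary, and zero cells in the 2x2 table are not handled.

```r
# Hypothetical helper: adaptive weight for one binary covariate
adaptive_weight <- function(xj, y, gamma = 1) {
  n11 <- as.numeric(sum(xj == 1 & y == 1))  # exposed, event
  n10 <- as.numeric(sum(xj == 1 & y == 0))  # exposed, no event
  n01 <- as.numeric(sum(xj == 0 & y == 1))  # unexposed, event
  n00 <- as.numeric(sum(xj == 0 & y == 0))  # unexposed, no event
  or <- (n11 * n00) / (n10 * n01)           # odds ratio
  1 / abs(log(or))^gamma                    # w = 1 / |log(OR)|^gamma
}

# 2x2 table with cells 3, 1, 1, 3, so OR = (3 * 3) / (1 * 1) = 9
xj <- c(1, 1, 1, 1, 0, 0, 0, 0)
y  <- c(1, 1, 1, 0, 1, 0, 0, 0)

w1 <- adaptive_weight(xj, y, gamma = 1)  # 1 / log(9)
w2 <- adaptive_weight(xj, y, gamma = 2)  # 1 / log(9)^2
```

A covariate with an odds ratio of exactly 1 has \(\log(OR_i) = 0\) and therefore an infinite weight under this formula.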
# NOT RUN {
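set.seed(15)
# Simulated binary covariate matrix: 100 observations, 20 columns
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Simulated binary response
ae <- rbinom(100, 1, 0.3)
# Adaptive lasso with odds-ratio-based weights, selected by 3-fold cross-validation
au <- adapt_univ(x = drugs, y = ae, criterion = "cv", nfolds = 3)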
# }