Runs the CISL procedure (see cisl for more details) to determine
adaptive penalty weights, then runs an adaptive lasso with this penalty weighting.
BIC is used for variable selection in the adaptive lasso.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the glmnet function from the package glmnet.
adapt_cisl(
x,
y,
cisl_nB = 100,
cisl_dfmax = 50,
cisl_nlambda = 250,
cisl_ncore = 1,
maxp = 50,
path = TRUE,
betaPos = TRUE,
...
)
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
"sparseMatrix" as in package Matrix).
Binary response variable, numeric.
nB option in cisl function. Default is 100.
dfmax option in cisl function. Default is 50.
nlambda option in cisl function. Default is 250.
ncore option in cisl function. Default is 1.
A limit on how many relaxed coefficients are allowed.
Default is 50; in glmnet, the default is 'n-3', where 'n' is the sample size.
Since glmnet does not do stepsize optimization, the Newton
algorithm can get stuck and not converge, especially with relaxed fits. With path=TRUE,
each relaxed fit on a particular set of variables is computed pathwise using the original sequence
of lambda values (with a zero attached to the end). Default is path=TRUE.
Should the covariates selected by the procedure be
positively associated with the outcome? Default is TRUE.
Other arguments that can be passed to glmnet from package glmnet,
other than penalty.factor, family, maxp and path.
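Because the input matrix can be sparse, a dense 0/1 matrix can be converted before fitting. The lines below are a minimal sketch using the Matrix package; the object names (drugs, drugs_sparse) are illustrative, and the final call is left commented out because it needs the package that provides adapt_cisl.

```r
library(Matrix)

set.seed(15)
# Dense 0/1 exposure matrix, mostly zeros (illustrative data)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))

# Convert to a sparse representation; the result inherits from "sparseMatrix"
drugs_sparse <- Matrix(drugs, sparse = TRUE)
is(drugs_sparse, "sparseMatrix")

# The sparse object could then be supplied as x, for example:
# acisl <- adapt_cisl(x = drugs_sparse, y = ae, cisl_nB = 50, maxp = 10)
```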
An object with S3 class "adaptive".
Numeric vector of penalty weights derived from CISL. Length equal to nvars.
Character, indicates which criterion is used with the
adaptive lasso for variable selection. For the adapt_cisl function,
criterion is "bic".
Numeric vector of regression coefficients in the adaptive lasso.
If criterion = "cv" the regression coefficients are PENALIZED, if
criterion = "bic" the regression coefficients are UNPENALIZED.
Length equal to nvars. Could be NA if adaptive weights are all equal to infinity.
Character vector, names of the variable(s) selected
with this adaptive approach.
If betaPos = TRUE, this set is the covariates with a positive regression
coefficient in beta.
Otherwise, this set is the covariates with a nonzero regression coefficient in beta.
Covariates are ordered according to their p-values (two-sided if betaPos = FALSE,
one-sided if betaPos = TRUE) in the classical multiple logistic regression
model that minimizes the BIC in the adaptive lasso.
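The ordering of the selected covariates by p-value can be illustrated with base R alone. This is a minimal sketch rather than the function's internal code: it assumes the three simulated covariates are the ones retained by the adaptive lasso, and the names x, y, beta_pos and selected are hypothetical.

```r
set.seed(15)
# Assume these three covariates are the ones kept by the adaptive lasso
x <- matrix(rbinom(100 * 3, 1, 0.3), nrow = 100,
            dimnames = list(NULL, paste0("drugs", 1:3)))
y <- rbinom(100, 1, 0.3)

# Classical (unpenalized) multiple logistic regression on those covariates
fit <- glm(y ~ ., data = data.frame(y = y, x), family = binomial())
coefs <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept row

z <- coefs[, "z value"]
p_two <- coefs[, "Pr(>|z|)"]           # two-sided p-values
p_one <- pnorm(z, lower.tail = FALSE)  # one-sided, testing coefficient > 0

beta_pos <- TRUE
if (beta_pos) {
  # Keep positively associated covariates, ordered by one-sided p-value
  keep <- coefs[, "Estimate"] > 0
  selected <- names(sort(p_one[keep]))
} else {
  # Keep all covariates, ordered by two-sided p-value
  selected <- names(sort(p_two))
}
selected
```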
The CISL procedure is first run with its default values, except for
dfmax and nlambda, which are set through the parameters cisl_dfmax and
cisl_nlambda.
In addition, the betaPos parameter is set to FALSE in cisl.
For each covariate \(i\), cisl_nB values of the CISL quantity \(\tau_i\)
are estimated.
The adaptive weight for a given covariate \(i\) is defined by
$$w_i = 1 - \frac{1}{\mathrm{cisl\_nB}} \sum_{b=1}^{\mathrm{cisl\_nB}} 1 [ \tau^b_i > 0 ]$$
If \(\tau_i\) is the null vector, the associated adaptive weight is infinity.
If \(\tau_i\) is always positive, rather than "forcing" the variable into
the model, we set the corresponding adaptive weight to 1/cisl_nB.
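The weight definition and its two special cases can be written as a short base R sketch. The helper cisl_weights and the toy tau matrix are hypothetical and are not part of the package; the sketch also assumes that "null vector" means all cisl_nB values are exactly zero.

```r
# Hypothetical helper: tau is an nvars x nB matrix, one row per covariate
# and one column per CISL replicate b.
cisl_weights <- function(tau) {
  nB <- ncol(tau)
  prop_pos <- rowMeans(tau > 0)      # (1 / nB) * sum_b 1[tau_i^b > 0]
  w <- 1 - prop_pos
  w[prop_pos == 1] <- 1 / nB         # always positive: 1/nB instead of 0
  w[rowSums(tau != 0) == 0] <- Inf   # null vector (assumed all zeros): infinite weight
  w
}

# Toy values with nB = 4 replicates
tau <- rbind(
  drug1 = c(0.2, 0.1, 0.3, 0.4),   # always positive
  drug2 = c(0.2, 0.0, -0.1, 0.4),  # positive in 2 of 4 replicates
  drug3 = c(0, 0, 0, 0)            # null vector
)
w <- cisl_weights(tau)
w
```

With these toy values, drug1 gets weight 1/4, drug2 gets 1 - 2/4 = 0.5, and drug3 gets an infinite weight.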
# NOT RUN {
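set.seed(15)
# Simulated binary covariate matrix: 100 observations, 20 covariates
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Simulated binary response
ae <- rbinom(100, 1, 0.3)
# Run adapt_cisl with cisl_nB = 50 and maxp = 10 instead of the defaults
acisl <- adapt_cisl(x = drugs, y = ae, cisl_nB = 50, maxp = 10)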
# }