SparseLearner (version 1.0-2)

BRLasso: Bootstrap ranking LASSO model.

Description

This function fits a LASSO logistic regression model using a bootstrap ranking procedure (the BRLasso logistic regression model). It selects an optimal set of predictors and returns robust coefficient estimates for the selected predictors.

Usage

BRLasso(x, y, B = 5, Boots = 100, kfold = 10, seed = 0123)

Arguments

x
predictor matrix.
y
response variable, a factor object with values of 0 and 1.
B
the number of external loops for the intersection operation; default is 5.
Boots
the number of internal loops for bootstrap sampling; default is 100.
kfold
the number of cross-validation folds; default is 10. kfold can be as large as the sample size (leave-one-out CV), but this is not recommended for large data sets. The smallest allowable value is kfold = 3.
seed
the seed for random sampling; default is 0123 (which R reads as the number 123).
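
As a minimal sketch of preparing inputs in the form the arguments above describe: the data frame `dat` and its column names are hypothetical and serve only as an illustration. Only base R is used.

```r
# Hypothetical data: two numeric predictors and a two-level outcome.
set.seed(123)
dat <- data.frame(
  age    = rnorm(20, mean = 50, sd = 10),
  bmi    = rnorm(20, mean = 25, sd = 3),
  status = rep(c("control", "case"), each = 10)
)

# x: numeric predictor matrix, one column per candidate variable.
x <- as.matrix(dat[, c("age", "bmi")])

# y: factor with values 0 and 1 (here "case" is mapped to 1).
y <- factor(ifelse(dat$status == "case", 1, 0), levels = c(0, 1))

# R reads the literal 0123 as the number 123, so seed = 0123 and
# seed = 123 are the same value.
identical(0123, 123)
```

Objects built this way can then be passed as `x` and `y`; which outcome level is coded as 1 is a choice made for this illustration.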

Value

B
the number of external loops for the intersection operation.
Boots
the number of internal loops for bootstrap sampling.
var.selected
significant variables that are selected by the BRLasso model.
var.coef
coefficients of the selected significant variables.

Details

This function fits a LASSO logistic regression model using a bootstrap ranking procedure. The procedure uses an internal loop, which builds a matrix of LASSO estimates from which variables are ranked by importance, and an external loop, whose results are intersected to extract a panel of informative variables.

Like the Bolasso, the bootstrap ranking procedure uses an intersection operation to extract relevant variables from many bootstrap replicates of the data. However, instead of directly intersecting several sets of non-zero LASSO estimates, it first generates a LASSO estimate matrix that ranks variables by importance, and then intersects the results to obtain a robust selection.

During each internal loop, bootstrap samples are generated by randomly selecting n individuals with replacement from the given data set of n individuals. For each such sample, LASSO regression coefficients are estimated for all variables, and the average estimate across the internal bootstrap samples is used as the measure of variable importance.

The defaults B = 5 and Boots = 100 are the proposed values, although other values can be supplied. The running time can be reduced by using 3-fold CV instead of the default 10-fold CV.
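
The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the source code of BRLasso: the helper names `importance_once` and `brlasso_sketch` are hypothetical, the glmnet package is assumed to be installed, taking the absolute value of the coefficients before averaging is an assumption, and the rule for turning a ranking into a selected set (keep variables whose average exceeds `threshold`) is a placeholder, since this page does not state the cutoff the package uses.

```r
library(glmnet)  # assumed to be installed

# Hypothetical helper: one external loop. Returns, for each variable, the
# mean absolute LASSO coefficient over `boots` bootstrap resamples.
importance_once <- function(x, y, boots = 10, kfold = 3) {
  n <- nrow(x)
  coefs <- replicate(boots, {
    idx <- sample(n, n, replace = TRUE)  # n individuals drawn with replacement
    fit <- cv.glmnet(x[idx, , drop = FALSE], y[idx],
                     family = "binomial", nfolds = kfold)
    as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop the intercept
  })
  rowMeans(abs(coefs))  # one importance value per variable
}

# Hypothetical wrapper: run B external loops and intersect the selections.
brlasso_sketch <- function(x, y, B = 2, boots = 10, kfold = 3,
                           seed = 123, threshold = 0) {
  set.seed(seed)
  selected <- lapply(seq_len(B), function(b) {
    imp <- importance_once(x, y, boots, kfold)
    names(imp)[imp > threshold]  # ASSUMPTION: placeholder selection rule
  })
  Reduce(intersect, selected)
}

# Usage on the two non-setosa iris species.
iris2 <- subset(iris, Species != "setosa")
X <- as.matrix(iris2[, -5])
Y <- factor(ifelse(iris2$Species == "versicolor", 0, 1))
brlasso_sketch(X, Y)
```

The sketch stops at variable selection; it does not reproduce how the function estimates the final coefficients returned in var.coef.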

References

[1] Guo, P., Zeng, F., Hu, X., Zhang, D., Zhu, S., Deng, Y., & Hao, Y. (2015). Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents. PLoS ONE, 10(7), e0134151.

[2] Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.

Examples

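library(datasets)
library(SparseLearner)
head(iris)
# Keep the two non-setosa species to obtain a binary outcome.
iris2 <- subset(iris, Species != "setosa")
X <- as.matrix(iris2[, -5])
# Code versicolor as 0 and virginica as 1.
Y <- as.factor(ifelse(iris2$Species == "versicolor", 0, 1))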
# Fit a bootstrap ranking LASSO (BRLasso) logistic regression model.
# B and Boots are set to small values here only to reduce the running time;
# the default values are recommended in practice.
BRLasso.fit <- BRLasso(x=X, y=Y, B=2, Boots=5, seed=0123)
# Significant variables that are selected by the BRLasso model.
BRLasso.fit$var.selected
# Coefficients of the selected variables.
BRLasso.fit$var.coef