
cvplogistic (version 2.0-0)

cvplogistic: Majorization minimization by coordinate descent for concave penalized logistic regression

Description

Compute the solution surface for a high-dimensional logistic regression model with a concave penalty, using the MMCD, adaptive rescaling, or LLA-CDA algorithms.

Usage

cvplogistic(y, x, penalty = "mcp", approach = "mmcd", path = "kappa",
nkappa = 10, maxkappa = 0.249, nlambda = 100, minlambda = 0.01,
epsilon = 1e-3, maxit = 1e+3)

Arguments

Value

A list with five elements is returned.

lambda: A vector of length nkappa*nlambda for the penalty parameter lambda, ranging from the largest value to the smallest, with block size nkappa.

kappa: A vector of length nkappa*nlambda for the regularization parameter kappa, ranging from 0 to maxkappa within each block of size nkappa.

df: A vector of length nkappa*nlambda giving the degrees of freedom (model size, the number of selected covariates) of the corresponding solution.

coef.intercept: A vector of length nkappa*nlambda containing the intercept coefficient of each solution.

coef.covariates: A matrix of dimension p*(nkappa*nlambda), where p is the number of variables (columns) in x; each column holds the covariate coefficients of one solution.
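A minimal sketch of how this surface might be indexed, assuming the element names above and the simulated data (y, x) from the Examples section:

## fit a small surface and look up one solution on it
out <- cvplogistic(y, x, penalty = "mcp", nkappa = 5, nlambda = 20)
## the j-th point on the surface is the solution for
## lambda = out$lambda[j] and kappa = out$kappa[j]
j <- which.max(out$df)               # e.g. the largest fitted model
c(lambda = out$lambda[j], kappa = out$kappa[j], df = out$df[j])
out$coef.intercept[j]                # intercept of that solution
beta <- out$coef.covariates[, j]     # its p covariate coefficients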

Rdversion

2.0

Details

The package implements the majorization minimization by coordinate descent (MMCD) algorithm for computing the solution surface of a concave penalized logistic regression model with high-dimensional data. The MMCD algorithm seeks a closed-form solution for each coordinate and majorizes the loss function to avoid computing scaling factors. The algorithm is efficient and stable for high-dimensional data with p >> n.

The package provides three ways to compute solution surfaces for a concave penalized logistic model. The first computes along the regularization parameter kappa: for a given penalty parameter lambda, the Lasso solution (kappa = 0) is used to initiate the computation of the MCP or SCAD solutions along grid values of kappa. The second computes along the penalty parameter lambda: for a given regularization parameter kappa, the MCP or SCAD solutions are computed along lambda. The surface computed along kappa tends to perform better in terms of model size and false discovery rate, and is therefore recommended. The third type is the hybrid algorithm, designed specifically for applications that aim to identify leading causal predictors. In most cases the hybrid algorithm achieves the same predictive performance as the other two methods. It can be viewed as a variant of the surface along kappa: it uses the Lasso solution (kappa = 0) as the initial value and applies the MMCD algorithm only to the variables selected by the Lasso. In other words, the Lasso pre-screens the variables before the MCP penalty is applied, which greatly reduces the computational burden. However, if the Lasso misses a variable, it is necessarily absent from the final model.

For the MCP penalty, the package also implements the adaptive rescaling algorithm and the local linear approximation by coordinate descent (LLA-CDA) algorithm. All three types of solution surface are available for the adaptive rescaling approach; for the LLA-CDA algorithm, only the surface along kappa is implemented.
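For illustration only (not part of the package documentation), the three surface types can be compared on the simulated data (y, x) from the Examples section, for instance via the model sizes they report:

out.kappa  <- cvplogistic(y, x, penalty = "mcp", approach = "mmcd", path = "kappa")
out.lambda <- cvplogistic(y, x, penalty = "mcp", approach = "mmcd", path = "lambda")
out.hybrid <- cvplogistic(y, x, penalty = "mcp", approach = "mmcd", path = "hybrid")
## df gives the number of selected covariates at every point of each surface;
## the hybrid surface is restricted to variables pre-screened by the Lasso
summary(out.kappa$df)
summary(out.lambda$df)
summary(out.hybrid$df)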

References

Jiang, D., Huang, J. Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models.

Zou, H., Li, R. (2008). One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. Ann Stat, 36(4): 1509-1533.

Breheny, P., Huang, J. (2011). Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Application to Biological Feature Selection. Ann Appl Stat, 5(1): 232-253.

Jiang, D., Huang, J., Zhang, Y. (2011). The Cross-validated AUC for MCP-Logistic Regression with High-dimensional Data. Stat Methods Med Res, online first, Nov 28, 2011.

See Also

aic.cvplogistic, bic.cvplogistic, cvauc.cvplogistic

Examples

## simulate a binary outcome and covariates
set.seed(10000)
n <- 100
y <- rbinom(n, 1, 0.4)
p <- 50
x <- matrix(rnorm(n * p), n, p)
## grid sizes for the tuning parameters
nkappa <- 5
maxkappa <- 0.249
nlambda <- 20
## MCP penalty with the MMCD algorithm
penalty <- "mcp"
approach <- "mmcd"
## solution surface along kappa (recommended)
path <- "kappa"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
## solution surface along lambda
path <- "lambda"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
## hybrid surface: MMCD applied to Lasso-selected variables only
path <- "hybrid"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
approach="adaptive"
path="kappa"
out=cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
path="lambda"
out=cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
path="hybrid"
out=cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
## MCP penalty with the LLA-CDA algorithm; the path option has no effect
approach <- "llacda"
maxkappa <- 0.99
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
## SCAD penalty with the MMCD algorithm
penalty <- "scad"
approach <- "mmcd"
maxkappa <- 0.19
path <- "kappa"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
path <- "lambda"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
path <- "hybrid"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
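## Illustrative inspection of a fitted surface (element names as listed under
## Value). For a surface computed along kappa, each block of nkappa solutions
## shares one lambda, with kappa running from 0 (the Lasso solution) to maxkappa.
path <- "kappa"
out <- cvplogistic(y, x, penalty, approach, path, nkappa, maxkappa, nlambda)
str(out)
out$kappa[1:nkappa]                  # kappa grid within the first lambda block
out$coef.covariates[1:5, 1:nkappa]   # first five coefficients across that block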
