Learn R Programming

ncvreg (version 3.4-0)

ncvsurv: Fit an MCP- or SCAD-penalized survival model

Description

Fit coefficients paths for MCP- or SCAD-penalized Cox regression models over a grid of values for the regularization parameter lambda, with option for an additional L2 penalty.

Usage

ncvsurv(X, y, model=c("cox","aft"), penalty=c("MCP", "SCAD", "lasso"),
gamma=switch(penalty, SCAD=3.7, 3), alpha=1,
lambda.min=ifelse(n>p,.001,.05), nlambda=100, lambda, eps=.001,
max.iter=1000, convex=TRUE, dfmax=p, penalty.factor=rep(1, ncol(X)),
warn=TRUE, returnX=FALSE, ...)

Arguments

X
The design matrix of predictor values. ncvsurv standardizes the data prior to fitting.
y
The time-to-event outcome, as a two-column matrix or Surv object. The first column should be time on study (follow up time); the second column should be a binary variable with 1 indicating that the event has occurred and 0 indica
model
Either "cox" for a Cox proportional hazards model or "aft" for an accelerated failure time (AFT) model. Note: AFT models are not yet implemented in this release of ncvreg.
penalty
The penalty to be applied to the model. Either "MCP" (the default), "SCAD", or "lasso".
gamma
The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.
alpha
Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty, while alpha=0 would be equivalent t
lambda.min
The smallest value for lambda, as a fraction of lambda.max. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.
nlambda
The number of lambda values. Default is 100.
lambda
A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale.
eps
Convergence threshhold. The algorithm iterates until the relative change in any coefficient is less than eps. Default is .001.
max.iter
Maximum number of iterations. Default is 1000.x
convex
Calculate index for which objective function ceases to be locally convex? Default is TRUE.
dfmax
Upper bound for the number of nonzero coefficients. Default is no upper bound. However, for large data sets, computational burden may be heavy for models with a large number of nonzero coefficients.
penalty.factor
A multiplicative factor for the penalty applied to each coefficient. If supplied, penalty.factor must be a numeric vector of length equal to the number of columns of X. The purpose of penalty.factor is
warn
Return warning messages for failures to converge and model saturation? Default is TRUE.
returnX
Return the standardized design matrix? Default is FALSE.
...
Not used.

Value

  • An object with S3 class "ncvsurv" containing:
  • betaThe fitted matrix of coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.
  • iterA vector of length nlambda containing the number of iterations until convergence at each value of lambda.
  • lambdaThe sequence of regularization parameter values in the path.
  • penaltySame as above.
  • modelSame as above.
  • gammaSame as above.
  • alphaSame as above.
  • convex.minThe last index for which the objective function is locally convex. The smallest value of lambda for which the objective function is convex is therefore lambda[convex.min], with corresponding coefficients beta[,convex.min].
  • lossThe negative partial log-likelihood of the fitted model at each value of lambda.
  • penalty.factorSame as above.
  • nThe number of observations.
  • For Cox models, the following objects are also returned (and are necessary to estimate baseline survival conditonal on the estimated regression coefficients), all of which are ordered by time on study. I.e., the ith row of W does not correspond to the ith row of X):
  • WMatrix of exp(beta) values for each subject over all lambda values.
  • timeTimes on study.
  • failFailure event indicator.

Details

The sequence of models indexed by the regularization parameter lambda is fit using a coordinate descent algorithm. In order to accomplish this, the second derivative (Hessian) of the Cox partial log-likelihood is diagonalized (see references for details). The objective function is defined to be $$-\frac{1}{n}L(\beta|X,y) + \textrm{penalty},$$ where L is the partial log-likelihood from the Cox regression model.

Presently, ties are not handled by ncvsurv in a particularly sophisticated manner. This will be improved upon in a future release of ncvreg.

Adaptive rescaling (see references) is used for MCP and SCAD models. The convexity diagnostics rely on a fine covering of (lambda.min, lambda.max); choosing a low value of nlambda may produce unreliable results.

References

  • Breheny P and Huang J. (2011) Coordinate descentalgorithms for nonconvex penalized regression, with applications to biological feature selection.Annals of Applied Statistics,5: 232-253.myweb.uiowa.edu/pbreheny/publications/Breheny2011.pdf
  • Simon N, Friedman JH, Hastie T, and Tibshirani R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent.Journal of Statistical Software,39: 1-13.http://www.jstatsoft.org/v39/i05

See Also

plot.ncvreg, cv.ncvsurv

Examples

Run this code
data(Lung)
X <- Lung$X
y <- Lung$y

par(mfrow=c(2,2))
fit <- ncvsurv(X, y)
plot(fit, main=expression(paste(gamma,"=",3)))
fit <- ncvsurv(X, y, gamma=10)
plot(fit, main=expression(paste(gamma,"=",10)))
fit <- ncvsurv(X, y, gamma=1.5)
plot(fit, main=expression(paste(gamma,"=",1.5)))
fit <- ncvsurv(X, y, penalty="SCAD")
plot(fit, main=expression(paste("SCAD, ",gamma,"=",3)))

fit <- ncvsurv(X,y)
ll <- log(fit$lambda)
par(mfrow=c(2,1))
plot(ll, BIC(fit), type="l", xlim=rev(range(ll)))
lam <- fit$lambda[which.min(BIC(fit))]
b <- coef(fit, lambda=lam)
b[b!=0]
plot(fit)
abline(v=lam)

S <- predict(fit, X, type='survival', lambda=lam)
par(mfrow=c(1,1))
plot(S, xlim=c(0,200))

Run the code above in your browser using DataLab