ahazpen: Fit penalized semiparametric additive hazards model

Description

Fit a semiparametric additive hazards model via penalized estimating equations using, for example, the lasso penalty. The complete regularization path is computed at a grid of values for the penalty parameter lambda via the method of cyclic coordinate descent.

Usage

ahazpen(surv, X, weights,  standardize=TRUE,  penalty=lasso.control(),
        nlambda=100, dfmax=nvars, pmax=min(nvars, 2*dfmax),
        lambda.minf=ifelse(nobs < nvars,0.05, 1e-4), lambda,
        penalty.wgt=NULL, keep=NULL, control=list())

Arguments

surv

Response in the form of a survival object, as returned by the function Surv() in the package survival. Right-censored and counting process (left-truncation) formats are supported. Tied survival times are not supported.

X

Design matrix. Missing values are not supported.

weights

Optional vector of observation weights. Default is 1 for each observation.

standardize

Logical flag for variable standardization, prior to model fitting. Estimates are always returned on the original scale. Default is standardize=TRUE.

penalty

A description of the penalty function to be used for model fitting. This can be a character string naming a penalty function (currently "lasso" or stepwise SCAD, "sscad") or a call to the desired penalty function. See ahazpen.pen.control for the available penalty functions and advanced options; see also the examples.

nlambda

The number of lambda values. Default is nlambda=100.

dfmax

Limit the maximum number of variables in the model. Unless a complete regularization path is needed, it is highly recommended to initially choose a relatively small value of dfmax to substantially reduce computation time.

pmax

Limit the maximum number of variables to ever be considered by the coordinate descent algorithm.

lambda.minf

Smallest value of lambda, as a fraction of lambda.max, the (data-derived) smallest value of lambda for which all regression coefficients are zero. The default depends on the sample size nobs relative to the number of variables nvars. If nobs >= nvars, the default is 0.0001, close to zero. When nobs < nvars, the default is 0.05.

lambda

An optional user-supplied sequence of penalty parameters. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.minf. A user-specified lambda sequence overrides dfmax but not pmax.

penalty.wgt

A vector of nonnegative penalty weights, one for each regression coefficient. Each weight multiplies lambda to allow differential penalization. A weight can be 0, which implies no penalization so that the variable is always included in the model, or Inf, which implies that the variable is never included in the model. Default is 1 for all variables.

keep

A vector of indices of variables which should always be included in the model (no penalization). Equivalent to specifying a penalty.wgt of 0.

control

A list of parameters for controlling the model fitting algorithm. The list is passed to ahazpen.fit.control.
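
As an illustration of how nlambda and lambda.minf jointly determine a grid of penalty parameters, here is a minimal base R sketch. The helper make_lambda_grid is hypothetical and not part of ahaz, the value of lambda.max is made up rather than derived from data, and the log-spacing of the grid is an assumption (the same spacing is used for the user-specified sequence in the Examples section), not a documented property of ahazpen.

```r
# Hypothetical helper, not part of the ahaz package.
# Builds a decreasing grid from lambda.max down to lambda.max * lambda.minf.
make_lambda_grid <- function(lambda.max, nobs, nvars, nlambda = 100,
                             lambda.minf = ifelse(nobs < nvars, 0.05, 1e-4)) {
  # Assumption: values are equally spaced on the log scale
  exp(seq(log(lambda.max), log(lambda.max * lambda.minf),
          length.out = nlambda))
}

# Illustrative numbers only: more variables than observations,
# so the default fraction is 0.05
grid <- make_lambda_grid(lambda.max = 0.3, nobs = 115, nvars = 549)
range(grid)
```

A sequence built this way could be passed as the lambda argument, although letting ahazpen compute its own sequence is the typical usage.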

Value

An object with S3 class "ahazpen".

call

The call that produced this object.

beta

An nvars x length(lambda) matrix (in sparse column format, class dgCMatrix) of penalized regression coefficients.

lambda

The sequence of actual lambda values used.

df

The number of nonzero coefficients for each value of lambda.

nobs

Number of observations.

nvars

Number of covariates.

surv

A copy of the argument surv.

npasses

Total number of passes by the fitting algorithm over the data, for all lambda values.

penalty.wgt

The penalty weights actually used.

penalty

An object of class ahaz.pen.control, as specified by penalty.

dfmax

A copy of dfmax.

pmax

A copy of pmax.
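
Because beta is stored as a sparse dgCMatrix, it can help to see how such a matrix is queried. The sketch below uses a small hand-made matrix as a stand-in for the beta component of a fitted object (the values are invented for illustration) and relies only on the Matrix package.

```r
library(Matrix)

# Toy stand-in for the beta component: 4 covariates (rows) by
# 3 lambda values (columns); sparseMatrix() returns a dgCMatrix here
beta <- sparseMatrix(i = c(1, 1, 3), j = c(2, 3, 3),
                     x = c(0.5, 0.7, -0.2), dims = c(4, 3))

# Number of nonzero coefficients in each column (one column per lambda)
nonzero_per_lambda <- colSums(beta != 0)

# Indices of the covariates that are active at the last lambda value
active_at_last <- which(beta[, 3] != 0)

# Convert to an ordinary dense matrix when the dimensions are small
dense <- as.matrix(beta)
```

For a real fit, the per-column counts computed this way would be expected to correspond to the df component described above.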

Details

Fits the sequence of models implied by the penalty function penalty and the sequence of penalty parameters lambda, using cyclic coordinate descent.

For data sets with a very large number of covariates, it is recommended to calculate only partial paths by specifying a small value of dfmax.

The sequence lambda is computed automatically by the algorithm but can also be set (semi)manually by specifying nlambda or lambda. The stability and efficiency of the algorithm depend strongly on the grid of lambda values being reasonably dense, and lambda (and nlambda) should be specified accordingly. In particular, specifying a single lambda value or only a few values is not recommended. Instead, calculate a partial regularization path and use predict.ahazpen or coef.ahazpen to extract coefficient estimates at specific lambda values.
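
The recommended workflow can be sketched as follows: fit a partial path on the default grid, then extract coefficients at one lambda value. This is a sketch that assumes coef.ahazpen accepts a lambda argument as described on its own help page. The choice of median(fit$lambda) is only an illustrative way of picking a value inside the fitted range, and the object names are arbitrary.

```r
library(survival)  # provides Surv()
library(ahaz)

data(sorlie)

# Break ties, since tied survival times are not supported
set.seed(10101)
time <- sorlie$time + runif(nrow(sorlie)) * 1e-2
surv <- Surv(time, sorlie$status)
X <- as.matrix(sorlie[, 3:ncol(sorlie)])

# Partial regularization path on the automatically computed lambda grid
fit <- ahazpen(surv, X, penalty = "lasso", dfmax = 30)

# Pick a lambda inside the fitted range rather than refitting at one value
lam <- median(fit$lambda)

# Coefficient estimates at that lambda, as a plain numeric vector
beta_hat <- as.numeric(coef(fit, lambda = lam))
selected <- which(beta_hat != 0)
```

Staying within range(fit$lambda) avoids asking for estimates outside the part of the path that was actually computed.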

References

Gorst-Rasmussen, A. & Scheike, T. H. (2012). Coordinate descent methods for the penalized semiparametric additive hazards model. Journal of Statistical Software; 47(9):1-17. http://www.jstatsoft.org/v47/i09/

Gorst-Rasmussen, A. & Scheike, T. H. (2011). Independent screening for single-index hazard rate models with ultra-high dimensional features. Technical report R-2011-06, Department of Mathematical Sciences, Aalborg University.

Leng, C. & Ma, S. (2007). Path consistent model selection in additive risk model via Lasso. Statistics in Medicine; 26:3753-3770.

Martinussen, T. & Scheike, T. H. (2009). Covariate selection for the semiparametric additive risk model. Scandinavian Journal of Statistics; 36:602-619.

Zou, H. & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics; 36:1509-1533.

Examples

# NOT RUN {
library(ahaz)
data(sorlie)

# Break ties
set.seed(10101)
time <- sorlie$time+runif(nrow(sorlie))*1e-2

# Survival data + covariates
surv <- Surv(time,sorlie$status)
X <- as.matrix(sorlie[,3:ncol(sorlie)])

# Fit additive hazards regression model
fit1 <- ahazpen(surv, X,penalty="lasso", dfmax=30)
fit1
plot(fit1)

# Extend the grid to contain exactly 100 lambda values
lrange <- range(fit1$lambda)
fit2 <- ahazpen(surv, X,penalty="lasso", lambda.minf=lrange[1]/lrange[2])
plot(fit2)

# User-specified lambda sequence
lambda <- exp(seq(log(0.30), log(0.1), length = 100))
fit3 <- ahazpen(surv, X, penalty="lasso", lambda = lambda)
plot(fit3)

# Advanced usage - specify details of the penalty function
fit4 <- ahazpen(surv, X,penalty=sscad.control(nsteps=2))
fit4
fit5 <- ahazpen(surv, X,penalty=lasso.control(alpha=0.1))
plot(fit5)
# }
