Fast and scalable model selection for the semiparametric additive hazards model via univariate screening combined with penalized regression.
ahazisis(surv, X, weights, standardize=TRUE,
nsis=floor(nobs/1.5/log(nobs)), do.isis=TRUE,
maxloop=5, penalty=sscad.control(), tune=cv.control(),
rank=c("FAST","coef","z","crit"))
surv: Response in the form of a survival object, as returned by the function Surv() in the package survival. Right-censored and counting process (left-truncated) formats are supported. Tied survival times are not supported.
X: Design matrix. Missing values are not supported.
weights: Optional vector of observation weights. Default is 1 for each observation.
standardize: Logical flag for variable standardization prior to model fitting. Estimates are always returned on the original scale. Default is standardize=TRUE.
nsis: Number of covariates to recruit initially. If do.isis=TRUE, this is also the maximal number of variables that the algorithm will recruit. Default is nsis=floor(nobs/log(nobs)/1.5), where nobs is the number of observations.
do.isis: Logical flag: perform iterated sure independent screening? Default is do.isis=TRUE.
maxloop: Maximal number of iterations of the algorithm if do.isis=TRUE. Default is maxloop=5.
rank: Method to use for (re)recruitment of variables. See the details below.
penalty: A description of the penalty function to be used in the variable selection step. This can be a character string naming a penalty function (currently "lasso", or "sscad" for stepwise SCAD) or a call to the penalty function. Default is penalty=sscad.control(). See ahazpen and ahazpen.pen.control for more options and examples.
tune: A description of the tuning method to be used in the variable selection step. This can be a character string naming a tuning control function (currently "cv" or "bic") or a call to the tuning control function. Default is tune=cv.control(). See ahaz.tune.control for options and examples.
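For a sense of scale, the default value of nsis can be evaluated directly. The helper default_nsis below is hypothetical; it only restates the formula from the usage line, taking nobs to be the number of observations.

```r
# Hypothetical helper restating the default from the usage line:
# nsis = floor(nobs / 1.5 / log(nobs)), with nobs the number of observations
default_nsis <- function(nobs) floor(nobs / 1.5 / log(nobs))

# The number of initially recruited covariates grows slightly slower than nobs
sapply(c(100, 500, 1000), default_nsis)
# [1] 14 53 96
```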
An object with S3 class "ahazisis" with the following components:
The call that produced this object.
The initial ranking order.
List (of length at most maxloop) listing the covariates selected in each recruitment step.
List (of length at most maxloop) listing the covariates selected in each variable selection step.
List (of length at most maxloop) listing the estimated penalized regression coefficients corresponding to the indices in detail.ISISind.
Indices of covariates selected in the initial recruitment step.
Indices of the final set of covariates selected by the iterated algorithm.
Vector of the penalized regression coefficients of the covariates in ISISind.
The argument nsis.
The argument do.isis.
The argument maxloop.
The function is a basic implementation of the iterated sure independent screening method described in Gorst-Rasmussen & Scheike (2011). Briefly, the algorithm does the following:
1. Recruits the nsis most relevant covariates by ranking them according to the univariate ranking method specified by rank.
2. Selects, using ahazpen with the penalty function described in penalty, a model among the top two thirds of the nsis most relevant covariates. Call the size of this model \(m\).
3. Recruits nsis minus \(m\) new covariates among the non-selected covariates by ranking their relevance according to the univariate ranking method specified by rank, adjusted for the already selected variables (using an unpenalized semiparametric additive hazards model).
Steps 2-3 are iterated up to maxloop times, or until nsis covariates have been recruited, or until the set of selected covariates is stable between two iterations, whichever comes first.
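The control flow of the steps above can be sketched in base R. This is not the ahaz implementation: rank_fun and select_fun are hypothetical stand-ins for the univariate ranking and for the penalized fit with ahazpen. Two readings are assumptions on our part: after Step 3 the next selection is run on the selected covariates plus the newly recruited ones, and "nsis covariates have been recruited" is taken to mean that the selected model has reached nsis covariates.

```r
# Sketch of the iterated screening loop (Steps 1-3 plus stopping rules).
# ASSUMPTIONS: rank_fun(cand, selected) is a hypothetical stand-in for the
# univariate ranking (larger score = more relevant, adjusted for 'selected');
# select_fun(pool) is a hypothetical stand-in for the penalized fit and
# returns the indices in 'pool' that it keeps. Assumes p >= nsis.
isis_sketch <- function(p, nsis, maxloop, rank_fun, select_fun) {
  # Step 1: recruit the nsis top-ranked covariates
  ranked <- order(rank_fun(seq_len(p), integer(0)), decreasing = TRUE)
  recruited <- ranked[seq_len(nsis)]
  # The first selection uses the top two thirds of the recruited covariates
  pool <- recruited[seq_len(floor(2 * nsis / 3))]
  selected <- NULL
  for (iter in seq_len(maxloop)) {
    # Step 2: penalized selection within the current pool (model size m)
    new_sel <- sort(as.integer(select_fun(pool)))
    stable <- identical(new_sel, selected)
    selected <- new_sel
    m <- length(selected)
    # Stop on a stable selection, after maxloop passes, or when no room is left
    if (stable || iter == maxloop || m >= nsis) break
    # Step 3: recruit nsis - m new covariates among the non-selected ones,
    # ranked conditionally on the covariates already selected (assumed reading)
    cand <- setdiff(seq_len(p), selected)
    scores <- rank_fun(cand, selected)
    pool <- c(selected, cand[head(order(scores, decreasing = TRUE), nsis - m)])
  }
  list(recruited = recruited, selected = selected)
}

# Toy run: a lower index means more relevant; the "penalized fit" keeps odd indices
res <- isis_sketch(p = 20, nsis = 6, maxloop = 5,
                   rank_fun = function(cand, selected) -cand,
                   select_fun = function(pool) pool[pool %% 2 == 1])
res$selected
# [1] 1 3 5
```

In the toy run the first selection keeps covariates 1 and 3, the second recruitment round adds covariate 5, and the third pass reproduces the same set, so the loop stops on stability before reaching maxloop.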
The following choices of ranking method exist:
rank="FAST" corresponds to ranking, in the initial recruitment step only, by the basic FAST statistic described in Gorst-Rasmussen & Scheike (2011). If do.isis=TRUE, the algorithm then sets rank="z" for subsequent rankings.
rank="coef" corresponds to ranking by the absolute value of the (univariate) regression coefficients, obtained via ahaz.
rank="z" corresponds to ranking by the \(|Z|\)-statistic of the (univariate) regression coefficients, obtained via ahaz.
rank="crit" corresponds to ranking by the size of the decrease in the (univariate) natural loss function used for estimation by ahaz.
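To make three of these criteria concrete, here is a base-R sketch for right-censored data without ties. It assumes the usual estimating equation for the semiparametric additive hazards model, under which a single covariate's estimate is \(d/D\) with \(d = \sum_i \int (Z_i - \bar Z(t))\,dN_i(t)\) (our reading of the basic FAST statistic, without scaling) and \(D = \sum_i \int Y_i(t)(Z_i - \bar Z(t))^2\,dt\), together with the quadratic loss \(L(b) = b^2 D - 2bd\), whose decrease at the optimum is \(d^2/D\). The function univariate_stats is hypothetical, not part of ahaz; it drops constant scaling factors (they do not change rankings), omits rank="z" (which needs a variance estimate), and does not handle counting process data.

```r
# Hypothetical helper: univariate FAST-type statistic, |coefficient| and loss
# decrease for one covariate z, assuming right-censored data without ties.
univariate_stats <- function(time, status, z) {
  o <- order(time)
  time <- time[o]; status <- status[o]; z <- z[o]
  n <- length(time)
  d <- 0; D <- 0; prev <- 0
  for (k in seq_len(n)) {
    risk <- k:n                     # subjects still at risk on (prev, time[k]]
    zbar <- mean(z[risk])           # at-risk average of the covariate
    # integral of sum_i Y_i(t) (z_i - zbar(t))^2 dt over (prev, time[k]]
    D <- D + (time[k] - prev) * sum((z[risk] - zbar)^2)
    # event contribution: deviation of the failing subject from the at-risk mean
    if (status[k] == 1) d <- d + (z[k] - zbar)
    prev <- time[k]
  }
  c(fast = abs(d), coef = abs(d / D), crit = d^2 / D)
}

# Usage on simulated additive-hazards data: only column 1 affects the hazard
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), n, 4)
time <- rexp(n, rate = 1 + 0.5 * (X[, 1] > 0))
status <- rbinom(n, 1, 0.8)
crit <- apply(X, 2, function(z) univariate_stats(time, status, z)["crit"])
order(crit, decreasing = TRUE)   # covariates ranked as with rank="crit"
```

Note that fast and coef change when a covariate is rescaled, while crit does not; this is one reason standardization (standardize=TRUE) matters for the first two criteria.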
Gorst-Rasmussen, A. & Scheike, T. H. (2011). Independent screening for single-index hazard rate models with ultra-high dimensional features. Technical report R-2011-06, Department of Mathematical Sciences, Aalborg University.
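library(ahaz)      # ahazisis() and the sorlie example data
library(survival)  # Surv() and survfit()
data(sorlie)
# Break ties by adding a small random perturbation to the survival times
set.seed(10101)
time <- sorlie$time + runif(nrow(sorlie)) * 1e-2
# Survival data + covariates
surv <- Surv(time, sorlie$status)
X <- as.matrix(sorlie[, 3:ncol(sorlie)])
# Basic ISIS/SIS with a single step (maxloop = 1);
# the seed matters because the default tuning, cv.control(), is cross-validation
set.seed(10101)
m1 <- ahazisis(surv, X, maxloop = 1, rank = "coef")
m1
# Indices of the variables from the initial recruitment step
m1$SISind
# Indices of selected variables
m1$ISISind
# Check fit: Kaplan-Meier curves for risk scores above/below the median
score <- drop(X[, m1$ISISind, drop = FALSE] %*% m1$ISIScoef)
plot(survfit(surv ~ I(score > median(score))))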