CoxBoost), where cross-validation can be performed automatically to determine the number of boosting steps (via a call to cv.CoxBoost).
iCoxBoost(formula,data=NULL,weights=NULL,subset=NULL,mandatory=NULL,
cause=1,standardize=TRUE,stepno=200,
criterion=c("pscore","score","hpscore","hscore"),
nu=0.1,stepsize.factor=1,varlink=NULL,
cv=cvcb.control(),trace=FALSE,...)
coxph. The response must be a survival object, either as returned by Surv or Hist (in a competing risks application).
Hist (see e.g. Fine and Gray, 1999; Binder et al., 2009a).
"pscore" corresponds to the penalized score statistic, "score" to the unpenalized score statistic. The two give different results only for unstandardized covariates ("pscore" will result in preferential selection of covariates with larger covariance), or if different penalties are used for different covariates. "hpscore" and "hscore" correspond to "pscore" and "score", respectively, except that a heuristic is used to evaluate only a subset of covariates in each boosting step, as described in Binder et al. (2011). This can speed up computation considerably, but may lead to different results.
CoxBoost
. Use smaller values, e.g. 0.01, when there is little information in the data, and larger values, such as 0.1, when there is much information or when the number of events is larger than the number of covariates. Note that the default for direct calls to CoxBoost corresponds to nu=0.1.
1) implies constant nu; for a value < 1, the value of nu for a covariate is decreased after it has been selected in a boosting step, and for a value > 1, the value of nu is increased. If pendistmat is given, updates of nu are only performed for covariates that have at least one connection to another covariate.
stepsize.factor != 1
. The list needs to contain at least two vectors, the first containing the names of the source covariates and the second containing the names of the corresponding target covariates; an optional third vector contains weights between 0 and 1 (defaulting to 1). If nu is increased/decreased for one of the source covariates according to stepsize.factor, the nu for the corresponding target covariate is decreased/increased accordingly (multiplied by the weight). If formula contains interaction terms, rules for these can also be set up, using variable names such as V1:V2 for the interaction term between covariates V1 and V2.
TRUE, for performing cross-validation with default parameters, FALSE for not performing cross-validation, or a list containing the parameters for cross-validation, as obtained from a call to cvcb.control.
cv.CoxBoost.
iCoxBoost returns an object of class iCoxBoost, which also has class CoxBoost. In addition to the elements from CoxBoost it has the following elements:
cv.CoxBoost, if cross-validation has been performed.
glmboost
routine in the R package mboost, using the CoxPH loss function), CoxBoost is not based on gradients of loss functions, but adapts the offset-based boosting approach from Tutz and Binder (2007) for estimating Cox proportional hazards models. In each boosting step, the previous boosting steps are incorporated as an offset in penalized partial likelihood estimation, which is employed to obtain an update for one single parameter, i.e. one covariate, in every boosting step. This results in sparse fits similar to Lasso-like approaches, with many estimated coefficients being zero. The main model complexity parameter, the number of boosting steps, is automatically selected by cross-validation using a call to cv.CoxBoost. Note that this introduces random variation when iCoxBoost is called repeatedly, so it is advisable to set and save the random number generator state for reproducible results.
The advantage of the offset-based approach compared to gradient boosting is that the penalty structure is very flexible. In the present implementation this is used to allow for unpenalized mandatory covariates, which receive a very fast coefficient build-up in the course of the boosting steps, while the other (optional) covariates are subjected to penalization.
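The offset-based update described above can be written out as a short sketch. The notation is an assumption introduced here for illustration and does not appear on this page: \delta_i is the event indicator, t_i the observed time, x_{ij} the value of covariate j for observation i, \hat\eta_{i,k-1} the linear predictor accumulated over the first k-1 boosting steps (the offset), \lambda the penalty, and U_j and I_j the score and information of the penalized criterion with respect to \gamma_j. The last line, which links nu to the penalty via the number of events d, is likewise an assumption that would be consistent with nu=0.1 matching the default of direct CoxBoost calls; it should be checked against the package source.

```latex
% Step k, candidate covariate j: penalized partial log-likelihood with offset
\[
l_{\mathrm{pen}}(\gamma_j) = \sum_{i=1}^{n} \delta_i \Bigl[ \hat\eta_{i,k-1} + \gamma_j x_{ij}
  - \log \sum_{l:\, t_l \ge t_i} \exp\bigl(\hat\eta_{l,k-1} + \gamma_j x_{lj}\bigr) \Bigr]
  - \frac{\lambda}{2}\,\gamma_j^{2}
\]

% One penalized Newton step from \gamma_j = 0, and selection by the penalized score statistic
\[
\hat\gamma_j = \frac{U_j(0)}{I_j(0) + \lambda},
\qquad
j^{*} = \arg\max_j \frac{U_j(0)^{2}}{I_j(0) + \lambda},
\qquad
\hat\beta_{j^{*}}^{(k)} = \hat\beta_{j^{*}}^{(k-1)} + \hat\gamma_{j^{*}}
\]

% Assumed link between nu and the penalty (d = number of events)
\[
\nu \approx \frac{I_j(0)}{I_j(0) + \lambda},
\qquad
\lambda = d \left( \frac{1}{\nu} - 1 \right)
\]
```

Only the coefficient of the selected covariate j* changes in a given step, which is what produces the sparse fits mentioned above.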
For example, in a microarray setting, the (many) microarray features would be taken to be optional covariates, and the (few) potential clinical covariates would be taken to be mandatory, by including their names in mandatory.
If a group of correlated covariates has influence on the response, e.g. genes from the same pathway, componentwise boosting will often result in a non-zero estimate for only one member of this group. To avoid this, information on the connection between covariates can be provided in varlink. If, in addition, a penalty updating scheme with stepsize.factor < 1 is chosen, connected covariates are more likely to be chosen in future boosting steps if a directly connected covariate has been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
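A minimal sketch of how such link information might be passed is given below. The simulated data, the object names (dat, links, fit.link), the choice of V1 and V2 as a hypothetical "pathway", and the value stepsize.factor=0.5 are all assumptions made for illustration; only the list layout (source names, target names, optional weights) follows the description of varlink on this page.

```r
# Sketch only: data, covariate names and the link structure are illustrative assumptions.
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
dat <- as.data.frame(x)                       # columns V1, ..., V20
real.time <- -log(runif(n)) / (10 * exp(x[, 1] + x[, 2]))
cens.time <- rexp(n, rate = 1 / 10)
dat$status <- as.numeric(real.time <= cens.time)
dat$time <- pmin(real.time, cens.time)

# Hypothetical pathway: V1 and V2 are linked in both directions, each with weight 1.
links <- list(c("V1", "V2"),   # source covariates
              c("V2", "V1"),   # corresponding target covariates
              c(1, 1))         # optional weights between 0 and 1

# Fit only if the CoxBoost package is available.
if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(survival)   # provides Surv()
  library(CoxBoost)
  fit.link <- iCoxBoost(Surv(time, status) ~ ., data = dat,
                        varlink = links, stepsize.factor = 0.5)
  summary(fit.link)
}
```

Listing each pair in both directions makes the link symmetric; a one-directional link would list the pair only once.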
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009a). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009b). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics, 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics, 9:14.
Tutz, G. and Binder, H. (2007). Boosting ridge regression. Computational Statistics & Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94:496-509.
predict.iCoxBoost, CoxBoost, cv.CoxBoost.
library(survival)  # provides Surv()
library(CoxBoost)
# Generate some survival data with 2 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,2),rep(0,p-2))
x <- matrix(rnorm(n*p),n,p)
actual.data <- as.data.frame(x)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
actual.data$status <- ifelse(real.time <= cens.time,1,0)
actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a Cox proportional hazards model by iCoxBoost
cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data)
summary(cbfit)
plot(cbfit)
# ... with covariate V1 being mandatory
cbfit.mand <- iCoxBoost(Surv(time,status) ~ .,data=actual.data,mandatory=c("V1"))
summary(cbfit.mand)
plot(cbfit.mand)
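The following sketch illustrates two points made on this page: fixing the random number generator state before a call so that the cross-validated number of boosting steps is repeatable, and controlling cross-validation through the cv argument. The simulated data and the object names (sim.data, cbfit.a, cbfit.b, cbfit.k5, cbfit.nocv) are illustrative. The sketch assumes that the fitted object exposes the number of boosting steps as stepno, that cvcb.control accepts a K argument for the number of folds, and that stepno gives the number of steps directly when cv=FALSE; all three should be checked against the CoxBoost and cvcb.control help pages.

```r
# Sketch only: simulated data; object names and seeds are illustrative.
set.seed(42)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
sim.data <- as.data.frame(x)                  # columns V1, ..., V10
real.time <- -log(runif(n)) / (10 * exp(x[, 1] + x[, 2]))
cens.time <- rexp(n, rate = 1 / 10)
sim.data$status <- as.numeric(real.time <= cens.time)
sim.data$time <- pmin(real.time, cens.time)

# Fit only if the CoxBoost package is available.
if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(survival)   # provides Surv()
  library(CoxBoost)

  # Same seed before each call, so both calls draw the same cross-validation folds.
  set.seed(1)
  cbfit.a <- iCoxBoost(Surv(time, status) ~ ., data = sim.data)
  set.seed(1)
  cbfit.b <- iCoxBoost(Surv(time, status) ~ ., data = sim.data)
  print(identical(cbfit.a$stepno, cbfit.b$stepno))

  # Cross-validation with 5 folds (K assumed to be an argument of cvcb.control).
  cbfit.k5 <- iCoxBoost(Surv(time, status) ~ ., data = sim.data,
                        cv = cvcb.control(K = 5))

  # No cross-validation: stepno is assumed to give the number of boosting steps.
  cbfit.nocv <- iCoxBoost(Surv(time, status) ~ ., data = sim.data,
                          cv = FALSE, stepno = 50)
}
```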