survivalSL: Super Learner for Censored Outcomes

Description

This function computes a Super Learner (SL) to predict survival outcomes.

Usage

survivalSL(formula, data, methods, metric="auc", penalty=NULL,
cv=10, param.tune=NULL, pro.time=NULL,
optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
param.weights.fix=NULL, param.weights.init=NULL,
seed=NULL, optim.method="Nelder-Mead", maxit=1000,
show_progress=TRUE)

Value

times: A vector of numeric values with the times of the predictions.
predictions: A matrix with the survival predictions of the SL.
FitALL: A list of matrices with the survival predictions of each learner used to build the SL.
formula: The formula object used for the SL construction.
data: The data frame used for learning.
ROC.precision: The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve.
cv: The number of splits for cross-validation.
methods: A vector of characters with the names of the algorithms included in the SL.
pro.time: The maximum delay for which the capacity of the variable is evaluated.
models: A list with the estimated models/algorithms included in the SL.
weights: A list composed of two vectors: the regression coefficients of the multinomial logistic regression and the resulting weights.
metric: A list composed of two vectors: the loss function used to estimate the weights of the algorithms in the SL and its cross-validated value.
param.tune: The estimated tuning parameters.
seed: The random seed used.
optim.method: The optimization method used.

Arguments

formula: A formula object, with the response on the left of a ~ operator, and the terms on the right. The response must be a survival object as returned by the Surv function.
data: A data frame whose columns correspond to the variables present in the formula.
methods: A vector of characters with the names of the algorithms included in the SL. At least two algorithms have to be included.
metric: The loss function or metric used to estimate the weights of the algorithms in the SL. See details.
penalty: A numerical vector that forces covariates into the final model after selection (for "LIB_COXaic") and/or exempts covariates from penalization (for "LIB_COXen", "LIB_COXlasso" and "LIB_COXridge"). Use 0 to force a covariate into the model or to leave it unpenalized, and 1 otherwise. If NULL, all covariates undergo the selection and/or penalization process.
cv: The number of splits for cross-validation. The default value is 10.
param.tune: A list with a length equal to the number of algorithms included in methods. If NULL, the tuning parameters are estimated (see details).
pro.time: This optional prognostic time represents the maximum delay for which the prognostic capacity is evaluated. It is expressed in the same unit as the times used in the formula. Not used for the following metrics: "ll", "ibs", and "ibll". The default value is the time at which half of the subjects are still at risk.
optim.local.min: An optional logical value. If TRUE, the optimization is performed twice to better ensure the estimation of the weights. If FALSE (default value), the optimization is performed once.
ROC.precision: The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. Only used when metric="auc". 0 (min) and 1 (max) are not allowed. By default: seq(.01,.99,.01).
param.weights.fix: A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. When supplied, these parameters are not estimated. The default value is NULL: the parameters are estimated by a cv-fold cross-validation. See details.
param.weights.init: A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. The default value is NULL: the initial values are set to 0. See details.
seed: A random seed to ensure reproducibility. If NULL, a seed is randomly assigned.
optim.method: The optimization method used to estimate the weights. It can be either "SANN" or "Nelder-Mead". By default we use Nelder-Mead.
maxit: The number of iterations during the weight optimization process. By default, it is set to 1000.
show_progress: Parameter to display the progress bar. By default, it is set to TRUE.
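
The param.weights.fix and param.weights.init arguments are given on the scale of the multinomial logistic regression parameters, not as weights. The sketch below shows one common parameterisation, assumed here purely for illustration: with M learners, M-1 parameters are free and the last learner acts as the reference category with its parameter fixed at 0. The helper weights_from_params is hypothetical and not part of survivalSL, and the number and ordering of parameters the package expects may differ.

```r
# Hypothetical helper (not part of survivalSL): map M-1 multinomial logistic
# parameters to M weights that are positive and sum to 1.
# Assumption: the last learner is the reference category (parameter fixed at 0).
weights_from_params <- function(beta) {
  eta <- c(beta, 0)          # append the reference category
  w <- exp(eta - max(eta))   # subtract the maximum for numerical stability
  w / sum(w)
}

# All parameters at 0 (the documented default initial values) give equal weights
w_init <- weights_from_params(c(0, 0))
print(w_init)

# A larger parameter shifts weight towards the corresponding learner
w_shift <- weights_from_params(c(2, 0))
print(round(w_shift, 3))
```

Under this assumption, optimizing over unconstrained parameters keeps the weights valid without imposing explicit constraints on the optimizer.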

Details

Each object of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, survivalSL uses predefined default grids of tuning parameters for each algorithm. The final tuning parameters are chosen by cv-fold cross-validation (except for LIB_RSF, which uses the Out Of Bag observations to select the best hyperparameters based on the optimal value of the chosen metric).

The following metrics can be used: "bs" for the Brier score at the prognostic time pro.time, "p_ci" and "uno_ci" for the concordance index at the prognostic time pro.time (Pencina and Uno versions), "ll" for the log-likelihood, "ibs" for the integrated Brier score up to the last observed time in the training data, "ibll" for the integrated binomial log-likelihood up to the last observed time in the training data, "ribs" for the restricted integrated Brier score up to the prognostic time pro.time, "ribll" for the restricted integrated binomial log-likelihood up to the prognostic time pro.time, "bll" for the binomial log-likelihood, "auc" for the area under the time-dependent ROC curve up to the prognostic time pro.time.
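
Several of these metrics are evaluated at pro.time, whose default is the time at which half of the subjects are still at risk. A subject leaves the risk set at its observed time, whether that time is an event or a censoring, so this default can be approximated by the median of the observed follow-up times. The sketch below uses simulated data; the data frame d and its columns are made up for illustration, and this is not necessarily how survivalSL computes the value internally.

```r
set.seed(1)

# Simulated follow-up times and event indicators (illustrative only)
d <- data.frame(times = rexp(101, rate = 0.1),
                failures = rbinom(101, 1, 0.7))

# Median follow-up time: about half of the subjects are still at risk at this time
pro_time <- median(d$times)

# Number of subjects still at risk at that time (observed time >= pro_time)
n_at_risk <- sum(d$times >= pro_time)
print(c(pro_time = pro_time, n_at_risk = n_at_risk))
```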

The following learners are available:

Names	Description	Package
`"LIB_AFTgamma"`	Gamma-distributed AFT model	flexsurv
`"LIB_AFTggamma"`	Generalized Gamma-distributed AFT model	flexsurv
`"LIB_AFTweibull"`	Weibull-distributed AFT model	flexsurv
`"LIB_PHexponential"`	Exponential-distributed PH model	flexsurv
`"LIB_PHgompertz"`	Gompertz-distributed PH model	flexsurv
`"LIB_PHspline"`	Spline-based PH model	flexsurv
`"LIB_COXall"`	Usual Cox model	survival
`"LIB_COXaic"`	Cox model with AIC-based forward selection	MASS
`"LIB_COXen"`	Elastic Net Cox model	glmnet
`"LIB_COXlasso"`	Lasso Cox model	glmnet
`"LIB_COXridge"`	Ridge Cox model	glmnet
`"LIB_RSF"`	Survival Random Forest	randomForestSRC
`"LIB_PLANN"`

The following loss functions for the estimation of the super learner weights are available (metric):

Area under the ROC curve ("auc")
Pencina concordance index ("p_ci")
Uno concordance index ("uno_ci")
Brier score ("bs")
Binomial log-likelihood ("bll")
Integrated Brier score ("ibs")
Integrated binomial log-likelihood ("ibll")
Restricted integrated Brier score ("ribs")
Restricted integrated binomial log-likelihood ("ribll")
Log-likelihood ("ll")
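
To illustrate what the "bs" metric measures, the sketch below computes a Brier score at a prognostic time from predicted survival probabilities. It deliberately assumes that no subject is censored before pro_time; with censored data a weighting scheme such as inverse probability of censoring weights is typically needed, and the exact estimator used by survivalSL is not described here. The function brier_uncensored and the data are hypothetical.

```r
# Hypothetical helper (not part of survivalSL): Brier score at pro_time,
# assuming every subject is fully observed up to pro_time (no censoring).
brier_uncensored <- function(times, surv_pred, pro_time) {
  alive <- as.numeric(times > pro_time)  # 1 if still event-free at pro_time
  mean((alive - surv_pred)^2)            # mean squared prediction error
}

times     <- c(2, 5, 8, 12)              # observed event times
pred_good <- c(0.1, 0.2, 0.8, 0.9)       # predictions close to the observed status
pred_flat <- rep(0.5, 4)                 # uninformative predictions

bs_good <- brier_uncensored(times, pred_good, pro_time = 6)
bs_flat <- brier_uncensored(times, pred_flat, pro_time = 6)
print(c(good = bs_good, uninformative = bs_flat))
```

Lower values indicate better predictions, so a loss of this kind is minimized when the weights are estimated.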

References

Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com. 2010.

Examples

library(survival)    # provides Surv()
library(survivalSL)

data("dataDIVAT2")

# The Super Learner based on the first 200 individuals of the database

formula<-Surv(times,failures) ~ age + hla + retransplant + ecd

sl1 <- survivalSL(formula=formula, data=dataDIVAT2[1:200,],
                  methods=c("LIB_AFTgamma", "LIB_PHgompertz"))

# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
                                        retransplant=c(1,1), ecd=c(0,1)))

plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
     ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))

lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)

legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))
