riskRegression (version 2018.04.21)

predictCox: Fast computation of survival probabilities, hazards and cumulative hazards from Cox regression models

Description

Fast routine to get baseline hazards and subject specific hazards as well as survival probabilities from a survival::coxph or rms::cph object

Usage

predictCox(object, newdata = NULL, times, centered = TRUE,
  type = c("cumhazard", "survival"), keep.strata = TRUE,
  keep.times = TRUE, keep.newdata = FALSE, se = FALSE, band = FALSE,
  iid = FALSE, average.iid = FALSE, nsim.band = 10000,
  conf.level = 0.95, log.transform = TRUE, store.iid = "full")

Arguments

object

The fitted Cox regression model object either obtained with coxph (survival package) or cph (rms package).

newdata

A data.frame or data.table containing the values of the predictor variables defining subject specific predictions. Should have the same structure as the data set used to fit the object.

times

Time points at which to evaluate the predictions.

centered

Logical. If TRUE return prediction at the mean values of the covariates fit$mean, if FALSE return a prediction for all covariates equal to zero. in the linear predictor. Will be ignored if argument newdata is used. For internal use.

type

the type of predicted value. Choices are

  • "hazard" the baseline hazard function when argument newdata is not used and the hazard function when argument newdata is used.

  • "cumhazard" the cumulative baseline hazard function when argument newdata is not used and the cumulative hazard function when argument newdata is used.

  • "survival" the survival baseline hazard function when argument newdata is not used and the cumulative hazard function when argument newdata is used.

Several choices can be combined in a vector of strings that match (no matter the case) strings "hazard","cumhazard", "survival".

keep.strata

Logical. If TRUE add the (newdata) strata to the output. Only if there any.

keep.times

Logical. If TRUE add the evaluation times to the output.

keep.newdata

Logical. If TRUE add the value of the covariates used to make the prediction in the output list.

se

Logical. If TRUE add the standard error to the output.

band

Logical. If TRUE add the confidence band to the output.

iid

Logical. If TRUE add the influence function to the output.

average.iid

Logical. If TRUE add the average of the influence function over newdata to the output.

nsim.band

the number of simulations used to compute the quantiles for the confidence bands.

conf.level

Level of confidence.

log.transform

Should the confidence intervals/bands be computed on the log (hazard) and log(-log) (survival) scale and be backtransformed. Otherwise they are computed on the original scale and truncated (if necessary).

store.iid

Implementation used to estimate the influence function and the standard error. Can be "full" or "minimal".

...

arguments to be passed to the function iidCox.

Value

A list with some or all of the following elements:

  • times: the time points at which the other elements are evaluated.

  • hazard: When argument newdata is not used the baseline hazard function, otherwise the predicted hazard function.

  • cumhazard: When argument newdata is not used the cumulative baseline hazard function, otherwise the predicted cumulative hazard function.

  • survival: When argument newdata is not used the survival probabilities corresponding to the baseline hazard, otherwise the predicted survival probabilities.

  • cumhazard.se/survival.se: The standard errors of the predicted cumulative hazard function/survival.

  • (hazard.iid/cumhazard.iid/survival.iid): (array) the value of the influence of each subject used to fit the object (dim 3) for each subject in newdata (dim 1) and each time (dim 2).

  • (cumhazard.average.iid/survival.average.iid): (array) the average value of the influence over the subsjects in newdata, for each subject used to fit the model (dim 1) and each time (dim 2).

  • strata: The strata variable.

Details

When the argument newdata is not specified, the function computes the baseline hazard estimate. See (Ozenne et al., 2017) section "Handling of tied event times".

Otherwise the function computes survival probabilities with confidence intervals/bands. See (Ozenne et al., 2017) section "Confidence intervals and confidence bands for survival probabilities". The survival is computed using the exponential approximation (equation 3).

When setting log.transform to TRUE, the standard error that is returned is before back-transformation to the original scale.

A detailed explanation about the meaning of the argument store.iid can be found in (Ozenne et al., 2017) Appendix B "Saving the influence functions".

The function is not compatible with time varying predictor variables.

The centered argument enables us to reproduce the results obtained with the basehaz function from the survival package but should not be modified by the user.

References

Brice Ozenne, Anne Lyngholm Sorensen, Thomas Scheike, Christian Torp-Pedersen and Thomas Alexander Gerds. riskRegression: Predicting the Risk of an Event using Cox Regression Models. The R Journal (2017) 9:2, pages 440-460.

Examples

Run this code
# NOT RUN {
library(survival)

## generate data
set.seed(10)
d <- sampleData(40,outcome="survival") ## training dataset
nd <- sampleData(4,outcome="survival") ## validation dataset
d$time <- round(d$time,1) ## create tied events
# table(duplicated(d$time))

## estimate a stratified Cox model
fit <- coxph(Surv(time,event)~X1 + strata(X2) + X6,
             data=d, ties="breslow", x = TRUE, y = TRUE)

## compute the baseline cumulative hazard
fit.haz <- predictCox(fit)
cbind(survival::basehaz(fit), fit.haz$cumhazard)

## compute individual specific cumulative hazard and survival probabilities 
predictCox(fit, newdata=nd, times=c(3,8))

## add confidence intervals computed on the original scale
CI.original <- predictCox(fit, newdata=nd, times=c(3,8), se = TRUE, log.transform = FALSE)
as.data.table(CI.original)
CI.original$cumhazard + 1.96 * CI.original$cumhazard.se 

## add confidence intervals computed on the log (or log-log) scale
## and backtransformed
CI.log <- predictCox(fit, newdata=nd, times=c(3,8), se = TRUE, log.transform = TRUE)
as.data.table(CI.log)
exp(log(CI.log$cumhazard) + 1.96 * CI.original$cumhazard.se/CI.log$cumhazard)
exp(log(CI.log$cumhazard) + 1.96 * CI.log$cumhazard.se)

## export iid decomposition relative to the survival probabilities
CI.iid <- predictCox(fit, newdata = d, times = 5, iid = TRUE, se = TRUE,
                      log.transform = FALSE)
as.data.table(CI.iid)[1:5]
rowMeans(CI.iid$survival.iid[,1,]) ## the iid decomposition has 0 expectation
sqrt(rowSums(CI.iid$survival.iid[1:5,1,]^2))

## same but the iid decomposition is averaged over the patients
CI.aviid <- predictCox(fit, newdata = d, times = 5, 
                       average.iid = TRUE, log.transform = FALSE)
CI.aviid$survival.average.iid[1:5,]
colMeans(CI.iid$survival.iid[,1,1:5])

## export confidence bands (by default computed on the log scale and backtransformed)
predictCox(fit, newdata=nd, times=c(3,8), se = TRUE, band = TRUE)

## other examples
# one strata variable
fitS <- coxph(Surv(time,event)~strata(X1)+X2,
              data=d, ties="breslow", x = TRUE, y = TRUE)

predictCox(fitS)
predictCox(fitS, newdata=nd, times = 1)

# two strata variables
set.seed(1)
d$U=sample(letters[1:5],replace=TRUE,size=NROW(d))
d$V=sample(letters[4:10],replace=TRUE,size=NROW(d))
nd$U=sample(letters[1:5],replace=TRUE,size=NROW(nd))
nd$V=sample(letters[4:10],replace=TRUE,size=NROW(nd))
fit2S <- coxph(Surv(time,event)~X1+strata(U)+strata(V)+X2,
              data=d, ties="breslow", x = TRUE, y = TRUE)

cbind(survival::basehaz(fit2S),predictCox(fit2S,type="cumhazard")$cumhazard)
predictCox(fit2S)
predictCox(fitS, newdata=nd, times = 3)

# left truncation
test2 <- list(start=c(1,2,5,2,1,7,3,4,8,8), 
              stop=c(2,3,6,7,8,9,9,9,14,17), 
              event=c(1,1,1,1,1,1,1,0,0,0), 
              x=c(1,0,0,1,0,1,1,1,0,0)) 
m.cph <- coxph(Surv(start, stop, event) ~ 1, test2, x = TRUE)
as.data.table(predictCox(m.cph))

basehaz(m.cph)
# }

Run the code above in your browser using DataLab