evalIntPredErr: Wrapper function to compute the integrated prediction error

Description

Convenience wrapper that computes the integrated prediction error in a single call, without the need to chain multiple functions.

Usage

evalIntPredErr(hazPreds, survPreds=NULL, newTimeInput, newEventInput, 
trainTimeInput, trainEventInput, testDataLong, tmax=NULL)

Arguments

hazPreds

Predicted discrete hazards in the test data. The predictions must be made on a data set in long format, as produced by a function with prefix "dataLong", e.g. dataLong or dataLongTimeDep.

survPreds

Predicted discrete survival functions of all persons (list of numeric vectors). Each list element corresponds to the survival function of one person, beginning with the first time interval. The default NULL assumes that the arguments "hazPreds" and "testDataLong" are available. If those two arguments are not available, "survPreds" must be specified.

newTimeInput

Discrete time intervals in short format of the test set (integer vector).

newEventInput

Events in short format in the test set (binary vector).

trainTimeInput

Discrete time intervals in short format of the training set (integer vector).

trainEventInput

Events in short format in the training set (binary vector).

testDataLong

Test data in long format. Needs to be specified only if argument "survPreds" is NULL. In this case the discrete survival function is calculated from the predicted hazards. It is assumed that the data was preprocessed with a function with prefix "dataLong", e.g. dataLong or dataLongTimeDep.

tmax

Maximum time interval for which prediction errors are calculated. It must be smaller than the maximum observed time in the training data of the object produced by function predErrDiscShort. The default NULL means that all observed intervals are used.
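
The relationship between "hazPreds", "testDataLong" and "survPreds" described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the objects toyLong, toyHaz and toySurv are hypothetical, and the sketch assumes that the rows of each person are ordered by time interval and that the person identifier column is named "obj", as in the long-format data used in the Examples section. The survival function of a person is taken as the cumulative product of one minus the predicted hazards.

```r
# Toy long-format data (hypothetical): one row per person and time interval.
# "obj" identifies the person, "timeInt" the discrete time interval.
toyLong <- data.frame(
  obj     = c(1, 1, 1, 2, 2),
  timeInt = c(1, 2, 3, 1, 2)
)

# Hypothetical predicted discrete hazards, one value per row of toyLong
toyHaz <- c(0.1, 0.2, 0.5, 0.3, 0.4)

# Survival function per person: cumulative product of (1 - hazard),
# computed separately within each person
toySurv <- unname(lapply(split(1 - toyHaz, toyLong$obj), cumprod))

# List with one numeric vector per person, in the shape expected for "survPreds"
str(toySurv)
```

Here person 1 contributes three intervals and person 2 contributes two, so the list elements have different lengths.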

Value

Integrated prediction error (numeric scalar).

References

Gerhard Tutz and Matthias Schmid (2016), Modeling Discrete Time-to-Event Data, Springer Series in Statistics, doi: 10.1007/978-3-319-28158-2

Tilmann Gneiting and Adrian E. Raftery (2007), Strictly proper scoring rules, prediction, and estimation, Journal of the American Statistical Association 102 (477), 359-376

Examples

# NOT RUN {
##########################
# Example with cancer data

library(survival)
head(cancer)

# Data preparation and conversion to 30 intervals
cancerPrep <- cancer
cancerPrep$status <- cancerPrep$status-1
intLim <- quantile(cancerPrep$time, prob=seq(0, 1, length.out=30))
intLim[length(intLim)] <- intLim[length(intLim)] + 1

# Cut discrete time in smaller number of intervals
cancerPrep <- contToDisc(dataSet=cancerPrep, timeColumn="time", intervalLimits=intLim)

# Generate training and test data
set.seed(753)
TrainIndices <- sample(x=1:dim(cancerPrep)[1], size=dim(cancerPrep)[1]*0.75)
TrainCancer <- cancerPrep[TrainIndices, ]
TestCancer <- cancerPrep[-TrainIndices, ]
TrainCancer$timeDisc <- as.numeric(as.character(TrainCancer$timeDisc))
TestCancer$timeDisc <- as.numeric(as.character(TestCancer$timeDisc))

# Convert to long format
LongTrain <- dataLong(dataSet=TrainCancer, timeColumn="timeDisc", censColumn="status")
LongTest <- dataLong(dataSet=TestCancer, timeColumn="timeDisc", censColumn="status")
# Convert factors
LongTrain$timeInt <- as.numeric(as.character(LongTrain$timeInt))
LongTest$timeInt <- as.numeric(as.character(LongTest$timeInt))
LongTrain$sex <- factor(LongTrain$sex)
LongTest$sex <- factor(LongTest$sex)

# Estimate, for example, a generalized additive model in discrete survival analysis
library(mgcv)
gamFit <- gam(formula=y ~ s(timeInt) + s(age) + sex + ph.ecog, data=LongTrain, family=binomial())
summary(gamFit)

# 1. Specification of predicted discrete hazards
# Predict the discrete hazard for each row (person and interval) of the test data
testPredHaz <- predict(gamFit, newdata=LongTest, type="response")

# 1.1. Calculate integrated prediction error
evalIntPredErr(hazPreds=testPredHaz, survPreds=NULL, 
newTimeInput=TestCancer$timeDisc, newEventInput=TestCancer$status, 
trainTimeInput=TrainCancer$timeDisc, trainEventInput=TrainCancer$status, 
testDataLong=LongTest)

# 2. Alternative specification
oneMinusPredHaz <- 1-testPredHaz
predSurv <- aggregate(oneMinusPredHaz ~ obj, data=LongTest, FUN=cumprod, na.action=NULL)

# 2.1. Calculate integrated prediction error
evalIntPredErr(survPreds=predSurv[[2]], 
newTimeInput=TestCancer$timeDisc, newEventInput=TestCancer$status, 
trainTimeInput=TrainCancer$timeDisc, trainEventInput=TrainCancer$status)

# }