calPlot: Calibration plots for right censored data

Description

Calibration plots for risk prediction models in right censored survival and competing risks data

Usage

calPlot(
  object,
  time,
  formula,
  data,
  splitMethod = "none",
  B = 1,
  M,
  pseudo,
  type,
  showPseudo,
  pseudo.col = NULL,
  pseudo.pch = NULL,
  method = "nne",
  round = TRUE,
  bandwidth = NULL,
  q = 10,
  bars = FALSE,
  hanging = FALSE,
  names = "quantiles",
  showFrequencies = FALSE,
  jack.density = 55,
  plot = TRUE,
  add = FALSE,
  diag = !add,
  legend = !add,
  axes = !add,
  xlim = c(0, 1),
  ylim = c(0, 1),
  xlab,
  ylab,
  col,
  lwd,
  lty,
  pch,
  cause = 1,
  percent = TRUE,
  giveToModel = NULL,
  na.action = na.fail,
  cores = 1,
  verbose = FALSE,
  cex = 1,
  ...
)

Value

list with elements: time, pseudoFrame and bandwidth (NULL for method quantile).

Arguments

object

A named list of prediction models, where allowed entries are (1) R-objects for which a predictSurvProb method exists (see details), (2) a call that evaluates to such an R-object (see examples), (3) a matrix with predicted probabilities having as many rows as data and as many columns as times. For cross-validation all objects in this list must include their call.

time

The evaluation time point at predicted event probabilities are plotted against pseudo-observed event status.

formula

A survival or event history formula. The left hand side is used to compute the expected event status. If formula is missing, try to extract a formula from the first element in object.

data

A data frame in which to validate the prediction models and to fit the censoring model. If data is missing, try to extract a data set from the first element in object.

splitMethod

Defines the internal validation design:

none/noPlan: Assess the models in the give data, usually either in the same data where they are fitted, or in independent test data.

BootCv: Bootstrap cross validation. The prediction models are trained on B bootstrap samples, that are either drawn with replacement of the same size as the original data or without replacement from data of the size M. The models are assessed in the observations that are NOT in the bootstrap sample.

B

The number of cross-validation steps.

M

The size of the subsamples for cross-validation.

pseudo

Logical. Determines the method for estimating expected event status:

TRUE: Use average pseudo-values. FALSE: Use the product-limit estimate, i.e., apply the Kaplan-Meier method for right censored survival and the Aalen-Johansen method for right censored competing risks data.

type

Either "risk" or "survival".

showPseudo

If TRUE the pseudo-values are shown as dots on the plot (only when pseudo=TRUE).

pseudo.col

Colour for pseudo-values.

pseudo.pch

Dot type (see par) for pseudo-values.

method

The method for estimating the calibration curve(s):

"nne": The expected event status is obtained in the nearest neighborhood around the predicted event probabilities.

"quantile": The expected event status is obtained in groups defined by quantiles of the predicted event probabilities.

round

If TRUE predicted probabilities are rounded to two digits before smoothing. This may have a considerable effect on computing efficiency in large data sets.

bandwidth

The bandwidth for method="nne"

q

The number of quantiles for method="quantile" and bars=TRUE.

bars

If TRUE, use barplots to show calibration.

hanging

Barplots only. If TRUE, hang bars corresponding to observed frequencies at the value of the corresponding prediction.

names

Barplots only. Names argument passed to names.arg of barplot.

showFrequencies

Barplots only. If TRUE, show frequencies above the bars.

jack.density

Gray scale for pseudo-observations.

plot

If FALSE, do not plot the results, just return a plottable object.

add

If TRUE the line(s) are added to an existing plot.

diag

If FALSE no diagonal line is drawn.

legend

If FALSE no legend is drawn.

axes

If FALSE no axes are drawn.

xlim

Limits of x-axis.

ylim

Limits of y-axis.

xlab

Label for y-axis.

ylab

Label for x-axis.

col

Vector with colors, one for each element of object. Passed to lines.

lwd

Vector with line widths, one for each element of object. Passed to lines.

lty

lwd Vector with line style, one for each element of object. Passed to lines.

pch

Passed to points.

cause

For competing risks models, the cause of failure or event of interest

percent

If TRUE axes labels are multiplied by 100 and thus interpretable on a percent scale.

giveToModel

List of with exactly one entry for each entry in object. Each entry names parts of the value of the fitted models that should be extracted and added to the value.

na.action

Passed to model.frame

cores

Number of cores for parallel computing. Passed as value of argument mc.cores to mclapply.

verbose

if TRUE report details of the progress, e.g. count the steps in cross-validation.

cex

Default cex used for legend and labels.

...

Used to control the subroutines: plot, axis, lines, barplot, legend. See SmartControl.

Author

Thomas Alexander Gerds tag@biostat.ku.dk

Details

For method "nne" the optimal bandwidth with respect to is obtained with the function dpik from the package KernSmooth for a box kernel function.

Examples

Run this code


library(prodlim)
library(lava)
library(riskRegression)
library(survival)
# survival
dlearn <- SimSurv(40)
dval <- SimSurv(100)
f <- coxph(Surv(time,status)~X1+X2,data=dlearn,x=TRUE,y=TRUE)
cf=calPlot(f,time=3,data=dval)
print(cf)
plot(cf)

g <- coxph(Surv(time,status)~X2,data=dlearn,x=TRUE,y=TRUE)
cf2=calPlot(list("Cox regression X1+X2"=f,"Cox regression X2"=g),
    time=3,
    type="risk",
    data=dval)
print(cf2)
plot(cf2)
calPlot(f,time=3,data=dval,type="survival")
calPlot(f,time=3,data=dval,bars=TRUE,pseudo=FALSE)
calPlot(f,time=3,data=dval,bars=TRUE,type="risk",pseudo=FALSE)

## show a red line which follows the hanging bars
calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE)
a <- calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE,abline.col=NULL)
lines(c(0,1,ceiling(a$xcoord)),
      c(a$offset[1],a$offset,a$offset[length(a$offset)]),
      col=2,lwd=5,type="s")

calPlot(f,time=3,data=dval,bars=TRUE,type="risk",hanging=TRUE)

set.seed(13)
m <- crModel()
regression(m, from = "X1", to = "eventtime1") <- 1
regression(m, from = "X2", to = "eventtime1") <- 1
m <- addvar(m,c("X3","X4","X5"))
distribution(m, "X1") <- binomial.lvm()
distribution(m, "X4") <- binomial.lvm()
d1 <- sim(m,100)
d2 <- sim(m,100)
csc <- CSC(Hist(time,event)~X1+X2+X3+X4+X5,data=d1)
fgr <- FGR(Hist(time,event)~X1+X2+X3+X4+X5,data=d1,cause=1)
if ((requireNamespace("cmprsk",quietly=TRUE))){
predict.crr <- cmprsk:::predict.crr
cf3=calPlot(list("Cause-specific Cox"=csc,"Fine-Gray"=fgr),
        time=5,
        legend.x=-0.3,
        legend.y=1.35,
        ylab="Observed event status",
        legend.legend=c("Cause-specific Cox regression","Fine-Gray regression"),
        legend.xpd=NA)
print(cf3)
plot(cf3)

b1 <- calPlot(list("Fine-Gray"=fgr),time=5,bars=TRUE,hanging=FALSE)
print(b1)
plot(b1)

calPlot(fgr,time=5,bars=TRUE,hanging=TRUE)
}

Run the code above in your browser using DataLab