pmcalibration: Calibration Curves for Clinical Prediction Models

A clinical prediction model should produce well-calibrated risk predictions, meaning the predicted probabilities should align with observed outcome rates. Calibration can be assessed at several levels (see https://pubmed.ncbi.nlm.nih.gov/26772608/); this package focuses on assessing “moderate” calibration via non-linear calibration curves. pmcalibration implements calibration curves for binary and (right censored) time-to-event outcomes and calculates metrics used to assess the correspondence between predicted and observed outcome probabilities (the ‘integrated calibration index’ or $ICI$, aka $E_{avg}$, as well as $E_{50}$, $E_{90}$, and $E_{max}$; see below).

A goal of pmcalibration is to implement a range of methods for estimating a smooth relationship between predicted and observed probabilities and to provide confidence intervals for calibration metrics (via bootstrapping or simulation-based inference). Users can transform predicted risks before fitting the calibration curve (for example, a logit transformation appears to improve performance when using a regression spline; see https://doi.org/10.31219/osf.io/4n86q).

The examples below demonstrate usage of the package.

Binary outcome

# to install
install.packages("pmcalibration") # cran
# or 
devtools::install_github("stephenrho/pmcalibration") # development
library(pmcalibration)

# simulate some data for vignette
set.seed(2345)
dat <- sim_dat(1000, a1 = -1, a3 = 1)

# show the first 3 columns (col 4 is the true linear predictor/LP)
head(dat[-4])
#>            x1         x2 y
#> 1 -1.19142464 -0.9245914 0
#> 2  0.54930055 -1.0019698 0
#> 3 -0.06240514  1.5438665 1
#> 4  0.26544150  0.1632147 1
#> 5 -0.23459751 -1.2009388 0
#> 6 -0.99727160 -1.1899600 0

We have data with a binary outcome, y, and two ‘predictor’ variables, x1 and x2. Suppose we have an existing model that predicts y from x1 and x2 as follows:

p(y = 1) = plogis( -1 + 1*x1 + 1*x2 )

To externally validate this model on this new data we need to calculate the predicted probabilities. We’ll also extract the observed outcomes.

p <- plogis(with(dat, -1 + x1 + x2))
y <- dat$y

First we can check weak calibration:

(lcal <- logistic_cal(y = y, p = p))
#> Logistic calibration intercept and slope:
#> 
#>                       Estimate Std. Error z value Pr(>|z|) lower upper
#> Calibration Intercept    -0.11      0.080   -1.38     0.17 -0.27 0.046
#> Calibration Slope         1.06      0.078    0.80     0.42  0.91 1.220
#> 
#> z-value for calibration slope is relative to slope = 1.
#> lower and upper are the bounds of 95% profile confidence intervals.
#> 
#> Likelihood ratio tests (a = intercept, b = slope):
#> 
#>                                              statistic df Pr(>Chi)
#> Weak calibration - H0: a = 0, b = 1               2.57  2     0.28
#> Calibration in the large - H0: a = 0 | b = 1      1.91  1     0.17
#> Calibration slope - H0: b = 1 | a                 0.65  1     0.42

The top part of the printed summary gives estimates of the calibration intercept and slope along with their 95% CIs. The bottom part gives likelihood ratio tests (see Miller et al. 1993) assessing (1) weak calibration as a whole (the null hypothesis of intercept = 0 and slope = 1), (2) calibration in the large (H0: intercept = 0 given slope = 1), and (3) the calibration slope (H0: slope = 1). This output suggests the model is reasonably weakly calibrated: the calibration intercept and slope do not clearly differ from 0 and 1, respectively.
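
Here is a minimal sketch of how the weak calibration likelihood ratio test could be computed with standard glm machinery (an illustration of the idea, not necessarily logistic_cal's exact implementation; y and p as defined above):

# H0 model: intercept = 0 and slope = 1, i.e. logit(p) enters as a
# fixed offset with no free parameters
m0 <- glm(y ~ 0 + offset(qlogis(p)), family = binomial)
# H1 model: freely estimated calibration intercept and slope
m1 <- glm(y ~ qlogis(p), family = binomial)
# likelihood ratio test on 2 df (intercept and slope)
anova(m0, m1, test = "Chisq")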

We can use a calibration curve to assess ‘moderate’ calibration. Below we use pmcalibration to fit a flexible calibration curve, allowing for a non-linear relationship between predicted and actual probabilities.

In the example below, we fit a calibration curve using a restricted cubic spline with 5 knots (see ?rms::rcs). transf="logit" signals that the predicted risks should be logit transformed before fitting the calibration curve (this is the default for a binary y). pmcalibration calculates various metrics from the absolute difference between the predicted probability and the actual probability (as estimated by the calibration curve). In this case, 95% confidence intervals for these metrics are calculated via simulation-based inference (ci = "sim") with 1000 replicates. Alternatively, we could have requested bootstrap confidence intervals (ci = "boot").

(cc <- pmcalibration(y = y, p = p, 
                     smooth = "rcs", nk = 5,
                     transf="logit",
                     ci = "sim", 
                     n=1000))
#> Calibration metrics based on a calibration curve estimated for a binary outcome via a restricted cubic spline (see ?rms::rcs) using logit transformed predicted probabilities.
#> 
#>      Estimate lower upper
#> Eavg    0.054 0.035 0.076
#> E50     0.059 0.031 0.079
#> E90     0.083 0.060 0.134
#> Emax    0.138 0.081 0.448
#> ECI     0.368 0.177 0.751
#> 
#> 95% confidence intervals calculated via simulation based inference with 1000 replicates.

The printed metrics can be interpreted as follows (a hedged sketch of how such metrics can be computed by hand follows the list):

  • Eavg suggests that the average difference between predicted and actual probability of the outcome is 0.054 (or 5%), with a 95% CI of [0.035, 0.076].
  • E50 is the median difference between predicted and observed probability (inferred from the calibration curve): 50% of differences are 0.059 or smaller.
  • E90 is the 90th percentile difference: 90% of differences are 0.083 or smaller.
  • Emax is the largest observed difference between predicted and observed probability. The model can be off by up to 0.14, with a broad confidence interval.
  • ECI is the average squared difference between predicted and observed probabilities (multiplied by 100). See Van Hoorde et al. (2015).
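
As a concrete illustration, a similar curve and metrics could be computed by hand as sketched below (using rms::rcs for the spline basis; pmcalibration's internal glm_cal and cal_metrics handle details such as knot placement, so results will not match exactly):

library(rms) # for rcs()

# conceptual recalibration model: a flexible function of logit(p)
cal_mod <- glm(y ~ rcs(qlogis(p), 5), family = binomial)
p_c <- fitted(cal_mod) # curve-implied 'actual' probabilities

abs_diff <- abs(p - p_c)
c(Eavg = mean(abs_diff),                  # a.k.a. ICI
  E50  = unname(quantile(abs_diff, .5)),  # median absolute difference
  E90  = unname(quantile(abs_diff, .9)),  # 90th percentile
  Emax = max(abs_diff),                   # largest absolute difference
  ECI  = mean((p - p_c)^2) * 100)         # average squared difference x 100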

pmcalibration produces a plot by default. A more customized plot can be obtained via plot.

plot(cc, xlab="Predicted Risk of Outcome", ylab="Expected Proportion with Outcome", fillcol = "blue", ideallty = 0)

Or one could use get_curve to extract the data for plotting with a method of your choice.

pcc <- get_curve(cc)
head(pcc) 
#>             p        p_c      lower     upper
#> 1 0.005804309 0.14402182 0.03580601 0.4538340
#> 2 0.015274021 0.11541021 0.04965833 0.2590454
#> 3 0.024743733 0.10296340 0.05610697 0.1868902
#> 4 0.034213445 0.09521032 0.05846271 0.1543373
#> 5 0.043683157 0.08985188 0.05926845 0.1357910
#> 6 0.053152869 0.08642764 0.05780335 0.1307891
# p = predicted risk (x-axis; this is not the p provided to pmcalibration
# but is determined by the eval argument)
# p_c = risk implied by the calibration curve (y-axis)
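
For example, a minimal ggplot2 sketch using the extracted data (ggplot2 is an assumption here, not a pmcalibration requirement; the column names are those shown above):

library(ggplot2)

ggplot(pcc, aes(x = p, y = p_c)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") + # ideal line
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = .3) +   # 95% CI
  geom_line() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Predicted Risk of Outcome",
       y = "Estimated Actual Risk of Outcome")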

The model in its current form very slightly overestimates risk at low levels of predicted risk and underestimates risk at predicted probabilities above roughly 0.6.

The results above can be compared with rms::val.prob. Note that val.prob uses lowess(p, y, iter=0) to fit a calibration curve. In this case, lowess produces a curve that extends beyond the possible range of risks, but the Emax, E90, and Eavg point estimates are consistent with those above.

library(rms)
#> Loading required package: Hmisc
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
val.prob(p = p, y = y) |> 
  round(3)
#>       Dxy   C (ROC)        R2         D  D:Chi-sq       D:p         U  U:Chi-sq 
#>     0.627     0.813     0.348     0.280   281.217     0.000     0.001     2.566 
#>       U:p         Q     Brier Intercept     Slope      Emax       E90      Eavg 
#>     0.277     0.280     0.148    -0.076     1.062     0.145     0.073     0.044 
#>       S:z       S:p 
#>    -2.238     0.025

Note also that the calibration intercept reported by rms::val.prob comes from the same logistic regression used to estimate the calibration slope. In logistic_cal, the calibration intercept is instead estimated via a glm with the logit-transformed predicted probabilities included as an offset term (i.e., with the slope fixed to 1; see, e.g., Van Calster et al., 2016); the calibration slope is estimated via a separate glm. We can confirm this by accessing the corresponding estimates from the logistic_cal object.

# access the model used to get calibration slope
# and compare to estimates from val.prob
coef(lcal$calibration_slope) |> 
  round(3)
#> (Intercept)          LP 
#>      -0.076       1.062
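
The calibration intercept can be checked in the same spirit; below is a minimal sketch of the offset-based glm described above (y and p as defined earlier):

# slope fixed to 1 by entering logit(p) as an offset;
# the free intercept is the calibration intercept
int_mod <- glm(y ~ 1 + offset(qlogis(p)), family = binomial)
coef(int_mod) |> 
  round(3)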

Time-to-event outcome

The code below produces a calibration curve, and associated metrics, for a time-to-event outcome. The curve has to be constructed for predictions at a given time point, so an extra argument, time, must be specified. Here we use a restricted cubic spline with 5 knots to assess predictions at time = 15. In this case we use ci="boot" to get bootstrap confidence intervals for the metrics and curve (ci="sim" is currently unsupported for time-to-event outcomes). By default, predicted risks are transformed via the complementary log-log transformation (function(x) log(-log(1 - x))) before the calibration curve is estimated.

library(simsurv)
library(survival)

# simulate some data
n <- 2000
X <- data.frame(id = seq(n), x1 = rnorm(n), x2 = rnorm(n))
X$x3 <- X$x1*X$x2 # interaction

b <- c("x1" = -.2, "x2" = -.2, "x3" = .1)

d <- simsurv(dist = "weibull", lambdas = .01, gammas = 1.5, x = X, betas = b, seed = 246)

mean(d$eventtime)
#> [1] 19.60637
median(d$eventtime)
#> [1] 16.52855
mean(d$status) # no censoring
#> [1] 1

d <- cbind(d, X[,-1])

head(d)
#>   id eventtime status         x1         x2          x3
#> 1  1 12.749281      1  0.7534077  0.8486379  0.63937033
#> 2  2 24.840161      1  0.4614734 -2.1876625 -1.00954805
#> 3  3  9.087482      1 -0.6338945 -1.8948297  1.20112211
#> 4  4 24.811402      1 -1.0248165  0.6541197 -0.67035271
#> 5  5 19.072266      1 -0.1673414 -0.4625003  0.07739544
#> 6  6 13.595427      1  0.2376988  0.6452848  0.15338343

# split into development and validation
ddev <- d[1:1000, ]
dval <- d[1001:2000, ]

# fit a cox model
cph <- coxph(Surv(eventtime, status) ~ x1 + x2, data = ddev)

# predicted probability of event at time = 15
p <- 1 - exp(-predict(cph, type = "expected",
                      newdata = data.frame(eventtime = 15, status = 1,
                                           x1 = dval$x1, x2 = dval$x2)))

y <- with(dval, Surv(eventtime, status))

# calibration curve at time = 15
(cc <- pmcalibration(y = y, p = p, smooth = "rcs", nk = 5, 
                     ci = "boot", time = 15))
#> Calibration metrics based on a calibration curve estimated for a time-to-event outcome (time = 15) via a restricted cubic spline (see ?rms::rcs) using complementary log-log transformed predicted probabilities.
#> 
#>      Estimate lower upper
#> Eavg    0.051 0.028 0.080
#> E50     0.046 0.020 0.077
#> E90     0.077 0.052 0.126
#> Emax    0.245 0.115 0.407
#> ECI     0.349 0.146 0.813
#> 
#> 95% confidence intervals calculated via bootstrap resampling with 1000 replicates.
mtext("time = 15")
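
Conceptually, the survival calibration curve resembles a flexible Cox recalibration model fit to the transformed predictions, with curve-implied risks taken from the model's survival estimate at time = 15. A hedged sketch follows, using splines::ns as a stand-in for the package's restricted cubic spline (pmcalibration's internal glm_cal handles the details, so this will not match exactly; survival is loaded above):

library(splines) # ns() as a stand-in spline basis

cloglog <- function(x) log(-log(1 - x))

# flexible Cox model relating transformed predictions to the outcome
cal_cox <- coxph(y ~ ns(cloglog(p), df = 4))
# curve-implied risk of the event by time = 15 for each subject
sf <- survfit(cal_cox, newdata = data.frame(p = p))
p_c <- 1 - as.numeric(summary(sf, times = 15)$surv)
mean(abs(p - p_c)) # should be in the ballpark of Eavg above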

Compare to rms::val.surv, which with the arguments specified below uses polspline::hare to fit a calibration curve. Note that val.surv works with the probability of surviving until time = u, not the probability of the event occurring by time = u.

(vs <- val.surv(S = y, est.surv = 1-p, u=15, 
                fun = function(x) log(-log(x))))
#> 
#> Validation of Predicted Survival at Time= 15     n= 1000 , events= 1000 
#> 
#> hare fit:
#> 
#> dim A/D   loglik       AIC        penalty 
#>                                 min    max 
#>   1 Add  -3949.05   7905.02  148.71     Inf
#>   2 Add  -3874.70   7763.22   60.48  148.71
#>   3 Add  -3844.46   7709.64   32.04   60.48
#>   4 Del  -3828.44   7684.51    7.10   32.04
#>   5 Add  -3824.89   7684.32    0.00    7.10
#> 
#> the present optimal number of dimensions is 5.
#> penalty(AIC) was 6.91, the default (BIC), would have been 6.91.
#> 
#>   dim1           dim2           beta        SE         Wald
#> Constant                            -2.6       0.25  -10.69
#> Time        26                    -0.031      0.005   -6.27
#> Co-1  linear                       0.063       0.26    0.24
#> Time       7.2                     -0.15      0.028   -5.42
#> Co-1     -0.81                      0.92       0.34    2.74
#> 
#> Function used to transform predictions:
#> function (x)  log(-log(x))
#> 
#> Mean absolute error in predicted probabilities: 0.0352 
#> 0.9 Quantile of absolute errors               : 0.0760
plot(vs, lim=0:1)

We can make a plot on the survival scale that is easier to compare with the val.surv plot above.

x <- get_curve(cc)

with(x, plot(1-p, 1-p_c, type="l", xlim=0:1, ylim=0:1, 
             xlab="Predicted Probability of Surviving 15", 
             ylab="Actual Probability of Surviving 15"))
matplot(1-x$p, y = 1-x[, 3:4], type = "l", lty=2, 
        col="black", add = TRUE)
abline(0,1, col="red", lty=2)

Internal validation

pmcalibration can be used to assess apparent calibration in a development sample or to externally validate an existing prediction model. For internal validation (via bootstrap optimism or cross-validation), users are encouraged to look at https://stephenrho.github.io/pminternal/.

Version: 0.2.0
License: GPL-3

Maintainer: Stephen Rhodes
Last Published: February 21st, 2025

Functions in pmcalibration (0.2.0)

pmcalibration: Create a calibration curve
riskdist: Make a plot of predicted risks by outcome
print.pmcalibrationsummary: Print summary of a pmcalibration object
reg_spline_X: Make a design matrix for a regression spline
simb: Simulation-based inference with a calibration curve object
summary.logistic_cal: Summarize a logistic_cal object
summary.pmcalibration: Summarize a pmcalibration object
plot.pmcalibration: Plot a calibration curve
pmcalibration-package: pmcalibration: Calibration Curves for Clinical Prediction Models
run_boots: Wrapper to run bootstrap resamples using parallel
sim_dat: Simulate a binary outcome with either a quadratic relationship or interaction
logit: Logit transformation
get_curve: Extract plot data from a pmcalibration object
logistic_cal: Run logistic calibration
cal_metrics: Calculate calibration metrics from a calibration curve
print.pmcalibration: Print a pmcalibration object
print.logistic_cal: Print a logistic_cal object
invlogit: Inverse logit transformation
gam_cal: Fit a calibration curve via gam
lowess_cal: Fit a calibration curve via lowess
boot: Bootstrap resample a calibration curve object
print.logistic_calsummary: Print a logistic_cal summary
loess_cal: Fit a calibration curve via loess
glm_cal: Fit a calibration curve via glm or Cox proportional hazards model
predict_lowess: Get predictions from a lowess fit