The `val.prob`

function is useful for validating
predicted probabilities against binary events.

Given a set of predicted probabilities `p`

or predicted log odds
`logit`

, and a vector of binary outcomes `y`

that were not
used in developing the predictions `p`

or `logit`

,
`val.prob`

computes the following indexes and statistics: Somers'
\(D_{xy}\) rank correlation between `p`

and `y`

[\(2(C-.5)\), \(C\)=ROC area], Nagelkerke-Cox-Snell-Maddala-Magee
R-squared index, Discrimination index `D`

[ (Logistic model
L.R. \(\chi^2\) - 1)/n], L.R. \(\chi^2\),
its \(P\)-value, Unreliability index \(U\), \(\chi^2\)
with 2 d.f. for testing unreliability (H0: intercept=0, slope=1), its
\(P\)-value, the quality index \(Q\), `Brier`

score (average
squared difference in `p`

and `y`

), `Intercept`

, and
`Slope`

, \(E_{max}\)=maximum absolute difference in predicted
and loess-calibrated probabilities, `Eavg`

, the average in same,
`E90`

, the 0.9 quantile of same, the Spiegelhalter \(Z\)-test for
calibration accuracy, and its two-tailed \(P\)-value. If
`pl=TRUE`

, plots fitted logistic
calibration curve and optionally a smooth nonparametric fit using
`lowess(p,y,iter=0)`

and grouped proportions vs. mean predicted
probability in group. If the predicted probabilities or logits are
constant, the statistics are returned and no plot is made.
`Eavg, Emax, E90`

were from linear logistic calibration before
rms 4.5-1.

When `group`

is present, different statistics are computed,
different graphs are made, and the object returned by `val.prob`

is
different. `group`

specifies a stratification variable.
Validations are done separately by levels of group and overall. A
`print`

method prints summary statistics and several quantiles of
predicted probabilities, and a `plot`

method plots calibration
curves with summary statistics superimposed, along with selected
quantiles of the predicted probabilities (shown as tick marks on
calibration curves). Only the `lowess`

calibration curve is
estimated. The statistics computed are the average predicted
probability, the observed proportion of events, a 1 d.f. chi-square
statistic for testing for overall mis-calibration (i.e., a test of the
observed vs. the overall average predicted probability of the event)
(`ChiSq`

), and a 2 d.f. chi-square statistic for testing
simultaneously that the intercept of a linear logistic calibration curve
is zero and the slope is one (`ChiSq2`

), average absolute
calibration error (average absolute difference between the
`lowess`

-estimated calibration curve and the line of identity,
labeled `Eavg`

), `Eavg`

divided by the difference between the
0.95 and 0.05 quantiles of predictive probabilities (`Eavg/P90`

), a
"median odds ratio", i.e., the anti-log of the median absolute
difference between predicted and calibrated predicted log odds of the
event (`Med OR`

), the C-index (ROC area), the Brier quadratic error
score (`B`

), a chi-square test of goodness of fit based on the
Brier score (`B ChiSq`

), and the Brier score computed on calibrated rather than raw
predicted probabilities (`B cal`

). The first chi-square test is a
test of overall calibration accuracy ("calibration in the large"), and
the second will also detect errors such as slope shrinkage caused by
overfitting or regression to the mean. See Cox (1970) for both of these
score tests. The goodness of fit test based on the (uncalibrated) Brier
score is due to Hilden, Habbema, and Bjerregaard (1978) and is discussed
in Spiegelhalter (1986). When `group`

is present you can also
specify sampling `weights`

(usually frequencies), to obtained
weighted calibration curves.

To get the behavior that results from a grouping variable being present
without having a grouping variable, use `group=TRUE`

. In the
`plot`

method, calibration curves are drawn and labeled by default
where they are maximally separated using the `labcurve`

function.
The following parameters do not apply when `group`

is present:
`pl`

, `smooth`

, `logistic.cal`

, `m`

, `g`

,
`cuts`

, `emax.lim`

, `legendloc`

, `riskdist`

,
`mkh`

, `connect.group`

, `connect.smooth`

. The following
parameters apply to the `plot`

method but not to `val.prob`

:
`xlab`

, `ylab`

, `lim`

, `statloc`

, `cex`

.

```
val.prob(p, y, logit, group, weights=rep(1,length(y)), normwt=FALSE,
pl=TRUE, smooth=TRUE, logistic.cal=TRUE,
xlab="Predicted Probability", ylab="Actual Probability",
lim=c(0, 1), m, g, cuts, emax.lim=c(0,1),
legendloc=lim[1] + c(0.55 * diff(lim), 0.27 * diff(lim)),
statloc=c(0,0.99), riskdist=c("predicted", "calibrated"),
cex=.7, mkh=.02,
connect.group=FALSE, connect.smooth=TRUE, g.group=4,
evaluate=100, nmin=0)
```# S3 method for val.prob
print(x, …)

# S3 method for val.prob
plot(x, xlab="Predicted Probability",
ylab="Actual Probability",
lim=c(0,1), statloc=lim, stats=1:12, cex=.5,
lwd.overall=4, quantiles=c(.05,.95), flag, …)

p

predicted probability

y

vector of binary outcomes

logit

predicted log odds of outcome. Specify either `p`

or `logit`

.

group

a grouping variable. If numeric this variable is grouped into
`g.group`

quantile groups (default is quartiles). Set `group=TRUE`

to
use the `group`

algorithm but with a single stratum for `val.prob`

.

weights

an optional numeric vector of per-observation weights (usually frequencies),
used only if `group`

is given.

normwt

set to `TRUE`

to make `weights`

sum to the number of non-missing observations.

pl

TRUE to plot calibration curves and optionally statistics

smooth

plot smooth fit to `(p,y)`

using `lowess(p,y,iter=0)`

logistic.cal

plot linear logistic calibration fit to `(p,y)`

xlab

x-axis label, default is `"Predicted Probability"`

for `val.prob`

.

ylab

y-axis label, default is `"Actual Probability"`

for
`val.prob`

.

lim

limits for both x and y axes

m

If grouped proportions are desired, average no. observations per group

g

If grouped proportions are desired, number of quantile groups

cuts

If grouped proportions are desired, actual cut points for constructing
intervals, e.g. `c(0,.1,.8,.9,1)`

or `seq(0,1,by=.2)`

emax.lim

Vector containing lowest and highest predicted probability over which to
compute `Emax`

.

legendloc

If `pl=TRUE`

, list with components `x,y`

or vector `c(x,y)`

for upper left corner
of legend for curves and points. Default is `c(.55, .27)`

scaled to
`lim`

. Use `locator(1)`

to use the mouse, `FALSE`

to suppress legend.

statloc

\(D_{xy}\), \(C\), \(R^2\), \(D\), \(U\), \(Q\), `Brier`

score, `Intercept`

, `Slope`

, and \(E_{max}\)
will be added to plot, using
`statloc`

as the upper left corner of a box (default is `c(0,.9)`

).
You can specify a list or a vector. Use `locator(1)`

for the mouse, `FALSE`

to suppress statistics.
This is plotted after the curve legends.

riskdist

Use `"calibrated"`

to plot the relative frequency distribution of
calibrated probabilities after dividing into 101 bins from `lim[1]`

to
`lim[2]`

.
Set to `"predicted"`

(the default as of rms 4.5-1) to use raw assigned risk, `FALSE`

to omit risk distribution.
Values are scaled so that highest bar is `0.15*(lim[2]-lim[1])`

.

cex

Character size for legend or for table of statistics when `group`

is given

mkh

Size of symbols for legend. Default is 0.02 (see `par()`

).

connect.group

Defaults to `FALSE`

to only represent group fractions as triangles.
Set to `TRUE`

to also connect with a solid line.

connect.smooth

Defaults to `TRUE`

to draw smoothed estimates using a dashed line.
Set to `FALSE`

to instead use dots at individual estimates.

g.group

number of quantile groups to use when `group`

is given and variable is
numeric.

evaluate

number of points at which to store the `lowess`

-calibration curve.
Default is 100. If there are more than `evaluate`

unique predicted
probabilities, `evaluate`

equally-spaced quantiles of the unique
predicted probabilities, with linearly interpolated calibrated values,
are retained for plotting (and stored in the object returned by
`val.prob`

.

nmin

applies when `group`

is given. When `nmin`

\(> 0\), `val.prob`

will not
store coordinates of smoothed calibration curves in the outer tails,
where there are fewer than `nmin`

raw observations represented in
those tails. If for example `nmin`

=50, the `plot`

function will only
plot the estimated calibration curve from \(a\) to \(b\), where there are
50 subjects with predicted probabilities \(< a\) and \(> b\).
`nmin`

is ignored when computing accuracy statistics.

x

result of `val.prob`

(with `group`

in effect)

…

optional arguments for `labcurve`

(through `plot`

). Commonly used
options are `col`

(vector of colors for the strata plus overall) and
`lty`

. Ignored for `print`

.

stats

vector of column numbers of statistical indexes to write on plot

lwd.overall

line width for plotting the overall calibration curve

quantiles

a vector listing which quantiles should be indicated on each
calibration curve using tick marks. The values in `quantiles`

can be
any number of values from the following: .01, .025, .05, .1, .25, .5, .75, .9, .95, .975, .99.
By default the 0.05 and 0.95 quantiles are indicated.

flag

a function of the matrix of statistics (rows representing groups)
returning a vector of character strings (one value for each group, including
"Overall"). `plot.val.prob`

will print this vector of character
values to the left of the statistics. The `flag`

function
can refer to columns of the matrix used as input to the function by
their names given in the description above. The default function
returns `"*"`

if either `ChiSq2`

or `B ChiSq`

is
significant at the 0.01 level and `" "`

otherwise.

`val.prob`

without `group`

returns a vector with the following named
elements: `Dxy`

, `R2`

, `D`

, `D:Chi-sq`

, `D:p`

,
`U`

, `U:Chi-sq`

, `U:p`

, `Q`

, `Brier`

,
`Intercept`

, `Slope`

, `S:z`

, `S:p`

, `Emax`

.
When `group`

is present `val.prob`

returns an object of class
`val.prob`

containing a list with summary statistics and calibration
curves for all the strata plus `"Overall"`

.

The 2 d.f. \(\chi^2\) test and `Med OR`

exclude predicted or
calibrated predicted probabilities \(\leq 0\) to zero or \(\geq 1\),
adjusting the sample size as needed.

Harrell FE, Lee KL, Mark DB (1996): Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat in Med 15:361--387.

Harrell FE, Lee KL (1987): Using logistic calibration to assess the accuracy of probability predictions (Technical Report).

Miller ME, Hui SL, Tierney WM (1991): Validation techniques for logistic regression models. Stat in Med 10:1213--1226.

Stallard N (2009): Simple tests for the external validation of mortality prediction scores. Stat in Med 28:377--388.

Harrell FE, Lee KL (1985): A comparison of the *discrimination*
of discriminant analysis and logistic regression under multivariate
normality. In Biostatistics: Statistics in Biomedical, Public Health,
and Environmental Sciences. The Bernard G. Greenberg Volume, ed. PK
Sen. New York: North-Holland, p. 333--343.

Cox DR (1970): The Analysis of Binary Data, 1st edition, section 4.4. London: Methuen.

Spiegelhalter DJ (1986):Probabilistic prediction in patient management. Stat in Med 5:421--433.

Rufibach K (2010):Use of Brier score to assess binary predictions. J Clin Epi 63:938-939

Tjur T (2009):Coefficients of determination in logistic regression models-A new proposal:The coefficient of discrimination. Am Statist 63:366--372.

# NOT RUN { # Fit logistic model on 100 observations simulated from the actual # model given by Prob(Y=1 given X1, X2, X3) = 1/(1+exp[-(-1 + 2X1)]), # where X1 is a random uniform [0,1] variable. Hence X2 and X3 are # irrelevant. After fitting a linear additive model in X1, X2, # and X3, the coefficients are used to predict Prob(Y=1) on a # separate sample of 100 observations. Note that data splitting is # an inefficient validation method unless n > 20,000. set.seed(1) n <- 200 x1 <- runif(n) x2 <- runif(n) x3 <- runif(n) logit <- 2*(x1-.5) P <- 1/(1+exp(-logit)) y <- ifelse(runif(n)<=P, 1, 0) d <- data.frame(x1,x2,x3,y) f <- lrm(y ~ x1 + x2 + x3, subset=1:100) pred.logit <- predict(f, d[101:200,]) phat <- 1/(1+exp(-pred.logit)) val.prob(phat, y[101:200], m=20, cex=.5) # subgroups of 20 obs. # Validate predictions more stringently by stratifying on whether # x1 is above or below the median v <- val.prob(phat, y[101:200], group=x1[101:200], g.group=2) v plot(v) plot(v, flag=function(stats) ifelse( stats[,'ChiSq2'] > qchisq(.95,2) | stats[,'B ChiSq'] > qchisq(.95,1), '*', ' ') ) # Stars rows of statistics in plot corresponding to significant # mis-calibration at the 0.05 level instead of the default, 0.01 plot(val.prob(phat, y[101:200], group=x1[101:200], g.group=2), col=1:3) # 3 colors (1 for overall) # Weighted calibration curves # plot(val.prob(pred, y, group=age, weights=freqs)) # }