# validate.lrm

##### Resampling Validation of a Logistic or Ordinal Regression Model

The `validate` function, when used on an object created by `lrm` or `orm`, does resampling validation of a logistic regression model, with or without backward step-down variable deletion. It provides the following bias-corrected indexes:

- Somers' \(D_{xy}\) rank correlation
- the R-squared index
- the intercept and slope of an overall logistic calibration equation
- \(E_{max}\), the maximum absolute difference between predicted and calibrated probabilities
- the discrimination index \(D\), defined as (model L.R. \(\chi^2\) - 1)/n
- the unreliability index \(U\), defined as the difference in -2 log likelihood between uncalibrated \(X\beta\) and \(X\beta\) with overall intercept and slope calibrated to the test sample, divided by n
- the overall quality index (logarithmic probability score) \(Q = D - U\)
- the Brier or quadratic probability score \(B\) (the last 3 are not computed for ordinal models)
- the \(g\)-index
- `gp`, the \(g\)-index on the probability scale

The corrected slope can be thought of as a shrinkage factor that takes overfitting into account.

For `orm` fits, a subset of the above indexes is provided, Spearman's \(\rho\) is substituted for \(D_{xy}\), and a new index is reported: `pdm`, the mean absolute difference between 0.5 and the predicted probability that \(Y\geq\) the marginal median of \(Y\).
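For orientation, the indexes with closed-form definitions can be written out as below. This is a summary sketch rather than the package's exact computation: \(C\) (the concordance probability, equal to the ROC area for a binary response) and \(\hat{p}_i\) (the predicted probability for observation \(i\)) are symbols introduced here and do not appear in the text above. \(U\) is left in words as described above.

```latex
\begin{aligned}
D_{xy} &= 2\,(C - 0.5) \\
D &= \frac{\text{model L.R. } \chi^2 - 1}{n} \\
Q &= D - U \\
B &= \frac{1}{n}\sum_{i=1}^{n} \left(\hat{p}_i - y_i\right)^2
\end{aligned}
```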

- Keywords
- models, regression

##### Usage

```
# fit <- lrm(formula=response ~ terms, x=TRUE, y=TRUE) or orm
# S3 method for lrm
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
         force=NULL, estimates=TRUE,
         pr=FALSE, kint, Dxy.method=if(k==1) 'somers2' else 'lrm',
         emax.lim=c(0,1), ...)
# S3 method for orm
validate(fit, method="boot", B=40, bw=FALSE, rule="aic",
         type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
         pr=FALSE, ...)
```

##### Arguments

- fit
a fit derived by `lrm` or `orm`. The options `x=TRUE` and `y=TRUE` must have been specified.

- method, B, bw, rule, type, sls, aics, force, estimates, pr
see `validate` and `predab.resample`

- kint
In the case of an ordinal model, specify which intercept to validate. Default is the middle intercept. For `validate.orm`, intercept-specific quantities are not validated so this does not matter.

- Dxy.method
`"lrm"` to use `lrm`'s computation of \(D_{xy}\) correlation, which rounds predicted probabilities to the nearest .002. Use `Dxy.method="somers2"` (the default for binary models) to instead use the more accurate but slower `somers2` function. This will matter most when the model is extremely predictive. The default is `"lrm"` for ordinal models, since `somers2` only handles binary response variables.

- emax.lim
range of predicted probabilities over which to compute the maximum error. Default is entire range.

- ...
other arguments to pass to `lrm.fit` (now only `maxit` and `tol` are allowed) and to `predab.resample` (note especially the `group`, `cluster`, and `subset` parameters)

##### Details

If the original fit was created using penalized maximum likelihood estimation,
the same `penalty.matrix` used with the original fit is used during validation.

##### Value

a matrix with rows corresponding to \(D_{xy}\), \(R^2\), `Intercept`, `Slope`, \(E_{max}\), \(D\), \(U\), \(Q\), \(B\), \(g\), \(gp\), and columns for the original index, resample estimates, indexes applied to the whole or omitted sample using the model derived from the resample, average optimism, corrected index, and number of successful re-samples. For `validate.orm` not all columns are provided, Spearman's rho is returned instead of \(D_{xy}\), and `pdm` is reported.
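To make the columns concrete, here is a minimal base-R sketch of the optimism bootstrap for a single index, \(D_{xy}\), using `glm` instead of `lrm`. It is an illustration under stated assumptions, not the code that `validate` runs: the helper `dxy`, the simulated data, and all variable names are hypothetical. For each resample, the model is refit, the index is computed on the resample ("training") and on the original sample ("test"), and the average difference is subtracted from the apparent index.

```r
# Hypothetical helper: Somers' Dxy for a binary outcome via the rank formula,
# Dxy = 2 * (C - 0.5), where C is the concordance probability.
dxy <- function(pred, y) {
  r  <- rank(pred)                  # midranks handle tied predictions
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  cindex <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (cindex - 0.5)
}

# Simulated data (assumed for illustration only)
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
d  <- data.frame(y, x1, x2)

# Apparent index: model fit and evaluated on the full sample
fit      <- glm(y ~ x1 + x2, family = binomial, data = d)
orig_dxy <- dxy(predict(fit), d$y)

B <- 40                             # number of bootstrap resamples
optimism <- numeric(B)
for (b in seq_len(B)) {
  idx   <- sample(n, replace = TRUE)
  boot  <- d[idx, ]
  fit_b <- glm(y ~ x1 + x2, family = binomial, data = boot)
  training <- dxy(predict(fit_b), boot$y)            # resample model on the resample
  test     <- dxy(predict(fit_b, newdata = d), d$y)  # resample model on the original sample
  optimism[b] <- training - test
}

# Corrected index = original index - average optimism
corrected_dxy <- orig_dxy - mean(optimism)
```

The same pattern applies to any other index in the matrix: replace `dxy` with the statistic of interest and keep the training-minus-test bookkeeping unchanged.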

##### Side Effects

prints a summary, and optionally statistics for each re-fit

##### References

Miller ME, Hui SL, Tierney WM (1991): Validation techniques for logistic regression models. Stat in Med 10:1213–1226.

Harrell FE, Lee KL (1985): A comparison of the *discrimination* of discriminant analysis and logistic regression under multivariate normality. In Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, ed. PK Sen. New York: North-Holland, p. 333–343.

##### See Also

`predab.resample`, `fastbw`, `lrm`, `rms`, `rms.trans`, `calibrate`, `somers2`, `cr.setup`, `gIndex`, `orm`

##### Examples

```
library(rms)
n <- 1000    # define sample size
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n, TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
f <- lrm(y ~ sex*rcs(cholesterol)+pol(age,2)+blood.pressure, x=TRUE, y=TRUE)
# Validate full model fit
validate(f, B=10)    # normally B=300
# Two-sample validation: make resamples have same numbers of
# successes and failures as original sample
validate(f, B=10, group=y)
# Validate stepwise model with typical (not so good) stopping rule
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")
# Not run: mydataframe is a placeholder for a user-supplied data frame
# Fit a continuation ratio model and validate it for the predicted
# probability that y=0
u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
attach(mydataframe[u$subs,])
# x=TRUE and y=TRUE are required for validate
f <- lrm(Y ~ cohort+rcs(age,4)*sex, penalty=list(interaction=2),
         x=TRUE, y=TRUE)
validate(f, cluster=u$subs, subset=cohort=='all')
# see predab.resample for cluster and subset
```

*Documentation reproduced from package rms, version 5.1-4, License: GPL (>= 2)*