validate.lrm: Resampling Validation of a Logistic Model

Description

The validate function, when used on an object created by lrm, does resampling validation of a logistic regression model, with or without backward step-down variable deletion. It provides bias-corrected versions of the following indexes: Somers' $D_{xy}$ rank correlation; the $R^2$ index; the intercept and slope of an overall logistic calibration equation; the maximum absolute difference between predicted and calibrated probabilities, $E_{max}$; the discrimination index $D = (\chi^2 - 1)/n$, where $\chi^2$ is the model likelihood ratio (L.R.) statistic; the unreliability index $U$, equal to the difference in $-2$ log likelihood between the uncalibrated $X\beta$ and $X\beta$ with the overall intercept and slope calibrated to the test sample, divided by $n$; the overall quality index (logarithmic probability score) $Q = D - U$; and the Brier (quadratic probability) score $B$. The last three ($U$, $Q$, $B$) are not computed for ordinal models. The corrected slope can be thought of as a shrinkage factor that takes overfitting into account.
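As a rough illustration of these definitions, the base-R sketch below computes the calibration intercept and slope, $E_{max}$, $D$, $U$, $Q$ and $B$ for a binary outcome. `logistic_indexes()` is a hypothetical helper, not an rms function: it takes a linear predictor `lp` whose coefficients may have been estimated on another sample and refits `y` on `lp` with `glm()` to obtain the overall calibration intercept and slope. Two details are assumptions of this sketch and may differ from rms: $E_{max}$ is evaluated on an equally spaced grid of predicted probabilities inside `emax.lim`, and $D$ uses the likelihood ratio $\chi^2$ of the recalibrated model.

```r
# Hypothetical helper (not part of rms): overall calibration intercept and
# slope, E_max, D, U, Q and Brier score B for a 0/1 outcome y, given a
# linear predictor lp = X beta whose coefficients may come from another sample
logistic_indexes <- function(lp, y, emax.lim = c(0, 1)) {
  n <- length(y)
  # -2 log likelihood of predicted probabilities prob for outcomes y
  m2ll <- function(prob) -2 * sum(y * log(prob) + (1 - y) * log(1 - prob))
  # Overall logistic calibration: regress y on lp
  gamma <- unname(coef(glm(y ~ lp, family = binomial)))
  # E_max on a grid of predicted probabilities inside emax.lim,
  # kept strictly inside (0, 1) so that qlogis() stays finite
  grid <- seq(max(emax.lim[1], 0.001), min(emax.lim[2], 0.999), by = 0.001)
  emax <- max(abs(grid - plogis(gamma[1] + gamma[2] * qlogis(grid))))
  m2ll_null  <- m2ll(rep(mean(y), n))                   # intercept only
  m2ll_uncal <- m2ll(plogis(lp))                        # X beta as is
  m2ll_cal   <- m2ll(plogis(gamma[1] + gamma[2] * lp))  # recalibrated
  D <- (m2ll_null - m2ll_cal - 1) / n  # (model L.R. chi-square - 1) / n
  U <- (m2ll_uncal - m2ll_cal) / n     # loss due to miscalibration
  c(Intercept = gamma[1], Slope = gamma[2], Emax = emax,
    D = D, U = U, Q = D - U, B = mean((plogis(lp) - y)^2))
}

# Fit on the first half of a simulated sample, evaluate on both halves
set.seed(1)
x <- rnorm(400)
y <- rbinom(400, 1, plogis(0.5 + x))
train <- 1:200
fit <- glm(y ~ x, family = binomial, subset = train)
apparent <- logistic_indexes(predict(fit, data.frame(x = x[train])), y[train])
holdout  <- logistic_indexes(predict(fit, data.frame(x = x[-train])), y[-train])
round(rbind(apparent, holdout), 4)
```

On the sample used to fit the model, the intercept and slope come out as 0 and 1, and $U$ and $E_{max}$ are essentially 0. On new data the slope is typically below 1, which is the sense in which the corrected slope acts as a shrinkage factor.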

Usage

# fit <- lrm(formula=response ~ terms, x=TRUE, y=TRUE)
## S3 method for class 'lrm':
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0, 
         pr=FALSE,  kint, Dxy.method=if(k==1) 'somers2' else 'lrm',
         emax.lim=c(0,1), ...)

Arguments

fit

a fit derived by lrm. The options x=TRUE and y=TRUE must have been specified.

method, B, bw, rule, type, sls, aics, pr

see validate and predab.resample

kint

In the case of an ordinal model, specify which intercept to validate. Default is the middle intercept.

Dxy.method

"lrm" to use lrm's computation of the $D_{xy}$ correlation, which rounds predicted probabilities to the nearest .002. Use Dxy.method="somers2" (the default for binary models) to instead use the more accurate but slower somers2 function.

emax.lim

range of predicted probabilities over which to compute the maximum error $E_{max}$. The default is the entire range.

...

other arguments to pass to lrm.fit (now only maxit and tol are allowed) and to predab.resample (note especially the group, cluster, and subset parameters)
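Regarding the Dxy.method argument: for a binary outcome, Somers' $D_{xy}$ equals $2(C - 0.5)$, where $C$ is the concordance probability, i.e. the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with ties counted as 1/2. The sketch below computes it by brute force over all pairs. `somers_dxy()` is a hypothetical helper, not the Hmisc somers2 function, and the rounding step only mimics the "nearest .002" behavior described above rather than reproducing lrm's code.

```r
# Hypothetical helper: Somers' Dxy between predicted probabilities p and a
# 0/1 outcome y, via the concordance probability C over all pairs
somers_dxy <- function(p, y) {
  d <- outer(p[y == 1], p[y == 0], "-")  # event minus non-event prediction
  C <- (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
  2 * (C - 0.5)
}

set.seed(2)
p <- runif(300)
y <- rbinom(300, 1, p)
dxy_exact   <- somers_dxy(p, y)
# Round predictions to the nearest .002 first, which can create ties
dxy_rounded <- somers_dxy(round(p / 0.002) * 0.002, y)
c(exact = dxy_exact, rounded = dxy_rounded)
```

The `outer()` matrix has one entry per event/non-event pair, so this brute-force version only suits modest sample sizes.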

Value

a matrix with rows corresponding to $D_{xy}$, $R^2$, Intercept, Slope, $E_{max}$, $D$, $U$, $Q$, and $B$, and with columns for the original index, the resample (training) estimates, the indexes obtained by applying the model derived from each resample to the whole or omitted sample (test), the average optimism, the optimism-corrected index, and the number of successful resamples. For ordinal models, $U$, $Q$, and $B$ do not appear.
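For the default method="boot", these columns follow the bootstrap optimism correction: each index is computed on a resample (training), the model fitted to that resample is applied to the original sample (test), the mean of training minus test estimates the optimism, and the corrected index is the original index minus that optimism. The base-R sketch below illustrates the arithmetic for two of the indexes (calibration slope and Brier score), using `glm()` in place of lrm. `boot_validate()` is a hypothetical helper whose column names are only chosen to mirror the description above; unlike validate, it does no step-down variable selection and assumes every resample fits successfully.

```r
# Hypothetical helper (not rms::validate): bootstrap optimism correction for
# two indexes, the calibration slope and the Brier score B, using glm()
boot_validate <- function(formula, data, B = 40) {
  yname <- all.vars(formula)[1]
  # Indexes of a fitted model evaluated on the rows of newdata
  idx <- function(fit, newdata) {
    lp <- predict(fit, newdata)   # linear predictor X beta
    y  <- newdata[[yname]]
    c(Slope = unname(coef(glm(y ~ lp, family = binomial))[2]),
      B     = mean((plogis(lp) - y)^2))
  }
  orig <- idx(glm(formula, family = binomial, data = data), data)
  train <- test <- matrix(NA_real_, nrow = B, ncol = length(orig))
  for (b in seq_len(B)) {
    d  <- data[sample(nrow(data), replace = TRUE), ]
    fb <- glm(formula, family = binomial, data = d)
    train[b, ] <- idx(fb, d)     # apparent performance in the resample
    test[b, ]  <- idx(fb, data)  # resample model applied to original data
  }
  optimism <- colMeans(train - test)
  # Assumes all B resamples succeeded, so n is simply B
  cbind(index.orig = orig, training = colMeans(train), test = colMeans(test),
        optimism = optimism, index.corrected = orig - optimism, n = B)
}

set.seed(3)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1))
v <- boot_validate(y ~ x1 + x2 + x3, dat, B = 20)
round(v, 3)
```

Because x2 and x3 are pure noise here, the test-sample slope typically averages below 1, so the corrected slope comes out below 1 as well.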

Side Effects

prints a summary, and optionally statistics for each re-fit

Concepts

logistic regression model
model validation
predictive accuracy
bootstrap

Details

If the original fit was created using penalized maximum likelihood estimation, the same penalty.matrix used with the original fit is used during validation.

References

Miller ME, Hui SL, Tierney WM (1991): Validation techniques for logistic regression models. Stat in Med 10:1213-1226.

Harrell FE, Lee KL (1985): A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, ed. PK Sen. New York: North-Holland, pp. 333-343.

Examples


library(rms)   # lrm, validate, rcs, pol, cr.setup
set.seed(1)    # make the simulation reproducible
n <- 1000    # define sample size
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))


# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)


f <- lrm(y ~ sex*rcs(cholesterol)+pol(age,2)+blood.pressure, x=TRUE, y=TRUE)
#Validate full model fit
validate(f, B=10)              # normally B=150
# two-sample validation: make resamples have same numbers of
# successes and failures as original sample
validate(f, B=10, group=y)


#Validate stepwise model with typical (not so good) stopping rule
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")


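#Fit a continuation ratio model and validate it for the predicted
#probability that y=0. A continuation ratio model needs an ordinal
#response with at least 3 levels, so y3 (0, 1, 2) is simulated here
#for illustration.
y3 <- cut(L + rlogis(n), c(-Inf, -1, 1, Inf), labels=FALSE) - 1
u <- cr.setup(y3)
Y <- u$y
cohort <- u$cohort
d <- data.frame(age, sex)[u$subs, ]   # replicate covariates to match Y
f <- lrm(Y ~ cohort+rcs(age,4)*sex, data=d, penalty=list(interaction=2),
         x=TRUE, y=TRUE)
validate(f, cluster=u$subs, subset=cohort=='all')
#see predab.resample for cluster and subset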
