Predict: Compute Predicted Values and Confidence Limits

Description

Predict allows the user to easily specify which predictors are to vary. When the vector of values over which a predictor should vary is a single value that is NA (usually specified as a single dot), the range will be all levels of a categorical predictor or equally-spaced points between the datadist "Low:prediction" and "High:prediction" values for the variable (datadist by default uses the 10th smallest and 10th largest predictor values in the dataset). Predicted values are the linear predictor (X beta), a user-specified transformation of that scale, or estimated probability of surviving past a fixed single time point given the linear predictor. Predict is usually used for plotting predicted values but there is also a print method.

There is a plot method for Predict objects that makes it easy to show predicted values and confidence bands.

The rbind method for Predict objects allows you to create separate sets of predictions under different situations and to combine them into one set for feeding to plot.Predict. For example you might want to plot confidence intervals for means and for individuals using ols, and have the two types of confidence bands be superposed onto one plot or placed into two panels. Another use for rbind is to combine predictions from quantile regression models that predicted three different quantiles.

Usage

Predict(x, ..., fun,
        type = c("predictions", "model.frame", "x"),
        np = 200, conf.int = 0.95,
        conf.type = c("mean", "individual"),
        adj.zero = FALSE, ref.zero = FALSE,
        non.slopes, time = NULL, loglog = FALSE, digits=4, name)
## S3 method for class 'Predict':
print(x, \dots)
## S3 method for class 'Predict':
rbind(\dots)

Arguments

an rms fit object, or for print the result of Predict. options(datadist="d") must have been specified (where d was created by datadist), or it must have been in effect when

...

One or more variables to vary, or single-valued adjustment values. Specify x=. to use the default display range, or any range you choose (e.g. seq(0,100,by=2),c(2,3,7,14)). The default list of values for which predictions are

fun

an optional transformation of the linear predictor

type

defaults to providing predictions. Set to "model.frame" to return a data frame of predictor settings used. Set to "x" to return the corresponding design matrix constructed from the predictor settings.

the number of equally-spaced points computed for continuous predictors that vary, i.e., when the specified value is . or NA

conf.int

confidence level. Default is 0.95. Specify FALSE to suppress.

conf.type

type of confidence interval. Default is "mean" which applies to all models. For models containing a residual variance (e.g, ols), you can specify conf.type="individual" instead, to obtain limits on the predicted

adj.zero

Set to TRUE to adjust all non-plotted variables to 0 (or reference cell for categorical variables) and to omit intercept(s) from consideration. Default is FALSE.

ref.zero

Set to TRUE to subtract a constant from $X\beta$ before plotting so that the reference value of the x-variable yields y=0. This is done before applying function fun.

non.slopes

This is only useful in a multiple intercept model such as the ordinal logistic model. There to use to second of three intercepts, for example, specify non.slopes=c(0,1,0). The default is non.slopes=rep(0,k) if adj.zero=T

time

Specify a single time u to cause function survest to be invoked to plot the probability of surviving until time u when the fit is from cph or psm.

loglog

Specify loglog=TRUE to plot log[-log(survival)] instead of survival, when time is given.

digits

Controls how ``adjust-to'' values are plotted. The default is 4 significant digits.

name

Instead of specifying the variables to vary in the variables (...) list, you can specify one or more variables by specifying a vector of character string variable names in the name argument. Using this mode you cannot specify

Value

a data frame containing all model predictors and the computed values yhat, lower, upper, the latter two if confidence intervals were requested. The data frame has an additional class "Predict". If name is specified or no predictors are specified in ..., the resulting data frame has an additional variable called .predictor. specifying which predictor is currently being varied. .predictor. is handy for use as a paneling variable in lattice or ggplot2 graphics.

Details

When there are no intercepts in the fitted model, plot subtracts adjustment values from each factor while computing variances for confidence limits.

Specifying time will not work for Cox models with time-dependent covariables. Use survest or survfit for that purpose.

Examples

Run this code

n <- 1000    # define sample size
set.seed(17) # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))
label(age)            <- 'Age'      # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl'   # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')

fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)))
Predict(fit, age=., cholesterol=., np=4)
Predict(fit, age=seq(20,80,by=10), sex=., conf.int=FALSE)
Predict(fit, age=seq(20,80,by=10), sex='male')  # works if datadist not used

ddist$limits$age[2] <- 30    # make 30 the reference value for age
# Could also do: ddist$limits["Adjust to","age"] <- 30
fit <- update(fit)   # make new reference value take effect
Predict(fit, age=., ref.zero=TRUE, fun=exp)

# Make two curves, and plot the predicted curves as two trellis panels
w <- Predict(fit, age=., sex=.)
require(lattice)
xyplot(yhat ~ age | sex, data=w, type='l')
# To add confidence bands we need to use the Hmisc xYplot function in
# place of xyplot
xYplot(Cbind(yhat,lower,upper) ~ age | sex, data=w, 
       method='filled bands', type='l', col.fill=gray(.95))
# If non-displayed variables were in the model, add a subtitle to show
# their settings using title(sub=paste('Adjusted to',attr(w,'info')$adjust),adj=0)
# Easier: feed w into plot.Predict
# Predictions form a parametric survival model
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, 
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
ddist <- datadist(age, sex)
Srv <- Surv(t,e)

# Fit log-normal survival model and plot median survival time vs. age
f <- psm(Surv(t, e) ~ rcs(age), dist=if(.R.)'lognormal' else 'gaussian')
med <- Quantile(f)       # Creates function to compute quantiles
                         # (median by default)
Predict(f, age=., fun=function(x)med(lp=x))
# Note: This works because med() expects the linear predictor (X*beta)
#       as an argument.  Would not work if use 
#       ref.zero=TRUE or adj.zero=TRUE.
# Also, confidence intervals from this method are approximate since
# they don't take into account estimation of scale parameter

# Fit an ols model to log(y) and plot the relationship between x1
# and the predicted mean(y) on the original scale without assuming
# normality of residuals; use the smearing estimator.  Before doing
# that, show confidence intervals for mean and individual log(y)
set.seed(1)
x1 <- runif(300)
x2 <- runif(300)
ddist <- datadist(x1,x2); options(datadist='ddist')
y  <- exp(x1+ x2 - 1 + rnorm(300))
f  <- ols(log(y) ~ pol(x1,2) + x2)
p1 <- Predict(f, x1=., conf.type='mean')
p2 <- Predict(f, x1=., conf.type='individual')
p  <- rbind(mean=p1, individual=p2)
plot(p, label.curve=FALSE)   # uses superposition
plot(p, ~x1 | .set.)         # 2 panels

r <- resid(f)
smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean')
formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)])
#smean$res <- r[!is.na(r)]   # define default res argument to function
Predict(f, x1=., fun=smean)
options(datadist=NULL)

Run the code above in your browser using DataLab