plot.Predict: Plot Effects of Variables Estimated by a Regression Model Fit

Description

Uses lattice graphics to plot the effect of one or two predictors on the linear predictor or X beta scale, or on some transformation of that scale. The first argument specifies the result of the Predict function. The predictor is always plotted in its original coding. plot.Predict uses the xYplot function unless formula is omitted and the x-axis variable is a factor, in which case it reverses the x- and y-axes and uses the Dotplot function.

If data is given, a rug plot is drawn showing the location/density of data values for the $x$-axis variable. If there is a groups (superposition) variable that generated separate curves, the data density specific to each class of points is shown. This assumes that the second variable was a factor variable. The rug plots are drawn by scat1d.

To plot effects instead of estimates (e.g., treatment differences as a function of interacting factors) see contrast.rms and summary.rms.

pantext creates a lattice panel function for including text such as that produced by print.anova.rms inside a panel or in a base graphic.

Usage

## S3 method for class 'Predict':
plot(x, formula, groups=NULL, subset,
     xlim, ylim, xlab, ylab, 
     data=NULL, col.fill=gray(seq(.95, .75, length=5)),
     adj.subtitle, cex.adj, perim=NULL, digits=4, nlevels=3,
     addpanel, ...)
pantext(object, x, y, cex=.5, adj=0, fontfamily="Courier", lattice=TRUE)

Arguments

a data frame created by Predict, or for pantext the x-coordinate for text

formula

the right hand side of a lattice formula reference variables in data frame x. You may not specify formula if you varied multiple predictors separately when calling Predict. Otherwise, when f

groups

an optional name of one of the variables in x that is to be used as a grouping (superpositioning) variable. Note that groups does not contain the groups data as is customary in lattice; it is only a single cha

subset

a subsetting expression for restricting the rows of x that are used in plotting. For example, predictions may have been requested for males and females but one wants to plot only females.

xlim

This parameter is seldom used, as limits are usually controlled with Predict. One reason to use xlim is to plot a factor variable on the x-axis that was created with the cut2 function with the lev

ylim

Range for plotting on response variable axis. Computed by default.

xlab

Label for x-axis. Default is one given to asis, rcs, etc., which may have been the "label" attribute of the variable.

ylab

Label for y-axis. If fun is not given, default is "log Odds" for lrm, "log Relative Hazard" for cph, name of the response variable for ols, TRUE or

data

a data frame containing the original raw data on which the regression model were based, or at least containing the $x$-axis and grouping variable. If data is present and contains the needed variables, the original data are added to the

col.fill

a vector of colors used to fill confidence bands for successive superposed groups. Default is inceasingly dark gray scale.

adj.subtitle

Set to FALSE to suppress subtitling the graph with the list of settings of non-graphed adjustment values.

cex.adj

cex parameter for size of adjustment settings in subtitles. Default is 0.75 times par("cex").

perim

perim specifies a function having two arguments. The first is the vector of values of the first variable that is about to be plotted on the x-axis. The second argument is the single value of the variable representing different curves, for t

digits

Controls how numeric variables used for panel labels are formatted. The default is 4 significant digits.

nlevels

when groups and formula are not specified, if any panel variable has nlevels or fewer values, that variable is converted to a groups (superpositioning) variable. Set nlevels=0 to prev

addpanel

an additional panel function to call along with panel functions used for xYplot and Dotplot displays

...

extra arguments to pass to xYplot or Dotplot. Some useful ones are label.curves and abline. Set label.curves to FALSE to suppress labeling of separate curves. Default is

object

an object having a print method

y-coordinate for placing text in a lattice panel or on a base graphics plot

cex

character expansion size for pantext

adj

text justification. Default is left justified.

fontfamily

font family for pantext. Default is "Courier" which will line up columns of a table.

lattice

set to FALSE to use text instead of ltext in the function generated by pantext, to use base graphics

Value

a lattice object ready to print for rendering.

Details

When a groups (superpositioning) variable was used, you can issue the command Key(...) after printing the result of plot.Predict, to draw a key for the groups.

Examples

Run this code

n <- 1000    # define sample size
set.seed(17) # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))
label(age)            <- 'Age'      # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl'   # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')

fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
               x=TRUE, y=TRUE)
plot(Predict(fit))       # Plot effects of all 4 predictors
plot(Predict(fit), data=llist(blood.pressure,age))
                         # rug plot for two of the predictors

p <- Predict(fit, name=c('age','cholesterol'))   # Make 2 plots
plot(p)

p <- Predict(fit, age=seq(20,80,length=100), sex=., conf.int=FALSE)
                         # Plot relationship between age and log
                         # odds, separate curve for each sex,
plot(p)                  # no C.I.

p <- Predict(fit, age=., sex=.)
plot(p, label.curves=FALSE, data=llist(age,sex))
                         # use label.curves=list(keys=c('a','b'))'
                         # to use 1-letter abbreviations
                         # data= allows rug plots (1-dimensional scatterplots)
                         # on each sex's curve, with sex-
                         # specific density of age
                         # If data were in data frame could have used that
p <- Predict(fit, age=seq(20,80,length=100), sex='male', fun=plogis)
                         # works if datadist not used
plot(p, ylab=expression(hat(P)))
                         # plot predicted probability in place of log odds

per <- function(x, y) x >= 30
plot(p, perim=per)       # suppress output for age < 30 but leave scale alone

# Take charge of the plot setup by specifying a lattice formula
p <- Predict(fit, age=., blood.pressure=c(120,140,160),
             cholesterol=c(180,200,215), sex=.)
plot(p, ~ age | blood.pressure*cholesterol, subset=sex=='male')
plot(p, ~ age | cholesterol*blood.pressure, subset=sex=='female')
plot(p, ~ blood.pressure|cholesterol*round(age,-1), subset=sex=='male')
plot(p)

# Plot the age effect as an odds ratio
# comparing the age shown on the x-axis to age=30 years

ddist$limits$age[2] <- 30    # make 30 the reference value for age
# Could also do: ddist$limits["Adjust to","age"] <- 30
fit <- update(fit)   # make new reference value take effect
p <- Predict(fit, age=., ref.zero=TRUE, fun=exp)
plot(p, ylab='Age=x:Age=30 Odds Ratio',
     abline=list(list(h=1, lty=2, col=2), list(v=30, lty=2, col=2)))


# Plots for a parametric survival model
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, 
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
ddist <- datadist(age, sex)
Srv <- Surv(t,e)


# Fit log-normal survival model and plot median survival time vs. age
f <- psm(Surv(t, e) ~ rcs(age), dist='lognormal')
med <- Quantile(f)       # Creates function to compute quantiles
                         # (median by default)
p <- Predict(f, age=., fun=function(x) med(lp=x))
plot(p, ylab="Median Survival Time")
# Note: confidence intervals from this method are approximate since
# they don't take into account estimation of scale parameter


# Fit an ols model to log(y) and plot the relationship between x1
# and the predicted mean(y) on the original scale without assuming
# normality of residuals; use the smearing estimator
# See help file for rbind.Predict for a method of showing two
# types of confidence intervals simultaneously.
set.seed(1)
x1 <- runif(300)
x2 <- runif(300)
ddist <- datadist(x1,x2)
y  <- exp(x1+x2-1+rnorm(300))
f <- ols(log(y) ~ pol(x1,2)+x2)
r <- resid(f)
smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean')
formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)])
#smean$res <- r[!is.na(r)]   # define default res argument to function
plot(Predict(f, x1=., fun=smean), ylab='Predicted Mean on y-scale')
options(datadist=NULL)

# Example in which separate curves are shown for 4 income values
# For each curve the estimated percentage of voters voting for
# the democratic party is plotted against the percent of voters
# who graduated from college.  Data are county-level percents.

incomes <- seq(22900, 32800, length=4)  
# equally spaced to outer quintiles
p <- Predict(f, college=., income=incomes, conf.int=FALSE)
plot(p, xlim=c(0,35), ylim=c(30,55))

# Erase end portions of each curve where there are fewer than 10 counties having
# percent of college graduates to the left of the x-coordinate being plotted,
# for the subset of counties having median family income with 1650
# of the target income for the curve

show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650  #assumes income known to top frame
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}

plot(p, xlim=c(0,35), ylim=c(30,55), perim=show.pts)

Run the code above in your browser using DataLab