plotmo: Plot a model's response with varying predictor values

Description

Plot a model's response when varying one or two predictors while holding the other predictors constant. A poor man's partial dependence plot.

Usage

plotmo(object = stop("no 'object' arg"),
       type=NULL, nresponse = NA, clip = TRUE, ylim = NULL,
       degree1 = TRUE, all1=FALSE, degree2 = TRUE, all2=FALSE,
       grid.func = median, grid.levels = NULL,
       col.response = 0, cex.response = 1, pch.response = 1,
       jitter.response=0, inverse.func = NULL,
       xflip=FALSE, yflip=FALSE, swapxy=FALSE,
       trace = FALSE, nrug = 0,
       se = 0, col.shade = "lightblue", col.se = 0, lty.se = 2,
       func = NULL, col.func = "pink", lwd.func = 1, lty.func = 1,
       ngrid1 = 500, lty.degree1 = 1, col.degree1 = 1,
       type2 = "persp", ngrid2 = 20,
       col.image = grey(0:9/10), col.persp = "lightblue",
       theta = NA, phi = 30, shade = 0.5,
       do.par = TRUE, caption = NULL, main = NULL,
       xlab = "", ylab = "", cex = NULL, cex.lab = 1, ...)

Arguments

object

Model object.

type

Type parameter passed to predict. For legal values see the predict method for your object; for example, see predict.lm or

nresponse

Which column to use when predict returns multiple columns. This can be a column index or a column name (which may be abbreviated; partial matching is used). Ignored when predict returns a single column.

clip

Default is TRUE, meaning plot only predicted values that are in the range of the response of the original data. Use FALSE to plot all predicted values.

ylim

Three possibilities: (i) NULL (default) all y axes have same limits (where "y" is actually "z" on degree2 plots). The limits are the min and max values of the predicted response across all plots (after applying

degree1

Index vector specifying main effect plots to include. Default is TRUE, meaning all degree1 plots (the TRUE gets recycled). Use FALSE (or 0 or NA) for no degree1<

all1

Default is FALSE. Use TRUE to plot all predictors, not just those usually selected by plotmo. See "Which variables get plotted?" below. The all1 argument increases the numb

degree2

Index vector specifying interaction plots to include. Default is TRUE, meaning all degree2 plots. Use FALSE for no degree2 plots.

all2

Default is FALSE. Use TRUE to plot all pairs of predictors, not just those usually selected by plotmo.

grid.func

Function applied to columns of the x matrix to fix the values of variables not on the axes. Default is median. (This argument is ignored for factors. The first level of factors is used.

grid.levels

Default is NULL. Else a list of variables and their fixed value to be used when the variable is not on the axis. Supersedes grid.func for variables in the list. Names can be abbreviated, partial matching is us

col.response

Color of response points (response sites in degree2 plots). Here "response" refers to the response in the original data used to build the model. Default is 0, don't plot the response. Can be a vector, for example,

cex.response

Relative size of response points. Default is 1. Applies only if col.response!=0.

pch.response

Plot character for response points. Default is 1. Applies only if col.response!=0.

jitter.response

Amount to jitter the response points (passed to jitter as the factor argument). Default 0, no jitter. A typical useful value is 1. Applies only if <

inverse.func

Function applied to the predicted response before plotting. Default is NULL, meaning do not apply a function. For example, you could use inverse.func=exp if your model formula is log(y)~x.

xflip

Default FALSE. Use TRUE to flip the direction of the x axis. This argument (and yflip and swapxy) is useful when comparing to a plot from another source and you want the axes to be the sam

yflip

Default FALSE. Use TRUE to flip the direction of the y axis of the degree2 graphs.

swapxy

Default FALSE. Use TRUE to swap the x and y axes on the degree2 graphs.

trace

Default is FALSE. Use TRUE to trace operation. Use values greater than 1 for more detailed tracing.

The following arguments are for degree1 (main effect) plots.

nrug

Number of points in (jittered) rug. Default is 0, no rug. Special value -1 for all, i.e., nrow(x). Otherwise a random subset of nrug points is taken.

se

Draw standard error bands at plus and minus se times the pointwise standard errors. Default is 0, no standard error bands. A typical value would be 2. The predict method for object must su

col.shade

Color of se shading. Default is "lightblue". Use 0 for no shading.

col.se

Color of se lines. Default is 0, meaning no lines, just shading.

lty.se

Line type of se lines. Default is 2.

func

Superimpose func(x) if func is not NULL. Default is NULL. This is useful if you are comparing the model to a known function. The func is called with a single argument which

col.func

Color of func points. Default is "pink".

lwd.func

Line width of func plot. Default is 1.

lty.func

Line type of func plot. Default is 1.

ngrid1

Number of points in degree1 plots. Default is 500.

lty.degree1

Line type of degree1 plots. Default is 1.

col.degree1

Color of degree1 plots. Default is 1.

The following arguments are for degree2 plots.

type2

Degree2 plot type. One of "persp" (default), "contour", or "image".

ngrid2

Grid size for degree2 plots (ngrid2 x ngrid2 points are plotted). Default is 20. Note 1: the default will often be too small for contour and image plots. Note 2: with large ngrid2

col.image

Colors of image plot. Default is grey(0:9/10). The default excludes grey(1) (white) because that is the color of clipped values, see clip.

col.persp

Color of persp surface. Default is "lightblue". Use 0 for no color.

theta

Rotation parameter for persp. Default is NA, meaning automatically rotate each graph so the highest corner is furthest away. (Use trace=TRUE to see the calculated valu

phi

Passed to persp. Default is 30.

shade

Passed to persp. Default is 0.5.

The following settings are related to par().

do.par

Default is TRUE, meaning call par() as appropriate for settings such as mfrow. Use FALSE if you don't want plotmo to start a new page.

caption

Overall caption. The default is to automatically create a caption from the call and response name.

main

A vector of titles, one for each plot. Will be recycled if necessary. The default generates titles automatically. See also caption, for the overall title.

xlab

Horizontal axis label on degree1 plots (for degree2 plots the abscissa labels are always the variable names). Default is "", no label, which gives more plottable area. Use the special value NULL<

ylab

Vertical axis label. Values as for xlab.

cex

Character expansion.

cex.lab

Relative size of axis labels and text. Default 1.

...

Extra arguments are passed on to the plotting functions. What is legal here depends on type2. For persp plots, ticktype="d", nticks=2 is useful.

Details

Plotmo can be used on a wide variety of regression models. It plots a degree1 plot by calling predict to predict the response when changing one variable while holding all other variables at their median values. For degree2 plots, two variables are changed while holding others at their medians. For factors, the first level is used instead of the median. You can change this value with the grid.func and grid.levels arguments.
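The degree1 computation described above can be sketched in base R. This is only an illustration of the idea, not plotmo's actual code; the built-in trees data, the lm model, and the names fit, grid, and yhat are assumptions chosen for the example.

```r
# Illustrative sketch only (not plotmo's internal code).
# Assumed for the example: the built-in 'trees' data and a simple linear model.
fit <- lm(Volume ~ Girth + Height, data = trees)

ngrid1 <- 50  # points along the axis (the ngrid1 default documented above is 500)

# Vary Girth over its observed range and hold Height at its median,
# mirroring the documented default grid.func = median.
grid <- data.frame(
  Girth  = seq(min(trees$Girth), max(trees$Girth), length.out = ngrid1),
  Height = median(trees$Height)
)

# Predicted response along this one-variable slice.
yhat <- predict(fit, newdata = grid)
```

The slice can then be drawn with, for example, plot(grid$Girth, yhat, type="l"). A degree2 grid would be built the same way, with two columns varying (for instance via expand.grid) and the rest held fixed.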

Each graph shows only a thin slice of the data because most variables are fixed. Please be aware of that when interpreting the graph: over-interpretation is a temptation.
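To see why a single slice can mislead, the following base R sketch predicts along one variable at three different fixed values of the other. The trees data, the interaction model, and the helper slice() are assumptions made for the illustration; nothing here is part of plotmo.

```r
# Illustrative sketch only; 'trees' and the interaction model are assumptions.
fit <- lm(Volume ~ Girth * Height, data = trees)

girth <- seq(min(trees$Girth), max(trees$Girth), length.out = 50)

# Hypothetical helper: predict along Girth with Height fixed at one value.
slice <- function(height) {
  predict(fit, newdata = data.frame(Girth = girth, Height = height))
}

at.median <- slice(median(trees$Height))  # the kind of slice shown by default
at.low    <- slice(min(trees$Height))     # a different slice
at.high   <- slice(max(trees$Height))     # another different slice

# Because the model has an interaction, the gap between slices
# changes along the Girth axis (the curves are not parallel).
gap <- at.high - at.low
range(gap)
```

If the model had no interaction term, gap would be constant and every slice would have the same shape. With interactions, the shape seen in a degree1 plot depends on where the other variables are fixed.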

The name plotmo was chosen because it is short, pronounceable as a word, yet unlikely to conflict with names in other packages or user code. Plotmo was originally part of the earth package and a few connections to that package still remain.

Limitations

NAs are not supported. To prevent confusing error messages from functions called by plotmo, it is safest to remove NAs before building your model. (However, rpart models are treated specially by plotmo, actually allowing NAs so you can use plotmo with the default arguments for rpart.)

Keep the variable names in the original model formula simple. Use temporary variables or attach rather than using $ and similar in formulas.

Plotmo assumes that the data used to build the model is still available when plotmo is called.

Which variables get plotted?

Plotmo invokes object-specific methods to select which variables to plot. The set of variables plotted for some common classes is listed below. This set may leave out pairs that you would like to see; in that case use all2=TRUE.

Using plotmo on various models

Here are some examples which illustrate plotmo on various objects. The models are just for illustrating plotmo and shouldn't be taken too seriously.

# use a small set of variables for illustration
library(earth) # for ozone1 data
data(ozone1)
oz <- ozone1[, c("O3", "humidity", "temp", "ibt")]

lm.fit <- lm(O3 ~ humidity + temp*ibt, data=oz) # linear model
plotmo(lm.fit, se=2, col.response=2, nrug=-1)

library(mgcv) # GAM
gam.fit <- gam(O3 ~ s(humidity) + s(temp) + s(ibt) + s(temp, ibt), data=oz)
plotmo(gam.fit, se=2, all2=TRUE)

library(rpart) # rpart
rpart.fit <- rpart(O3 ~ ., data=oz)
plotmo(rpart.fit, all2=TRUE)

library(randomForest) # randomForest
rf.fit <- randomForest(O3~., data=oz)
plotmo(rf.fit)
partialPlot(rf.fit, oz, temp) # compare partial dependence plot

library(gbm) # gbm
gbm.fit <- gbm(O3~., data=oz, dist="gaussian", inter=2, n.trees=1000)
plotmo(gbm.fit)
plot(gbm.fit, i.var=2) # compare partial dependence plots
plot(gbm.fit, i.var=c(2,3))

library(MASS) # qda
lcush <- data.frame(Type=as.numeric(Cushings$Type), log(Cushings[,1:2]))
lcush <- lcush[1:21,]
qda.fit <- qda(Type~., data=lcush)
plotmo(qda.fit, type="class", all2=TRUE,
       type2="contour", ngrid2=100, nlevels=2, drawlabels=FALSE,
       col.response=as.numeric(lcush$Type)+1,
       pch.response=as.character(lcush$Type))

Plotmo has to make some assumptions about the model object. If the model function did not save the call or data with the object in a standard fashion, plotmo cannot proceed and will issue an error. Object-specific methods can usually be written to deal with such issues; see the next section.

Extending plotmo

Plotmo calls the S3 methods listed below. The default methods suffice for many objects, but where necessary plotmo can be extended by writing new methods. See plotmo.gbm.R for an example.

Alternatives

An alternative approach is to use partial dependence plots (e.g. The Elements of Statistical Learning, Section 10.13.2). Plotmo sets the "other" variables to their median value, whereas in a partial dependence plot at each plotted point the effect of the other variables is averaged.
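The difference between the two approaches can be sketched in base R. This is a hand-rolled illustration under stated assumptions (the trees data, an lm model with an interaction, and the names slice and pd); it is not how plotmo or any partial dependence package is implemented.

```r
# Illustrative sketch only; 'trees' and the model are assumptions.
fit <- lm(Volume ~ Girth * Height, data = trees)
girth <- seq(min(trees$Girth), max(trees$Girth), length.out = 50)

# plotmo-style slice: the other variable is fixed at its median,
# so one call to predict covers the whole curve.
slice <- predict(fit, newdata = data.frame(Girth = girth,
                                           Height = median(trees$Height)))

# Partial dependence: for each grid value, set Girth to that value in
# every row of the data, predict, and average over the rows.
pd <- sapply(girth, function(g) {
  d <- trees
  d$Girth <- g
  mean(predict(fit, newdata = d))
})

# Largest difference between the two curves.
max(abs(pd - slice))
```

The partial dependence curve needs one pass over the whole data set per grid point, whereas the slice needs a single prediction per grid point. For this particular model, which is linear in Height for any fixed Girth, averaging over the rows is equivalent to fixing Height at its mean rather than its median.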

There appears to be no general-purpose R function for partial dependence plots similar to plotmo. Averaging over the sample at every point is a slow process unless the effect of averaging can be determined without actually doing the calculation. That is not the case for most models, but it is for trees. See partialPlot in the randomForest package.

Termplot is effective where applicable, but it can be used only on models with a predict method that supports type="terms", and it does not generate degree2 plots.
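Whether termplot is applicable can be checked by asking the predict method for per-term contributions. The sketch below uses base R only; the trees data and the model are assumptions for illustration.

```r
# Illustrative sketch only; 'trees' and the model are assumptions.
fit <- lm(Volume ~ Girth + Height, data = trees)

# predict.lm supports type = "terms": one column of contributions per term.
terms.matrix <- predict(fit, type = "terms")
colnames(terms.matrix)

# For such a model, termplot(fit) would draw one panel per term.
# Models whose predict method lacks type = "terms" cannot be used this way.
```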

Common error messages

Error in match.arg(type): 'arg' should be one of ...

The message is probably issued by the predict method for your object. Set type to an appropriate value. So if you are plotting an earth object, for example, the appropriate values for type will be given in the help page for predict.earth.

Error: predicted values are out of ylim, try clip=FALSE

With clip=TRUE (the default), plotmo sets the range of the response axis of the graphs to the range of the response y used when originally building the model. When plotmo calls predict for each graph, it issues the above message (or similar) if all the predicted values are out of the range.

Depending on the model, the above approach may be wrong. For example, if we are predicting log odds, the predicted response will not be on the same scale as the original response. Plotmo does know about some special cases. For example, it knows that for some models we are predicting a probability, and it scales the axes accordingly. However, not all situations are handled. Plotmo does not know about every possible model and prediction type, and that is typically when the above message is issued. The remedy is simple: re-invoke plotmo with clip=FALSE.
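The idea behind clipping can be sketched in base R as follows. This is a simplified illustration, not plotmo's internal code; the trees data, the deliberately extrapolated grid, and the names yhat and clipped are assumptions.

```r
# Illustrative sketch of the idea behind clip=TRUE (not plotmo's internal code).
fit <- lm(Volume ~ Girth, data = trees)

# Predict over a grid that deliberately extends beyond the observed Girth values.
grid <- data.frame(Girth = seq(5, 25, length.out = 41))
yhat <- predict(fit, newdata = grid)

# Range of the response in the data used to build the model.
ylim <- range(trees$Volume)

# Treat predictions outside that range as missing so they are not drawn.
clipped <- ifelse(yhat < ylim[1] | yhat > ylim[2], NA, yhat)

# Number of predictions that fall outside the observed response range.
sum(is.na(clipped))
```

If every element of clipped were NA, there would be nothing left to plot, which corresponds to the situation the error message describes. When the predictions are on a different scale from the original response (log odds, for example), a check of this kind is not meaningful and clip=FALSE is the appropriate setting.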

Error: get.plotmo.x.default cannot get the x matrix

This and similar messages mean that plotmo cannot get the data it needs from the model object. Typically this means that class methods need to be written for the object; see "Extending plotmo" above (and contact the author).

Warning in model.frame.default: 'newdata' had 50 rows but vars have 31 rows

This message usually means that model.frame cannot find all the variables in the data frame created by plotmo. Make sure the variables you used to build the model are still available when you call plotmo. Try also simplifying the formula used to create the model.

Error in model.frame: invalid type (list) for variable 'x[,3]'

Plotmo can get confused by variables in formulas which use indexing, such as x[,3]. The symptom is usually a message similar to the above.
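A common way to avoid this class of problem is to build the model from a data frame with plainly named columns. The following base R sketch shows the pattern; the trees data and the names x, y, dat, and newdata are assumptions for illustration.

```r
# Illustrative sketch only; data and names are assumptions.
x <- as.matrix(trees[, c("Girth", "Height")])
y <- trees$Volume

# Fragile: indexing inside the formula, e.g. lm(y ~ x[, 1] + x[, 2]).
# More robust: put the variables into a data frame with simple names.
dat <- data.frame(volume = y, girth = x[, "Girth"], height = x[, "Height"])
fit <- lm(volume ~ girth + height, data = dat)

# New data can now be matched to the model by column name.
newdata <- data.frame(girth = c(10, 12), height = c(70, 75))
pred <- predict(fit, newdata = newdata)
```

Because any function that predicts on new data has to match variables by name, a model built this way gives it simple column names to work with.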

FAQ

I want to add lines or points to a plot created by plotmo and am having trouble getting my axis scaling right. Help?

Use do.par=FALSE. With do.par=FALSE, the axis scales match the axis labels. With do.par=TRUE, plotmo restores the par parameters and axis scales to their values before plotmo was called.

The persp display is very jagged. How can I change that?

Try using clip=FALSE. The jaggedness is probably an artifact of the way persp works at the boundaries. You can also try increasing ngrid2.

The image display has white "holes" in it. What are those?

The "holes" are probably areas where the predicted response is out-of-range. Try using clip=FALSE.

Why is the default clip = TRUE?

It is a useful sanity check for plotmo to test that the predicted values are in the expected range. While not necessarily an error, predictions outside the expected range are usually something we want to know about. Also, with clip=FALSE, a few errant predictions can compress the entire y-axis making it difficult to see the shape of the other predictions.

Examples

library(rpart)
data(kyphosis)
rpart.model <- rpart(Kyphosis~., data=kyphosis)
plotmo(rpart.model, type="prob", nresponse="present")