randomForestSRC (version 3.4.1)

plot.variable.rfsrc: Plot Marginal Effect of Variables

Description

Plot the marginal effect of an x-variable on the class probability (classification), response (regression), mortality (survival), or the expected years lost (competing risk). Users can select between marginal (unadjusted, but fast) and partial plots (adjusted, but slower).

Usage

# S3 method for rfsrc
plot.variable(x, xvar.names, target, m.target = NULL, time,
  surv.type = c("mort", "rel.freq", "surv", "years.lost", "cif", "chf"),
  class.type = c("prob", "bayes"),
  partial = FALSE, oob = TRUE, show.plots = TRUE,
  plots.per.page = 4, granule = 5, sorted = TRUE,
  nvar, npts = 25, smooth.lines = FALSE, subset, ...)

Arguments

x

An object of class (rfsrc, grow), (rfsrc, synthetic), or (rfsrc, plot.variable).

xvar.names

Character vector of x-variable names to include. If not specified, all variables are used.

target

For classification, an integer or character specifying the class of interest (default is the first class). For competing risks, an integer between 1 and J indicating the event of interest, where J is the number of event types. Default is the first event type.

m.target

Character value for multivariate families specifying the target outcome. If unspecified, a default is automatically chosen.

time

(Survival only) Time point at which the predicted survival value is evaluated, depending on surv.type.

surv.type

(Survival only) Type of predicted survival value to compute. See Details.

class.type

(Classification only) Type of predicted classification value to use. See Details.

partial

Logical. If TRUE, partial dependence plots are generated.

oob

Logical. If TRUE, out-of-bag predictions are used; otherwise, in-bag predictions are used.

show.plots

Logical. If TRUE, plots are displayed on the screen.

plots.per.page

Integer controlling the number of plots displayed per page.

granule

Integer controlling the coercion of continuous variables to factors (used to generate boxplots). Larger values cause more continuous variables to be coerced to factors.

sorted

Logical. If TRUE, variables are sorted by variable importance.

nvar

Number of variables to plot. Defaults to all available variables.

npts

Maximum number of points used when generating partial plots for continuous variables.

smooth.lines

Logical. If TRUE, applies lowess smoothing to partial plots.

subset

Vector indicating which rows of x$xvar to use. Defaults to all rows. Important: do not define subset based on the original dataset (which may have been altered due to missing data or other processing); define it relative to x$xvar.

...

Additional arguments passed to or from other methods.
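
The caution given for subset can be illustrated without growing a forest. The sketch below uses base R only and assumes that rows with missing values were dropped before the forest was grown, so that x$xvar has fewer rows than the original data. The object xvar is a hypothetical stand-in for x$xvar, not a component taken from a fitted object.

```r
## Stand-in for x$xvar: the predictors left after incomplete rows are dropped.
## (Assumption: rows with missing values were removed when the forest was grown.)
complete <- na.omit(airquality)
xvar <- complete[, setdiff(names(complete), "Ozone")]

## A condition built on the original data has one entry per original row ...
wrong <- airquality$Solar.R < 200
## ... while a condition built on xvar lines up with the rows the forest used.
right <- xvar$Solar.R < 200

## The two vectors differ in length, so they cannot index the same rows.
c(original = length(wrong), xvar = length(right))

## The original-data condition also carries NAs, which are ambiguous as a subset.
anyNA(wrong)
anyNA(right)
```

Only the second vector is a safe value for subset, because its positions refer to rows of the data the forest actually used.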

Author

Hemant Ishwaran and Udaya B. Kogalur

Details

The vertical axis displays the ensemble-predicted value, while x-variables are plotted along the horizontal axis.

  1. For regression, the predicted response is plotted.

  2. For classification, the plotted value is the predicted class probability for the class specified by target, or the most probable class (Bayes rule) depending on whether class.type is set to "prob" or "bayes".

  3. For multivariate families, the prediction corresponds to the outcome specified by m.target. If this is a classification outcome, target may also be used to indicate the class of interest.

  4. For survival, the vertical axis shows the predicted value determined by surv.type, with the following options:

    • mort: Mortality (Ishwaran et al., 2008), interpreted as the expected number of events for an individual with the same covariates.

    • rel.freq: Relative frequency of mortality.

    • surv: Predicted survival probability at a specified time point (default is the median follow-up time), controlled via time.

  5. For competing risks, the vertical axis shows one of the following quantities, depending on surv.type:

    • years.lost: Expected number of life-years lost.

    • cif: Cumulative incidence function for the specified event.

    • chf: Cause-specific cumulative hazard function.

    In all competing risks settings, the event of interest is specified using target, and cif and chf are evaluated at the time point given by time.
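
The two class.type settings described in item 2 can be pictured with a small matrix of ensemble class probabilities. This is a base R sketch, not the package's internal code; the matrix prob, its values, and the class labels are made up for illustration.

```r
## Hypothetical ensemble class probabilities: one row per observation.
prob <- rbind(c(0.70, 0.20, 0.10),
              c(0.10, 0.60, 0.30),
              c(0.25, 0.35, 0.40))
colnames(prob) <- c("setosa", "versicolor", "virginica")

## Class of interest, playing the role of the target argument.
target <- "versicolor"

## class.type = "prob": probability of the class named by target.
yhat.prob <- prob[, target]

## class.type = "bayes": most probable class for each observation.
yhat.bayes <- colnames(prob)[max.col(prob, ties.method = "first")]
```

With "prob" the plotted quantity depends on target; with "bayes" it depends only on which class has the largest probability in each row.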

To generate partial dependence plots, set partial = TRUE. These differ from marginal plots in that they isolate the effect of a single variable \(X\) on the predicted value by averaging over all other covariates:

$$ \tilde{f}(x) = \frac{1}{n} \sum_{i=1}^n \hat{f}(x, x_{i,o}), $$

where \(x_{i,o}\) denotes the observed values of all covariates other than \(X\) for individual \(i\), and \(\hat{f}\) is the prediction function. Generating partial plots can be computationally expensive; use a smaller value for npts to reduce the number of grid points evaluated for \(x\).
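
The averaging in the formula above can be written out directly. The following is a minimal base R sketch, not the package's implementation: a linear model stands in for the forest predictor \(\hat{f}\), the helper partial_dependence is a hypothetical name, and the quantile-based grid is an assumption about how at most npts evaluation points might be chosen.

```r
## Stand-in prediction function f-hat: a linear model instead of a forest.
fit <- lm(mpg ~ wt + hp, data = mtcars)

## Partial dependence of the prediction on one variable, evaluated on a grid.
## (Hypothetical helper; the grid choice is an assumption, not the package's rule.)
partial_dependence <- function(fit, data, xvar, npts = 25) {
  ## At most npts distinct grid values spanning the observed range of xvar.
  probs <- seq(0, 1, length.out = npts)
  grid <- unique(quantile(data[[xvar]], probs = probs, names = FALSE))
  yhat <- vapply(grid, function(v) {
    modified <- data
    modified[[xvar]] <- v                    # fix X at the grid value in every row
    mean(predict(fit, newdata = modified))   # average over the other covariates
  }, numeric(1))
  data.frame(x = grid, yhat = yhat)
}

pd <- partial_dependence(fit, mtcars, "wt", npts = 10)
plot(pd$x, pd$yhat, type = "b", xlab = "wt", ylab = "partial effect")
```

Each grid value requires one prediction pass over all n rows, which is why the cost grows with npts.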

Plot display conventions:

  • For continuous variables: red points indicate partial values; dashed red lines represent an error band of two standard errors. Black dashed lines show the raw partial values. Use smooth.lines = TRUE to overlay a lowess smoothed line.

  • For discrete (factor) variables: boxplots are used, with whiskers extending approximately two standard errors from the mean.

  • Standard errors are provided only as rough indicators and should be interpreted cautiously.
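
The display conventions for a continuous variable can be imitated in base R. This sketch is illustrative only: a linear model stands in for the forest, and the standard error is assumed to be the standard deviation of the per-observation predictions divided by the square root of n, which the text above does not specify.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for the forest predictor
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

## For each grid value, predict for every row with wt fixed at that value.
## Result: one row per observation, one column per grid value.
pred <- sapply(grid, function(v) {
  modified <- mtcars
  modified$wt <- v
  predict(fit, newdata = modified)
})

yhat <- colMeans(pred)
## Assumed standard error: spread of the per-observation predictions over sqrt(n).
se <- apply(pred, 2, sd) / sqrt(nrow(pred))

plot(grid, yhat, type = "b", col = "red",
     ylim = range(yhat - 2 * se, yhat + 2 * se),
     xlab = "wt", ylab = "partial effect")
lines(grid, yhat + 2 * se, lty = 2, col = "red")   # upper band, two standard errors
lines(grid, yhat - 2 * se, lty = 2, col = "red")   # lower band, two standard errors
lines(lowess(grid, yhat))                          # lowess curve, as with smooth.lines = TRUE
```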

Partial plots can be slow to compute. Setting npts to a small value can improve performance.

For additional flexibility and speed, consider using partial.rfsrc, which directly computes partial plot data and allows for greater customization.
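
A short sketch of that route follows. It assumes the partial.xvar and partial.values arguments and the get.partial.plot.data helper behave as described on the partial.rfsrc help page, which remains the authoritative reference for the argument list.

```r
library(randomForestSRC)

airq.obj <- rfsrc(Ozone ~ ., data = airquality)

## Partial effect of Wind, evaluated at the distinct Wind values in the forest data.
wind.values <- sort(unique(airq.obj$xvar$Wind))
partial.obj <- partial(airq.obj, partial.xvar = "Wind", partial.values = wind.values)

## Extract plot-ready data (x and yhat) and draw a custom plot.
pdta <- get.partial.plot.data(partial.obj)
plot(pdta$x, pdta$yhat, type = "b", pch = 16,
     xlab = "Wind", ylab = "partial effect of Wind")
```

Because the partial values are returned as data, they can be passed to any plotting function rather than the fixed layout produced by plot.variable.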

References

Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Statist., 29(5):1189-1232.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News, 7(2):25-31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist., 2(3):841-860.

Ishwaran H., Gerds T.A., Kogalur U.B., Moore R.D., Gange S.J. and Lau B.M. (2014). Random survival forests for competing risks. Biostatistics, 15(4):757-773.

See Also

rfsrc, synthetic.rfsrc, partial.rfsrc, predict.rfsrc

Examples

library(randomForestSRC)

## ------------------------------------------------------------
## survival/competing risk
## ------------------------------------------------------------

## survival
data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ ., veteran, ntree = 100)
plot.variable(v.obj, plots.per.page = 3)
plot.variable(v.obj, plots.per.page = 2, xvar.names = c("trt", "karno", "age"))
plot.variable(v.obj, surv.type = "surv", nvar = 1, time = 200)
plot.variable(v.obj, surv.type = "surv", partial = TRUE, smooth.lines = TRUE)
plot.variable(v.obj, surv.type = "rel.freq", partial = TRUE, nvar = 2)

## example of plot.variable calling a pre-processed plot.variable object
p.v <- plot.variable(v.obj, surv.type = "surv", partial = TRUE, smooth.lines = TRUE)
plot.variable(p.v)
p.v$plots.per.page <- 1
p.v$smooth.lines <- FALSE
plot.variable(p.v)

## example using a pre-processed plot.variable to define custom plots
p.v <- plot.variable(v.obj, surv.type = "surv", partial = TRUE, show.plots = FALSE)
plotthis <- p.v$plotthis
plot(plotthis[["age"]], xlab = "age", ylab = "partial effect", type = "b")
boxplot(yhat ~ x, plotthis[["trt"]], xlab = "treatment", ylab = "partial effect")


## competing risks
data(follic, package = "randomForestSRC")
follic.obj <- rfsrc(Surv(time, status) ~ ., follic, nsplit = 3, ntree = 100)
plot.variable(follic.obj, target = 2)

## ------------------------------------------------------------
## regression
## ------------------------------------------------------------

## airquality 
airq.obj <- rfsrc(Ozone ~ ., data = airquality)
plot.variable(airq.obj, partial = TRUE, smooth.lines = TRUE)
plot.variable(airq.obj, partial = TRUE, subset = airq.obj$xvar$Solar.R < 200)

## motor trend cars
mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
plot.variable(mtcars.obj, partial = TRUE, smooth.lines = TRUE)

## ------------------------------------------------------------
## classification
## ------------------------------------------------------------

## iris
iris.obj <- rfsrc(Species ~ ., data = iris)
plot.variable(iris.obj, partial = TRUE)

## motor trend cars: predict number of carburetors
mtcars2 <- mtcars
mtcars2$carb <- factor(mtcars2$carb,
   labels = paste("carb", sort(unique(mtcars$carb))))
mtcars2.obj <- rfsrc(carb ~ ., data = mtcars2)
plot.variable(mtcars2.obj, partial = TRUE)

## ------------------------------------------------------------
## multivariate regression
## ------------------------------------------------------------
mtcars.mreg <- rfsrc(Multivar(mpg, cyl) ~ ., data = mtcars)
plot.variable(mtcars.mreg, m.target = "mpg", partial = TRUE, nvar = 1)
plot.variable(mtcars.mreg, m.target = "cyl", partial = TRUE, nvar = 1)


## ------------------------------------------------------------
## multivariate mixed outcomes
## ------------------------------------------------------------
mtcars2 <- mtcars
mtcars2$carb <- factor(mtcars2$carb)
mtcars2$cyl <- factor(mtcars2$cyl)
mtcars.mix <- rfsrc(Multivar(carb, mpg, cyl) ~ ., data = mtcars2)
plot.variable(mtcars.mix, m.target = "cyl", target = "4", partial = TRUE, nvar = 1)
plot.variable(mtcars.mix, m.target = "cyl", target = 2, partial = TRUE, nvar = 1)

