
ggRandomForests (version 1.2.1)

partial_surface_data: Cached plot.variable objects for examples, diagnostics and vignettes.

Description

Cached plot.variable objects for examples, diagnostics and vignettes. Data sets storing plot.variable objects corresponding to training data according to the following naming convention:
  • partial_Boston_surf - from a randomForestS[R]C for the Boston housing data set (MASS package).
  • partial_pbc_surf - from a randomForest[S]RC for the pbc data set (randomForestSRC package).
  • partial_pbc_time - from a randomForest[S]RC for the pbc data set (randomForestSRC package).

Arguments

format

list of plot.variable objects

Details

Constructing partial plot data with the randomForestSRC::plot.variable function is computationally expensive. We cache plot.variable objects to reduce the run times of the ggRandomForests examples, diagnostics and vignettes (see rfsrc_cache_datasets to rebuild a complete set of these data sets).

For each data set listed, we build an rfsrc object (see rfsrc_data), then calculate the partial plot data with the plot.variable function, setting partial=TRUE. Each data set is built by the rfsrc_cache_datasets function, using the randomForestSRC version listed in the ggRandomForests DESCRIPTION file.

  • partial_Boston - The Boston housing values in suburbs of Boston from the MASS package. Build a regression random forest for predicting medv (median home values) on 13 covariates and 506 observations.
  • partial_pbc - The pbc data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine. 312 cases participated in the randomized trial and contain largely complete data. Data from the randomForestSRC package. Build a survival random forest for time-to-event death data with 17 covariates and 312 observations (remaining 106 observations are held out).
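
Because the objects are cached, they can be loaded directly rather than rebuilt. A minimal sketch, assuming ggRandomForests 1.2.1 is installed and that the data set names listed in the Description ship with that version (the guard skips the load otherwise):

```r
# Assumption: ggRandomForests (1.2.1) is installed and ships partial_Boston_surf.
loaded <- FALSE
if (requireNamespace("ggRandomForests", quietly = TRUE)) {
  # data() warns rather than fails if the data set is absent from this version
  suppressWarnings(data(partial_Boston_surf, package = "ggRandomForests"))
  loaded <- exists("partial_Boston_surf")
}

if (loaded) {
  # Per the Arguments section, the object is a list of plot.variable objects
  print(class(partial_Boston_surf))
  print(length(partial_Boston_surf))
}
```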

References

#--------------------- randomForestSRC ---------------------

Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.5.5.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25-31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841-860.

#--------------------- Boston data set ---------------------

Belsley, D.A., E. Kuh, and R.E. Welsch. 1980. Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

Harrison, D., and D.L. Rubinfeld. 1978. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics and Management 5: 81-102.

#--------------------- pbc data set ---------------------

Fleming T.R. and Harrington D.P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

Therneau T. and Grambsch P. (2000). Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York. ISBN: 0-387-98784-3.

See Also

Boston, pbc, plot.variable, rfsrc_data, rfsrc_cache_datasets, gg_partial, plot.gg_partial

Examples

#---------------------------------------------------------------------
# MASS::Boston data - regression random forest 
#---------------------------------------------------------------------
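# load the packages used below (plot.variable comes from randomForestSRC)
library(randomForestSRC)
library(ggRandomForests)

# load the rfsrc object from the cached data
data(rfsrc_Boston, package="ggRandomForests")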

# The plot.variable call
partial_Boston <- plot.variable(rfsrc_Boston,
                                partial=TRUE, show.plots = FALSE )

# plot the forest partial plots
gg_dta <- gg_partial(partial_Boston)
plot(gg_dta, panel=TRUE)

#---------------------------------------------------------------------
# randomForestSRC::pbc data - survival random forest
#---------------------------------------------------------------------
# load the rfsrc object from the cached data
data(rfsrc_pbc, package="ggRandomForests")

# Restrict the times of interest to 5 years or less.
time_pts <- rfsrc_pbc$time.interest[which(rfsrc_pbc$time.interest<=5)]

# Find 50 points in time, evenly spaced along the distribution of 
# event times, for a series of partial dependence curves
time_cts <- quantile_pts(time_pts, groups = 50)

# Generate the gg_partial_coplot data object
system.time(partial_pbc_time <- lapply(time_cts, function(ct){
   plot.variable(rfsrc_pbc, xvar = "bili", time = ct,
                 npts = 50, show.plots = FALSE, 
                 partial = TRUE, surv.type="surv")
   }))
#     user   system  elapsed 
# 2561.313   81.446 2641.707 

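# Find the quantile points to create 50 cut points
alb_partial_pts <- quantile_pts(rfsrc_pbc$xvar$albumin, groups = 50)

# For each albumin cut point, fix albumin at that value in a local copy of
# the forest, then compute the partial dependence on bili at time = 1
system.time(partial_pbc_surf <- lapply(alb_partial_pts, function(ct){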
  rfsrc_pbc$xvar$albumin <- ct
  plot.variable(rfsrc_pbc, xvar = "bili", time = 1,
                npts = 50, show.plots = FALSE, 
                partial = TRUE, surv.type="surv")
  }))
#     user   system  elapsed 
# 2547.482   91.978 2671.870
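
The cut points above come from quantile_pts, which places points along the empirical distribution rather than on an equally spaced grid. The idea can be sketched in base R. Here `sketch_quantile_pts` is a hypothetical helper written for illustration, not the ggRandomForests implementation, and the event times are simulated rather than taken from pbc:

```r
# Hypothetical helper: NOT the ggRandomForests implementation of quantile_pts().
# Returns `groups` points at equally spaced quantiles of x (min and max included).
sketch_quantile_pts <- function(x, groups) {
  probs <- seq(0, 1, length.out = groups)
  # unname() drops the percentage labels; unique() collapses tied quantiles
  unique(unname(stats::quantile(x, probs = probs, na.rm = TRUE)))
}

set.seed(1)
event_times <- rexp(200, rate = 0.5)          # simulated stand-in for event times
time_pts <- event_times[event_times <= 5]     # same restriction as in the example
cut_pts <- sketch_quantile_pts(time_pts, groups = 50)

length(cut_pts)
range(cut_pts)
```

Quantile-based points concentrate where observations are dense, so each partial dependence curve is evaluated at values that the data actually support.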
