ggRandomForests (version 1.1.3)

partial_Boston_surf: Cached randomForestSRC::plot.variable objects for examples, diagnostics and vignettes.

Description

Data sets storing randomForestSRC::plot.variable objects for the training data, named according to the following convention:
  • partial_Boston_surf - from a randomForestSRC forest for the Boston housing data set (MASS package).
  • partial_pbc_surf - from a randomForestSRC forest for the pbc data set (randomForestSRC package).

Format

A list of randomForestSRC::plot.variable objects.

Details

Constructing partial plot data with the randomForestSRC::plot.variable function is computationally expensive. We cache randomForestSRC::plot.variable objects to reduce the run times of the ggRandomForests examples, diagnostics and vignettes. See rfsrc_cache_datasets to rebuild a complete set of these data sets.

For each data set listed, we build a randomForestSRC::rfsrc forest (see rfsrc_data), then calculate the partial plot data with the randomForestSRC::plot.variable function, setting partial=TRUE. Each data set is built by rfsrc_cache_datasets using the randomForestSRC version listed in the ggRandomForests DESCRIPTION file.

  • partial_Boston - The Boston housing values in suburbs of Boston from the MASS package. Build a regression random forest for predicting medv (median home values) on 13 covariates and 506 observations.
  • partial_pbc - The pbc data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine. 312 cases participated in the randomized trial and contain largely complete data. Data from the randomForestSRC package. Build a survival random forest for time-to-event death data with 17 covariates and 312 observations (remaining 106 observations are held out).
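Because the partial plot data is cached, it can be loaded directly instead of being recomputed. The following is a minimal sketch, assuming ggRandomForests 1.1.3 is installed and that the data set is exported under the name used in this topic's title (partial_Boston_surf); the inspection calls are only illustrative.

```r
# Load the cached partial plot data shipped with the package
# (assumes the data set name matches this help topic: partial_Boston_surf).
data(partial_Boston_surf, package = "ggRandomForests")

# Per the Format section, this should be a list of plot.variable objects.
class(partial_Boston_surf)
length(partial_Boston_surf)
```

Loading the cached object avoids the expensive plot.variable(..., partial=TRUE) call shown in the Examples section.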

References

#--------------------- randomForestSRC ---------------------

Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.5.5.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25-31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841-860.

#--------------------- Boston data set ---------------------

Belsley, D.A., E. Kuh, and R.E. Welsch. 1980. Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

Harrison, D., and D.L. Rubinfeld. 1978. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics and Management 5: 81-102.

#--------------------- pbc data set ---------------------

Fleming T.R. and Harrington D.P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

T Therneau and P Grambsch (2000), Modeling Survival Data: Extending the Cox Model, Springer-Verlag, New York. ISBN: 0-387-98784-3.

See Also

MASS::Boston, randomForestSRC::pbc, randomForestSRC::plot.variable, rfsrc_data, rfsrc_cache_datasets, gg_partial, plot.gg_partial

Examples

#---------------------------------------------------------------------
# MASS::Boston data - regression random forest
#---------------------------------------------------------------------
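# load the packages that provide plot.variable and gg_partial
library(randomForestSRC)
library(ggRandomForests)

# load the rfsrc object from the cached data
data(rfsrc_Boston, package="ggRandomForests")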

# The plot.variable call
partial_Boston <- plot.variable(rfsrc_Boston,
                                partial=TRUE, show.plots = FALSE )

# plot the forest partial plots
gg_dta <- gg_partial(partial_Boston)
plot(gg_dta, panel=TRUE)

#---------------------------------------------------------------------
# randomForestSRC::pbc data - survival random forest
#---------------------------------------------------------------------
# load the rfsrc object from the cached data
data(rfsrc_pbc, package="ggRandomForests")

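# The plot.variable call -
# survival requires a time point specification.
# for the pbc data, we want 1, 3 and 5 year survival.
# NOTE: xvar was not defined in the original example; as an assumption,
# use every predictor stored in the forest object.
xvar <- rfsrc_pbc$xvar.names
partial_pbc <- lapply(c(1,3,5), function(tm){
  plot.variable(rfsrc_pbc, surv.type = "surv",
                time = tm,
                xvar.names = xvar,
                partial = TRUE,
                show.plots = FALSE)
})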

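# plot the forest partial plots
# partial_pbc is a list with one plot.variable object per time point;
# assuming gg_partial takes a single plot.variable object, convert each
# element separately and plot the first (1 year) result.
gg_dta <- lapply(partial_pbc, gg_partial)
plot(gg_dta[[1]])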
