ggRandomForests (version 1.1.2)

ggRandomForests-package: ggRandomForests: Visually Exploring Random Forests

Description

ggRandomForests is a utility package for randomForestSRC (Ishwaran et al. 2014, 2008, 2007) for survival, regression and classification forests, and uses the ggplot2 package (Wickham 2009) for plotting results. ggRandomForests is structured to extract data objects from the random forest and provides S3 functions for printing and plotting these objects.

The randomForestSRC package provides a unified treatment of Breiman's (2001) random forests for a variety of data settings. Regression and classification forests are grown when the response is numeric or categorical (factor), respectively, while survival and competing risk forests (Ishwaran et al. 2008, 2012) are grown for right-censored survival data.
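As a minimal sketch of this, assuming a recent CRAN release of randomForestSRC (defaults may differ from the 1.5.5.12 release listed in the References), the same `rfsrc()` call grows a regression, classification or survival forest depending only on the response in the formula. The built-in `mtcars` and `iris` data sets and the package's `veteran` data set are used purely for illustration; the `family` element of each fitted object is read back to show which kind of forest was grown.

```r
# Sketch: rfsrc() chooses the forest type from the type of the response.
# Assumes a recent CRAN release of randomForestSRC; data sets are illustrative.
library(randomForestSRC)

set.seed(1)
data(veteran, package = "randomForestSRC")

# Numeric response -> regression forest
rf_regr <- rfsrc(mpg ~ ., data = mtcars, ntree = 100)

# Factor response -> classification forest
rf_class <- rfsrc(Species ~ ., data = iris, ntree = 100)

# Right-censored (time, status) response -> survival forest
rf_surv <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100)

# The family element records which type of forest was grown
fams <- sapply(list(rf_regr, rf_class, rf_surv), function(f) f$family)
print(fams)
```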

Many of the figures created by the ggRandomForests package are also available directly from within the randomForestSRC package. However, ggRandomForests offers the following advantages:

  • Separation of data and figures: ggRandomForests contains functions that operate either on the randomForestSRC::rfsrc forest object directly or on the output from randomForestSRC post-processing functions (e.g. plot.variable, var.select, find.interaction) to generate intermediate ggRandomForests data objects. S3 functions are provided to further process these objects and plot results using the ggplot2 graphics package. Alternatively, users can use these data objects for additional custom plotting or analysis operations.
  • Each data object/figure is a single, self-contained object. This allows simple modification and manipulation of the data or ggplot2 objects to meet users' specific needs and requirements.
  • The use of ggplot2 for plotting. We chose the ggplot2 package for our figures to give users flexibility in modifying them. Each S3 plot function returns either a single ggplot2 object or a list of ggplot2 objects, so users can apply additional ggplot2 functions or themes to modify and customise the figures to their liking.
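A minimal sketch of this data/figure separation, assuming current CRAN releases of randomForestSRC, ggRandomForests and ggplot2 (behaviour in version 1.1.2 may differ): `gg_error()` extracts the OOB error-rate data object, the S3 `plot()` method draws the default figure, and the same data frame also feeds a hand-built ggplot. The `mtcars` regression forest is an illustrative choice, and treating the first non-`ntree` column as the error rate is an assumption about the object's layout, not something this page documents.

```r
# Sketch: a ggRandomForests data object is a data.frame subclass, so it can
# feed either the package's plot() method or a custom ggplot.
# Assumes current CRAN releases of randomForestSRC, ggRandomForests, ggplot2.
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

set.seed(42)
rf_mtcars <- rfsrc(mpg ~ ., data = mtcars, ntree = 200)

# Step 1: extract the OOB error-rate data object (no plotting yet)
gg_e <- gg_error(rf_mtcars)
print(class(gg_e))

# Step 2a: default figure from the S3 plot method (a ggplot object)
p_default <- plot(gg_e)

# Step 2b: custom figure built directly from the same data.
# Assumption: the first column other than "ntree" holds the error rate;
# rows where no error rate was recorded (NA) are dropped.
err_dta <- as.data.frame(gg_e)
err_col <- setdiff(names(err_dta), "ntree")[1]
err_dta <- err_dta[!is.na(err_dta[[err_col]]), ]

p_custom <- ggplot(err_dta, aes(x = ntree, y = .data[[err_col]])) +
  geom_point(size = 0.8) +
  labs(x = "Number of trees", y = "OOB error rate")
```

Because the custom plot only reads a data frame, the same object could just as easily be summarised, filtered or exported without touching the forest again.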

The ggRandomForests package contains the following data functions:

  • gg_rfsrc: randomForest[SRC] predictions.
  • gg_error: randomForest[SRC] convergence rate based on the OOB error rate.
  • gg_roc: ROC curves for randomForest classification models.
  • gg_vimp: Variable Importance ranking for variable selection.
  • gg_minimal_depth: Minimal Depth ranking for variable selection (Ishwaran et al. 2010).
  • gg_minimal_vimp: Comparing Minimal Depth and VIMP rankings for variable selection.
  • gg_interaction: Minimal Depth interaction detection (Ishwaran et al. 2010).
  • gg_variable: Marginal variable dependence.
  • gg_partial: Partial (risk adjusted) variable dependence.
  • gg_partial_coplot: Partial variable conditional dependence (computationally expensive).
  • gg_survival: Kaplan-Meier/Nelson-Aalen hazard analysis.

Each of these data functions has an associated S3 plot function that returns ggplot2 objects, either individually or as a list, which can be further customised using standard ggplot2 commands.
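For example, the following sketch (again assuming current CRAN releases of the three packages, with `iris` chosen only for illustration) builds the `gg_rfsrc` prediction object for a classification forest and restyles its default figure with the standard ggplot2 functions `theme_bw()` and `labs()`. Printing `p` draws the customised figure.

```r
# Sketch: the S3 plot() method returns a ggplot object that can be extended
# with ordinary ggplot2 layers, themes and labels.
# Assumes current CRAN releases of randomForestSRC, ggRandomForests, ggplot2.
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

set.seed(42)
rf_iris <- rfsrc(Species ~ ., data = iris, ntree = 200)

# Data object holding the forest's predicted class probabilities
gg_pred <- gg_rfsrc(rf_iris)

# Customise the default figure with standard ggplot2 commands
p <- plot(gg_pred) +
  theme_bw() +
  labs(title = "Predicted class probabilities, iris forest")
```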

References

Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.

Ishwaran, H. and Kogalur, U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC). R package version 1.5.5.12.

Ishwaran, H. and Kogalur, U.B. (2007). Random survival forests for R. R News, 7(2), 25-31.

Ishwaran, H., Kogalur, U.B., Blackstone, E.H. and Lauer, M.S. (2008). Random survival forests. Ann. Appl. Statist., 2(3), 841-860.

Ishwaran, H., Kogalur, U.B., Gorodeski, E.Z., Minn, A.J. and Lauer, M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105, 205-217.

Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic J. Statist., 1, 519-537.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, New York.