forestError v0.1.0

A Unified Framework for Random Forest Prediction Error Estimation

Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2019+) <arXiv:1912.07435>. This package is compatible with several existing packages that implement random forests in R.

Readme

forestError: A Unified Framework for Random Forest Prediction Error Estimation

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Overview

The forestError package estimates conditional mean squared prediction errors, conditional biases, conditional prediction intervals, and conditional error distributions for random forest predictions using the plug-in method introduced in Lu and Hardin (2019+). These estimates are conditional on the test observations' predictor values, accounting for possible response heterogeneity, random forest prediction bias, and random forest prediction variability across the predictor space.
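The weighting step can be illustrated with a small base-R sketch. It assumes a simplified, hypothetical representation of a fitted forest: a matrix `train.nodes` of terminal-node IDs (training observations by trees), a logical matrix `oob` marking which training observations are out-of-bag for each tree, and a vector `test.nodes` giving the test point's terminal node in each tree. The weighting rule (each training observation's weight is proportional to the number of trees in which it is out-of-bag and shares the test point's terminal node) is our reading of the method. The helper names and toy data are illustrative, and this is not the package's internal implementation.

```r
# Minimal sketch of out-of-bag (OOB) weighting of OOB prediction errors.
# All names and the toy "forest" below are hypothetical.

# OOB weights for one test point: for each training observation, count the
# trees in which it is out-of-bag AND falls in the test point's terminal
# node, then normalize the counts so they sum to one.
oob_weights <- function(train.nodes, oob, test.nodes) {
  # same.node[i, b] is TRUE when training obs i shares tree b's terminal
  # node with the test point
  test.mat <- matrix(test.nodes, nrow = nrow(train.nodes),
                     ncol = length(test.nodes), byrow = TRUE)
  same.node <- train.nodes == test.mat
  counts <- rowSums(same.node & oob)
  if (sum(counts) == 0) {
    stop("no out-of-bag observation shares a terminal node with the test point")
  }
  counts / sum(counts)
}

# Plug-in estimates from weighted OOB errors, with each error defined here
# as (response - OOB prediction).
plugin_estimates <- function(oob.errors, w) {
  mean.err <- sum(w * oob.errors)
  c(mspe = sum(w * oob.errors^2),  # weighted mean squared error
    bias = -mean.err)              # bias of the prediction: E[prediction - response]
}

# Toy example: 5 training observations, 3 trees
train.nodes <- matrix(c(1, 1, 2,
                        1, 2, 2,
                        2, 1, 1,
                        2, 2, 1,
                        1, 1, 1), nrow = 5, byrow = TRUE)
oob <- matrix(c(TRUE,  FALSE, TRUE,
                FALSE, TRUE,  TRUE,
                TRUE,  TRUE,  FALSE,
                TRUE,  FALSE, FALSE,
                FALSE, TRUE,  TRUE), nrow = 5, byrow = TRUE)
test.nodes <- c(1, 1, 2)
oob.errors <- c(0.5, -1.0, 2.0, 0.0, -0.5)

w <- oob_weights(train.nodes, oob, test.nodes)
est <- plugin_estimates(oob.errors, w)
print(w)
print(est)
```

With these toy inputs the weights are 0.4, 0.2, 0.2, 0 and 0.2, giving an estimated MSPE of 1.15 and an estimated bias of -0.3.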

In its current state, the main function in this package accepts regression random forests built using any of the following packages:

  • randomForest,
  • randomForestSRC,
  • ranger, and
  • quantregForest.

Installation

Running the following line of code in R will install a stable version of this package from CRAN:

install.packages("forestError")

To install the development version of this package from GitHub, run the following lines of code in R:

# install.packages("devtools")  # run this first if devtools is not installed
devtools::install_github(repo = "benjilu/forestError")

Instructions

See the documentation for detailed information on how to use this package. A portion of the example given in the documentation is reproduced below for convenience.

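# load packages (randomForest supplies the forest fitted below)
library(forestError)
library(randomForest)

# load data
data(airquality)

# remove observations with any missing values (response or predictors)
airquality <- airquality[complete.cases(airquality), ]

# get number of observations and the response column index
n <- nrow(airquality)
response.col <- 1

# split data into training (90%) and test (10%) sets
set.seed(1)  # make the random split reproducible
train.ind <- sample(1:n, floor(0.9 * n), replace = FALSE)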
Xtrain <- airquality[train.ind, -response.col]
Ytrain <- airquality[train.ind, response.col]
Xtest <- airquality[-train.ind, -response.col]
Ytest <- airquality[-train.ind, response.col]

# fit random forest to the training data
rf <- randomForest(Xtrain, Ytrain, nodesize = 5,
                   ntree = 500, keep.inbag = TRUE)

# estimate conditional mean squared prediction errors, conditional
# biases, conditional prediction intervals, and conditional error
# distribution functions for the test observations
test.errors <- quantForestError(rf, Xtrain, Xtest, alpha = 0.05)

# do the same as above but this time in parallel
test.errors <- quantForestError(rf, Xtrain, Xtest, alpha = 0.05,
                                n.cores = 4)
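Once intervals have been estimated, one simple check is their empirical coverage on the held-out responses. The helper below is hypothetical and works on plain vectors; how the lower and upper bounds are pulled out of `test.errors` depends on the output format described in the package documentation and is not assumed here. The toy vectors stand in for `Ytest` and the estimated bounds.

```r
# Hypothetical helper: fraction of observed responses that fall inside
# their estimated prediction intervals.
interval_coverage <- function(y, lower, upper) {
  stopifnot(length(y) == length(lower), length(y) == length(upper))
  mean(y >= lower & y <= upper)
}

# Toy values standing in for Ytest and the estimated interval bounds
y     <- c(12, 30, 7, 45)
lower <- c(5, 20, 8, 30)
upper <- c(20, 40, 15, 60)

cov <- interval_coverage(y, lower, upper)
print(cov)
```

For the toy values, three of the four responses fall inside their intervals, so the coverage is 0.75. With real data and `alpha = 0.05`, coverage near 0.95 is what one would hope to see.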

License

See the DESCRIPTION file for license information (GPL-3).

Authors

Benjamin Lu and Johanna Hardin

References

  • B. Lu and J. Hardin. A unified framework for random forest prediction error estimation. arXiv:1912.07435, 2019+.

Functions in forestError

  • quantForestError: quantify random forest prediction error
  • qerror: estimated conditional prediction error quantile functions
  • perror: estimated conditional prediction error CDFs
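The quantities behind `perror` and `qerror` can be thought of as the CDF and quantile function of a weighted empirical distribution of OOB errors. The base-R sketch below builds such a pair of functions from hypothetical inputs `errors` (response minus prediction) and `w` (weights summing to one) and uses them to form an interval around a prediction. It illustrates the idea only; the helper names are made up and it does not reproduce the package functions' signatures.

```r
# Hypothetical helpers: weighted empirical CDF and quantile function of
# OOB prediction errors (errors = response - prediction; w sums to 1).
make_error_cdf <- function(errors, w) {
  # P(error <= q) = total weight of errors at or below q
  function(q) sapply(q, function(qq) sum(w[errors <= qq]))
}

make_error_quantile <- function(errors, w) {
  ord <- order(errors)
  e.sorted <- errors[ord]
  cum.w <- cumsum(w[ord])
  # smallest error whose cumulative weight reaches p; the small tolerance
  # guards against floating-point error in the cumulative sums
  function(p) sapply(p, function(pp) e.sorted[which(cum.w >= pp - 1e-12)[1]])
}

errors <- c(0.5, -1.0, 2.0, 0.0, -0.5)
w <- c(0.4, 0.2, 0.2, 0.0, 0.2)

perr <- make_error_cdf(errors, w)
qerr <- make_error_quantile(errors, w)

p0 <- perr(0)              # estimated P(error <= 0)
qs <- qerr(c(0.25, 0.75))  # estimated error quantiles

# Interval for a new prediction, using response = prediction + error.
# alpha = 0.5 (a 50% interval) only because the toy sample is tiny.
pred <- 10
alpha <- 0.5
interval <- pred + qerr(c(alpha / 2, 1 - alpha / 2))
print(c(p0 = p0))
print(qs)
print(interval)
```

Here the estimated probability of a non-positive error is 0.4, the 0.25 and 0.75 error quantiles are -0.5 and 0.5, and the resulting interval around the prediction 10 is [9.5, 10.5].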

Details

Type Package
License GPL-3
Encoding UTF-8
LazyData true
RoxygenNote 6.1.1
LinkingTo Rcpp
NeedsCompilation yes
Packaged 2020-01-10 20:52:25 UTC; benji
Repository CRAN
Date/Publication 2020-01-14 11:30:06 UTC
