Rforestry: Random Forests, Linear Trees, and Gradient Boosting for Inference and Interpretability

Sören Künzel, Theo Saarinen, Simon Walter, Edward Liu, Allen Tang, Jasjeet Sekhon

Introduction

Rforestry is a fast implementation of Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability.

How to install

  1. The GFortran compiler has to be up to date; GFortran binaries are available from the GCC project.
  2. The devtools package has to be installed. You can install it with install.packages("devtools").
  3. The package contains compiled code, so you need a working development environment to install the development version. You can check whether you have one with devtools::has_devel(). If you do not, Windows users should download and install Rtools, and macOS users should install Xcode.
  4. The latest development version can then be installed with

devtools::install_github("forestry-labs/Rforestry")

     Windows users need to skip 64-bit compilation, due to an outstanding gcc issue, by installing with

devtools::install_github("forestry-labs/Rforestry", INSTALL_opts = c('--no-multiarch'))

Usage

set.seed(292315)
library(Rforestry)

# Hold out three observations from iris; predict Sepal.Length from the
# remaining columns.
test_idx <- sample(nrow(iris), 3)
x_train <- iris[-test_idx, -1]
y_train <- iris[-test_idx, 1]
x_test <- iris[test_idx, -1]

rf <- forestry(x = x_train, y = y_train)

# Request the prediction weight matrix: the weight each training observation
# receives in each test prediction.
weights <- predict(rf, x_test, aggregation = "weightMatrix")$weightMatrix

weights %*% y_train
predict(rf, x_test)
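
The point of the example is that the last two calls agree: multiplying the weight matrix by the training responses reproduces the standard predictions. A minimal check of that equality, as a sketch that assumes the objects defined above are still in scope:

# Sketch: the weighted sum of training responses should match the default
# forest predictions (up to floating-point tolerance).
manual_preds  <- as.numeric(weights %*% y_train)
default_preds <- as.numeric(predict(rf, x_test))
all.equal(manual_preds, default_preds)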

Ridge Random Forest

A fast implementation of random forests using ridge-penalized splitting and ridge regression for predictions.

Example:

set.seed(49)
library(Rforestry)

# Simulate a response that is exactly linear in three features.
n <- 100
a <- rnorm(n)
b <- rnorm(n)
c <- rnorm(n)
y <- 4*a + 5.5*b - .78*c
x <- data.frame(a, b, c)

# Fit a ridge random forest: ridge-penalized splits and ridge regression
# predictions.
forest <- forestry(x, y, ridgeRF = TRUE)
predict(forest, x)
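
Because the response above is exactly linear in the features, the ridge forest should fit it much more closely than a default forest. A rough in-sample comparison, as a sketch (the plain forestry fit and the mean-squared-error lines below are illustrative additions, not part of the package example):

# Sketch: compare in-sample mean squared error of the ridge forest against a
# default forest on the same linear data.
plain_forest <- forestry(x, y)
mean((predict(forest, x) - y)^2)        # ridge forest
mean((predict(plain_forest, x) - y)^2)  # default forest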

Monotonic Constraints

The monotonicConstraints parameter imposes monotonic constraints on individual features in forestry, with one entry per feature.

library(Rforestry)
library(dplyr)   # for %>% and select()

x <- rnorm(150) + 5
y <- .15*x + .5*sin(3*x)
data_train <- data.frame(x1 = x, x2 = rnorm(150) + 5, y = y + rnorm(150, sd = .4))

# One constraint per feature: -1 requests a monotone decreasing fit in x1 and x2.
monotone_rf <- forestry(x = data_train %>% select(-y),
                        y = data_train$y,
                        monotonicConstraints = c(-1, -1),
                        nodesizeStrictSpl = 5,
                        nthread = 1,
                        ntree = 25)
predict(monotone_rf, feature.new = data_train %>% select(-y))
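
To see the constraint in action, one can hold one feature fixed and sweep the other over a grid; under the usual convention that -1 encodes a monotone decreasing constraint, the predictions should be non-increasing along the swept feature. A small illustrative check (the grid construction below is an assumption of this sketch, not part of the package example):

# Sketch: sweep x1 over a grid with x2 held at its median; the -1 constraint
# on x1 should make the predictions non-increasing in x1.
grid <- data.frame(x1 = seq(min(data_train$x1), max(data_train$x1), length.out = 50),
                   x2 = median(data_train$x2))
grid_preds <- predict(monotone_rf, feature.new = grid)
all(diff(grid_preds) <= 1e-8)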

OOB Predictions

We can return predictions for the training dataset using, for each observation, only the trees for which that observation was out of bag. Note that when there are few trees, or when a high proportion of the observations is sampled for each tree, some observations may not be out of bag for any tree; the predictions for these observations are returned as NaN.

library(Rforestry)

# Train a forest
rf <- forestry(x = iris[,-1],
               y = iris[,1],
               ntree = 500)

# Get the OOB predictions for the training set
oob_preds <- getOOBpreds(rf)

# This should be equal to the OOB error
sum((oob_preds -  iris[,1])^2)
getOOB(rf)
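
With many trees, as above, essentially every observation is out of bag for at least some trees. With very few trees, some observations may never be out of bag, which is when NaN predictions appear. A small sketch of that case (the ntree = 3 forest below is an illustrative addition):

# Sketch: with only a handful of trees, some observations are in-bag for every
# tree and therefore have no OOB prediction (returned as NaN).
small_rf <- forestry(x = iris[,-1], y = iris[,1], ntree = 3)
small_oob <- getOOBpreds(small_rf)
sum(is.na(small_oob))  # number of observations without an OOB prediction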


Install

install.packages('Rforestry')

Monthly Downloads: 332

Version: 0.11.1.0

License: GPL (>= 3) | file LICENSE

Maintainer: Theo Saarinen

Last Published: March 15th, 2025

Functions in Rforestry (0.11.1.0)

multilayer-forestry: Multilayer forestry
honestRF: Honest Random Forest
getVI: getVI-forestry
predict-multilayer-forestry: predict-multilayer-forestry
preprocess_testing: preprocess_testing
preprocess_training: preprocess_training
predict-forestry: predict-forestry
testing_data_checker-forestry: Test data check
saveForestry: save RF
relinkCPP_prt: relink CPP ptr
training_data_checker: Training data check
forestry: forestry
forest_checker: Checks if a forestry object has a valid pointer to its C++ object.
addTrees: addTrees-forestry
getOOBpreds-forestry: getOOBpreds-forestry
autoforestry: autoforestry-forestry
getOOB-forestry: getOOB-forestry
compute_lp-forestry: compute lp distances
forestry-class: forestry class
autohonestRF: Honest Random Forest
CppToR_translator: Cpp to R translator
plot-forestry: visualize a tree
make_savable: make_savable
impute_features: Feature imputation using random forest neighborhoods
loadForestry: load RF