pedometrics (version 0.6-4)

optimRandomForest: Optimum number of iterations to de-bias a random forest regression

Description

Compute the optimum number of iterations needed to de-bias a random forest regression.

Usage

optimRandomForest(x, y, niter = 10, nruns = 100, ntree = 500,
  ntrain = 2/3, nodesize = 5, mtry = max(floor(ncol(x)/3), 1),
  profile = TRUE, progress = TRUE)

Arguments

x
Data frame or matrix of covariates (predictor variables).
y
Numeric vector with the response variable.
niter
Number of iterations. Defaults to niter = 10.
nruns
Number of simulations to be used in each iteration. Defaults to nruns = 100.
ntree
Number of trees to grow. Defaults to ntree = 500.
ntrain
Number (or proportion) of observations to be used as training cases. Defaults to 2/3 of the total number of observations.
nodesize
Minimum size of terminal nodes. Defaults to nodesize = 5.
mtry
Number of variables randomly sampled as candidates at each split. Defaults to 1/3 of the total number of covariates.
profile
Should the profile of the standardized mean squared prediction error be plotted at the end of the optimization? Defaults to profile = TRUE.
progress
Should a progress bar be displayed? Defaults to progress = TRUE.
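For instance, the default mtry for a hypothetical covariate table with 10 columns works out as follows (the data frame here is invented for illustration):

```r
x <- data.frame(matrix(rnorm(50), ncol = 10))  # hypothetical covariate matrix
mtry <- max(floor(ncol(x) / 3), 1)             # default: one third of the covariates
mtry                                           # -> 3
```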

Details

A fixed proportion of the total number of observations is used to calibrate (train) the random forest regression. The set of calibration observations is randomly selected from the full set of observations in each simulation. The remaining observations are used as test cases (validation). In general, the smaller the calibration dataset, the more simulation runs are needed to obtain stable estimates of the mean squared prediction error (MSPE).
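The resampling scheme can be sketched in base R. A trivial mean predictor stands in for the random forest (an assumption, to keep the sketch self-contained); the point is the random calibration/validation split and the averaging of the MSPE over nruns simulations:

```r
set.seed(42)
n <- 150
y <- rnorm(n)                       # stand-in response variable

nruns  <- 100                       # simulations per iteration (default)
ntrain <- floor(2/3 * n)            # default: 2/3 of observations calibrate
mspe   <- numeric(nruns)

for (run in seq_len(nruns)) {
  cal  <- sample(n, ntrain)         # random calibration (training) cases
  val  <- setdiff(seq_len(n), cal)  # remaining cases validate
  pred <- rep(mean(y[cal]), length(val))  # stand-in for a fitted forest
  mspe[run] <- mean((y[val] - pred)^2)    # squared prediction error
}
mean(mspe)                          # stabilizes as nruns grows
```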

The optimum number of iterations needed to de-bias the random forest regression is obtained by observing the evolution of the MSPE as the number of iterations increases. The MSPE is defined as the mean of the squared differences between predicted and observed values.
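The iterative de-biasing follows Breiman's (1999) idea of repeatedly fitting the residuals of the previous fit and accumulating the corrections. A minimal sketch, with loess standing in for the random forest (an assumption; the package fits a forest at each step), shows how the MSPE profile identifies the optimum iteration count:

```r
set.seed(1)
n <- 200
d <- data.frame(x = runif(n, 0, 10))
d$y <- sin(d$x) + rnorm(n, sd = 0.3)

cal   <- sample(n, floor(2/3 * n))  # calibration set (default ntrain = 2/3)
train <- d[cal, ]
test  <- d[-cal, ]

niter <- 5                          # candidate numbers of de-biasing iterations
r     <- train$y                    # iteration 1 fits the raw response
acc   <- rep(0, nrow(test))         # accumulated test-set prediction
mspe  <- numeric(niter)

for (k in seq_len(niter)) {
  dat <- data.frame(x = train$x, r = r)
  fit <- loess(r ~ x, data = dat)
  r   <- r - predict(fit)                            # residuals become next target
  acc <- acc + predict(fit, data.frame(x = test$x))  # add this step's correction
  mspe[k] <- mean((test$y - acc)^2, na.rm = TRUE)    # profile of the MSPE
}
which.min(mspe)                     # iteration count with the lowest MSPE
```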

References

Breiman, L. Random forests. Machine Learning. v. 45, p. 5-32, 2001.

Breiman, L. Using adaptive bagging to debias regressions. Berkeley: University of California, p. 16, 1999.

Liaw, A. & Wiener, M. Classification and regression by randomForest. R News. v. 2/3, p. 18-22, 2002.

Xu, R. Improvements to random forest methodology. Ames, Iowa: Iowa State University, p. 87, 2013.

See Also

randomForest