bigrf is an implementation of Leo Breiman's and Adele Cutler's Random Forest algorithms, with optimizations for performance and for handling data sets too large to be processed in memory. Forests can be built in parallel at two levels. First, trees can be grown in parallel on a single machine using foreach. Second, multiple forests can be built in parallel on multiple machines, then merged into one. For large data sets, disk-based big.matrix objects may be used for storing data and intermediate computations, to prevent excessive virtual memory swapping by the operating system. Currently, only classification forests with a subset of the functionality in Breiman and Cutler's original code are implemented; more functionality and regression trees will be added in the future. See the file INSTALL-WINDOWS in the source package for Windows installation instructions.
To grow trees in parallel on a single machine, register an appropriate parallel backend for foreach before calling bigrfc or grow. See foreach for more details on supported parallel backends. For example, the following code uses the doParallel package to register all available processor cores:
# Register a parallel backend that uses all detected cores.
library(doParallel)
registerDoParallel(cores=detectCores(all.tests=TRUE))
Multiple random forests can also be built in parallel on multiple machines (using the same training data and parameters), then merged into one forest using merge. For large data sets, the training data, intermediate computations and some outputs (e.g. proximity matrices) may be cached on disk using big.matrix objects. This enables random forests to be built on fairly large data sets without hitting RAM limits, which would otherwise cause excessive virtual memory swapping by the operating system. Disk caching may be turned off, for better performance on smaller data sets, by setting the function / method argument cachepath to NULL, causing the big.matrix objects to be created in memory.
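As a rough sketch of both features, using the Cars93 data from the suggested MASS package (the ntrees and varselect arguments are assumptions based on the package's examples; see ?bigrfc for the exact signature):

# Grow two small forests from the same data and parameters, as might be
# done on two separate machines. cachepath=NULL keeps the big.matrix
# objects in memory instead of caching them on disk.
library(bigrf)
data(Cars93, package="MASS")
forest1 <- bigrfc(Cars93, Cars93$Type, ntrees=20L, varselect=4:22,
                  cachepath=NULL)
forest2 <- bigrfc(Cars93, Cars93$Type, ntrees=20L, varselect=4:22,
                  cachepath=NULL)

# Combine the two 20-tree forests into a single 40-tree forest.
forest <- merge(forest1, forest2)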
| Package: | bigrf |
| Version: | 0.1-12 |
| Date: | 2015-10-21 |
| OS_type: | unix |
| Depends: | R (>= 2.14), methods, bigmemory (>= 4.5.8) |
| Imports: | foreach |
| Suggests: | MASS, doParallel |
| LinkingTo: | bigmemory, BH |
| License: | GPL-3 |
| Copyright: | 2013-2015 Aloysius Lim |
| URL: | https://github.com/aloysius-lim/bigrf |
| BugReports: | https://github.com/aloysius-lim/bigrf/issues |
Index:
bigcforest-class        Classification Random Forests
bigcprediction-class    Random Forest Predictions
bigctree-class          Classification Trees in Random Forests
bigrf-package           Big Random Forests: Classification and Regression Forests for Large Data Sets
bigrfc                  Build a Classification Random Forest Model
bigrfprox-class         Proximity Matrices
fastimp-methods         Compute Fast (Gini) Variable Importance
generateSyntheticClass  Generate Synthetic Second Class for Unsupervised Learning
grow-methods            Grow More Trees in a Random Forest
interactions-methods    Compute Variable Interactions
merge-methods           Merge Two Random Forests
outliers-methods        Compute Outlier Scores
predict-methods         Predict Classes of Test Examples
proximities-methods     Compute Proximity Matrix
scaling-methods         Compute Metric Scaling Co-ordinates
varimp-methods          Compute Variable Importance
The main entry point for this package is bigrfc, which is used to build a classification random forest on the given training data and forest-building parameters. bigrfc returns the forest as an object of class "bigcforest", which contains the trees grown as objects of class "bigctree". After a forest is built, more trees can be grown using grow.
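A minimal usage sketch, again with the Cars93 data from MASS (the ntrees and varselect arguments for bigrfc and the grow arguments shown are assumptions; see ?bigrfc and ?grow for the exact signatures):

# Build a 30-tree classification forest predicting car Type from
# columns 4 to 22 of Cars93.
library(bigrf)
data(Cars93, package="MASS")
forest <- bigrfc(Cars93, Cars93$Type, ntrees=30L, varselect=4:22)
class(forest)  # "bigcforest"; individual trees are "bigctree" objects

# Grow 20 more trees in the existing forest.
forest <- grow(forest, Cars93, ntrees=20L)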
References:
Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
See also: randomForest (package randomForest) and cforest (package party).