The package implements Breiman's random forests for a variety of data
settings. Regression and classification forests are grown when the
response is numeric or categorical (factor), respectively, while survival and
competing risk forests (Ishwaran et al. 2008, 2012) are grown when the
response is right-censored. Different splitting rules invoked under
deterministic or random splitting are available for all families.
Variable predictiveness can be assessed using variable importance
(VIMP) measures for single as well as grouped variables. Variable
selection is implemented using minimal depth variable selection
(Ishwaran et al. 2010). Missing data (for x-variables and y-outcomes)
can be imputed on both training and test data.

This package implements OpenMP shared-memory parallel programming.
However, the default installation will only execute serially. To
utilize OpenMP, the target architecture and operating system must
first support it.

To install the package with OpenMP parallel processing enabled, on most
non-Windows systems, do the following:
- Download the package source code from CRAN.
- Open a console, navigate to the directory containing the
tarball, and untar it using the command tar -xvf
randomForestSRC_X.x.x.tar.gz, where X.x.x is the version of the
package you have downloaded.
- This will create a directory structure with the root directory
of the package named randomForestSRC. Change into the root
directory of the package using the command cd randomForestSRC
- Run autoconf using the command autoconf
- Change back to your working directory using the command cd ..
- Run R CMD INSTALL randomForestSRC on the modified
package. Ensure that you do not target the unmodified tarball, but
instead act on the directory structure you just modified.
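
The steps above can be collected into a small shell function. This is a minimal sketch, not something shipped with the package: the function name install_rfsrc_openmp is hypothetical, the tarball path is whatever file you downloaded from CRAN, and it assumes that tar, autoconf, and R are available on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of randomForestSRC) wrapping the steps above.
# Usage: install_rfsrc_openmp /path/to/randomForestSRC_X.x.x.tar.gz
install_rfsrc_openmp() {
    tarball="${1:-}"

    # Refuse to continue if the tarball is missing.
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi

    # Unpack; this creates the randomForestSRC directory in the current directory.
    tar -xvf "$tarball" || return 1

    # Run autoconf inside the package root. The subshell leaves the working
    # directory unchanged afterwards, so no separate "cd .." is needed.
    (cd randomForestSRC && autoconf) || return 1

    # Install from the modified directory, not from the original tarball.
    R CMD INSTALL randomForestSRC
}
```

Call the function from the directory that should hold the unpacked source tree. It stops at the first failing step, so a missing autoconf does not lead to installing an unmodified package.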

To install the package with OpenMP parallel processing enabled, on
most Windows systems, do the following:
- Download the Windows-specific custom binary build from http://www.ccs.miami.edu/~hishwaran/rfsrc.html
- If you are using the R GUI, start the GUI. From the menu,
click on Packages > Install package(s) from local zip files.
Then navigate to the directory where you downloaded the zip file and
click on randomForestSRC_X.x.x.zip. Alternatively, you can
manually open a console, navigate to the zip file, and install the
package by using the command R CMD INSTALL
randomForestSRC_X.x.x.zip

There are several ways to control the number of CPU cores that the
package accesses during OpenMP parallel execution. First, you will
need to determine the number of cores on your local machine. Do this
by starting an R session, loading the parallel package with
library(parallel), and issuing the command detectCores().
Then you can do the following:
At the start of every R session, you can set the number of cores
accessed during OpenMP parallel execution by issuing the command
options(rf.cores = x), where x is the number of
cores. If x is a negative number, the package will access
the maximum number of cores on your machine. The options command can
also be placed in the user's .Rprofile file for convenience. You can,
alternatively, initialize the environment variable RF_CORES
in your shell environment.
The default value for rf.cores is two (2L), if left unspecified.
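
For the environment-variable route, a minimal sketch for a POSIX-style shell follows. The value 4 is purely illustrative and should be replaced with a number suited to your machine. Putting these lines in a shell startup file such as ~/.profile is an assumption about your setup, not a requirement stated by the package.

```shell
# Illustrative value only: request 4 cores for OpenMP parallel execution.
RF_CORES=4
export RF_CORES

# Show the value that R sessions started from this shell will inherit.
echo "RF_CORES=${RF_CORES}"
```

The variable has to be exported before R is started, because child processes only inherit the environment as it was at launch.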
This package also implements R-side parallel processing by replacing
the R function lapply with mclapply found in the
parallel package. You can set the number of cores accessed by
mclapply by issuing the command options(mc.cores = x), where
x is the number of cores. The options command can also be
placed in the user's .Rprofile file for convenience. You can,
alternatively, initialize the environment variable MC_CORES
in your shell environment. See the help files in parallel for
more information.
The default value for mclapply on non-Windows systems is
two (2L) cores. On Windows systems, the default value is one (1L) core.
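
On non-Windows systems, these defaults can be overridden from the shell by setting MC_CORES before R starts. The sketch below derives a value from the machine's core count instead of hard-coding one. It assumes getconf _NPROCESSORS_ONLN is available (it is on common Linux and macOS systems), and the "leave one core free" policy is only an illustration, not a recommendation from the package.

```shell
# Count online processors; fall back to 1 if getconf cannot report it.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Guard against empty or non-numeric output.
case "$ncores" in
    ''|*[!0-9]*) ncores=1 ;;
esac

# Illustrative policy: leave one core free, but never go below 1.
if [ "$ncores" -gt 1 ]; then
    MC_CORES=$((ncores - 1))
else
    MC_CORES=1
fi
export MC_CORES

echo "MC_CORES=${MC_CORES}"
```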

This package contains many useful functions, and users should read the
help file in its entirety for details. However, we briefly mention
several key functions that may make it easier to navigate and
understand the layout of the package.
- rfsrc: This is the main entry point to the package. It grows a random forest
using user-supplied training data. We refer to the resulting object
as an RF-SRC grow object. Formally, the resulting object has class
(rfsrc, grow).
- predict.rfsrc (predict): Used for prediction. Predicted values are obtained by dropping the
user-supplied test data down the grow forest. The resulting object
has class (rfsrc, predict).
- max.subtree, var.select: Used for variable selection. The function
max.subtree extracts maximal subtree information from an RF-SRC object, which is
used for selecting variables by making use of minimal depth variable
selection. The function var.select provides
an extensive set of variable selection options and is a wrapper to max.subtree.
- impute.rfsrc: Fast imputation mode for RF-SRC. Both rfsrc and
predict.rfsrc are capable of imputing missing data.
However, for users whose only interest is imputing data, this function
provides an efficient and fast interface for doing so.