varimp

varimp-methods

varimp,bigcforest-method

A random forest of class <code>"bigcforest"</code>.

forest

A <code><a rd-options="" href="/link/big.matrix?package=bigrf&version=0.1-12" data-mini-rdoc="bigrf::big.matrix">big.matrix</a></code>, <code>matrix</code> or <code>data.frame</code> of predictor variables. The data must be unchanged since the forest was built; otherwise, unexpected modelling results may occur. If a <code>matrix</code> or <code>data.frame</code> is specified, it will be converted into a <code>big.matrix</code> for computation. Optional if <code>reuse.cache</code> is <code>TRUE</code>.

x

A logical indicating whether to compute the variable importance for each out-of-bag example.

impbyexample

<code>TRUE</code> to reuse disk caches of the <code>big.matrix</code> <code>x</code> from the initial building of the random forest, which may significantly reduce initialization time for large data sets. If <code>TRUE</code>, the user must ensure that the files ‘x’ and ‘x.desc’ in <code>forest@cachepath</code> have not been modified or deleted.

reuse.cache

<code>0</code> for no verbose output. <code>1</code> to print verbose output. Default: <code>0</code>.

trace


Compute variable importance based on out-of-bag estimates. For each tree in the forest, the predictions for its out-of-bag examples are recorded. Then the values of a variable v are randomly permuted among the out-of-bag examples, and the tree is used to classify those examples again. The difference between the votes for the correct class on the original data and on the permuted data is used to calculate the importance of variable v. This process is repeated for every variable.
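The permutation step described above can be sketched in base R for a single tree. This is an illustration of the idea only, not the package's implementation: `predict_tree`, `perm_importance` and the toy out-of-bag data are hypothetical names and values made up for the example.

```r
# Hypothetical stand-in for one tree of a forest: it splits on column "a"
# only, so column "b" carries no information for it.
predict_tree <- function(x) ifelse(x$a > 0, "pos", "neg")

# Made-up out-of-bag examples for this tree, with their true classes.
set.seed(1)
oob <- data.frame(a = c(-2, -1, -0.5, 0.5, 1, 2, 3, -3),
                  b = rnorm(8))
y <- ifelse(oob$a > 0, "pos", "neg")

# Importance of variable v for one tree: the drop in correct votes after
# the values of v are permuted among the out-of-bag examples.
perm_importance <- function(v, x, y, predict_fn) {
  correct_orig <- sum(predict_fn(x) == y)
  x_perm <- x
  x_perm[[v]] <- sample(x_perm[[v]])   # random permutation of column v
  correct_perm <- sum(predict_fn(x_perm) == y)
  correct_orig - correct_perm
}

# Repeat the process for every variable.
imp <- vapply(names(oob),
              function(v) perm_importance(v, oob, y, predict_tree),
              integer(1))
print(imp)
```

Because the stand-in tree never looks at `b`, permuting `b` cannot change any prediction, so its importance is zero. In a real forest the per-tree differences would be combined over all trees.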


methods

This is an implementation of Leo Breiman's and Adele Cutler's
Random Forest algorithms for classification and regression, with optimizations
for performance and for handling of data sets that are too large to be
processed in memory. Forests can be built in parallel at two levels. First,
trees can be grown in parallel on a single machine using foreach. Second,
multiple forests can be built in parallel on multiple machines, then merged
into one. For large data sets, disk-based big.matrix objects may be used for storing
data and intermediate computations, to prevent excessive virtual memory
swapping by the operating system. Currently, only classification forests with
a subset of the functionality in Breiman and Cutler's original code are
implemented. More functionality and regression trees may be added in the
future.

Aloysius Lim

bigrf

Big Random Forests: Classification and Regression Forests for Large Data Sets

varimp-methods function

<dl>
 <dt><code>signature(forest = "bigcforest")</code></dt><dd>Compute variable importance for a classification random forest.</dd> </dl>

Methods

varimp-methods: Compute Variable Importance

Examples
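A minimal usage sketch, assuming the bigrf package is installed. The argument names passed to `varimp` are the ones documented on this page. The `bigrfc` call (its `ntrees` and `cachepath` arguments, and `cachepath = NULL` for in-memory caches) is an assumption about the package's forest-building function and should be checked against the installed version. The iris data set is used purely for illustration.

```r
# Sketch only: the body runs only if the bigrf package is available.
importance <- NULL
if (requireNamespace("bigrf", quietly = TRUE)) {
  library(bigrf)

  # Predictors and response from the built-in iris data set.
  x <- iris[, 1:4]
  y <- iris$Species

  # Assumed call: grow a small classification forest with in-memory caches.
  forest <- bigrfc(x, y, ntrees = 10L, cachepath = NULL)

  # Variable importance, including importance for each out-of-bag example.
  # The same, unchanged x that was used to build the forest is passed in.
  importance <- varimp(forest, x, impbyexample = TRUE, trace = 0L)
  str(importance)
}
```

If the forest was built with a disk cache, passing `reuse.cache = TRUE` and omitting `x` should avoid converting the data again, provided the cache files in `forest@cachepath` are untouched.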