Control for Conditional Tree Forests
Various parameters that control aspects of the `cforest' fit via its `control' argument.
cforest_unbiased(…)
cforest_classical(…)
cforest_control(teststat = "max", testtype = "Teststatistic", mincriterion = qnorm(0.9), savesplitstats = FALSE, ntree = 500, mtry = 5, replace = TRUE, fraction = 0.632, trace = FALSE, …)
teststat: a character specifying the type of the test statistic to be applied.
testtype: a character specifying how to compute the distribution of the test statistic.
mincriterion: the value of the test statistic (for testtype == "Teststatistic"), or 1 - p-value (for other values of testtype) that must be exceeded in order to implement a split.
mtry: number of input variables randomly sampled as candidates at each node for random forest like algorithms. Bagging, as a special case of a random forest without random input variable sampling, can be performed by setting mtry either equal to NULL or manually equal to the number of input variables.
savesplitstats: a logical determining whether the process of standardized two-sample statistics for split point estimation is saved for each primary split.
ntree: number of trees to grow in a forest.
replace: a logical indicating whether sampling of observations is done with or without replacement.
fraction: fraction of the number of observations to draw without replacement (only relevant if replace = FALSE).
trace: a logical indicating whether a progress bar shall be printed while the forest grows.
additional arguments to be passed to
The arguments teststat, testtype and mincriterion determine how the global null hypothesis of independence between all input
variables and the response is tested (see
The argument nresample is the number of Monte-Carlo replications to be used when testtype = "MonteCarlo".
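As a rough illustration of the two scales on which mincriterion is interpreted, the following base R sketch computes the default threshold for testtype = "Teststatistic" and applies a 1 - p-value threshold for the other test types. The p-value and the 0.95 threshold are made-up numbers chosen for illustration, not values taken from the package.

```r
# Default threshold when testtype = "Teststatistic": a standard normal quantile.
crit_stat <- qnorm(0.9)
print(round(crit_stat, 3))

# For the p-value based test types, the criterion is on the 1 - p-value scale.
# Both numbers below are illustrative assumptions.
mincriterion <- 0.95
p_value <- 0.03
split_allowed <- (1 - p_value) > mincriterion
print(split_allowed)
```

On the 1 - p-value scale, a larger mincriterion corresponds to a stricter significance requirement before a split is implemented.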
A split is established when the sum of the weights in both daughter nodes is larger than minsplit; this avoids pathological splits at the borders. When stump = TRUE, a tree with at most two terminal nodes is computed.
The mtry argument regulates a random selection of mtry input variables in each node. Note that here mtry is fixed to the value 5 by default for merely technical reasons, while in randomForest the default values for classification and regression vary with the number of input variables. Make sure that mtry is defined properly before using cforest.
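One way to define mtry explicitly is to derive it from the number of input variables. The sketch below uses a hypothetical helper, default_mtry (not part of any package), that follows the defaults of the randomForest package: floor(sqrt(p)) for classification and max(floor(p / 3), 1) for regression. Whether these values suit a given problem is a judgment call; they are shown only as a starting point.

```r
# Hypothetical helper (not part of party): choose mtry from the number of
# predictors p, following the defaults of the randomForest package.
default_mtry <- function(p, classification = TRUE) {
  if (classification) floor(sqrt(p)) else max(floor(p / 3), 1)
}

# Classification example: Species ~ . on iris has 4 predictors.
p_iris <- ncol(iris) - 1
mtry_class <- default_mtry(p_iris, classification = TRUE)

# Regression example: mpg ~ . on mtcars has 10 predictors.
p_cars <- ncol(mtcars) - 1
mtry_reg <- default_mtry(p_cars, classification = FALSE)

print(c(classification = mtry_class, regression = mtry_reg))
```

The resulting value can then be passed as the mtry argument of cforest_control or of one of its wrappers.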
It might be informative to look at scatterplots of input variables against the standardized two-sample split statistics; these are available when savesplitstats = TRUE. Each node is then associated with a vector whose length is determined by the number of observations in the learning sample, and thus much more memory is required.
The number of trees ntree can be increased for large numbers of input variables.
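A minimal sketch of raising ntree, assuming the party package is installed. The values ntree = 1000 and mtry = 2 are illustrative choices for the four predictors in iris, not recommendations; the code is skipped when party is unavailable.

```r
have_party <- requireNamespace("party", quietly = TRUE)

if (have_party) {
  library(party)
  set.seed(290875)

  # Unbiased settings with more trees than the default of 500.
  # mtry = 2 is an illustrative choice for the 4 predictors in iris.
  ctrl <- cforest_unbiased(ntree = 1000, mtry = 2)

  # The control object is handed to cforest through its controls argument.
  fit <- cforest(Species ~ ., data = iris, controls = ctrl)
  print(class(ctrl))
}
```

Fitting time and memory use grow with ntree, so it may be worth increasing it stepwise and checking whether results stabilise.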
cforest_unbiased returns the settings suggested for the construction of unbiased random forests (teststat = "quad", testtype = "Univ", replace = FALSE) by Strobl et al. (2007) and is the default since version 0.9-90.
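To make the difference between replace = TRUE and replace = FALSE concrete, the base R sketch below draws both kinds of samples for n = 150 observations. It only mimics the idea of the two sampling schemes; in particular, rounding the subsample size down is an assumption of this sketch and not a statement about the package's internals.

```r
set.seed(42)
n <- 150

# replace = TRUE: bootstrap sample of size n, so some observations repeat.
boot_idx <- sample.int(n, size = n, replace = TRUE)

# replace = FALSE: draw a fraction of the observations without replacement.
# Rounding down is an assumption made for this sketch.
fraction <- 0.632
sub_idx <- sample.int(n, size = floor(fraction * n), replace = FALSE)

# Share of distinct observations in the bootstrap sample; on average this is
# close to 1 - exp(-1), which is roughly 0.632.
print(length(unique(boot_idx)) / n)
print(length(sub_idx))
```

The default fraction = 0.632 therefore gives subsamples with about as many distinct observations as a bootstrap sample contains on average.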
Hyperparameter settings mimicking the behaviour of
randomForest are available in
cforest_classical which have been used as default up to
An object of class
Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. DOI: 10.1186/1471-2105-8-25