Control for Conditional Inference Trees
Various parameters that control aspects of the `ctree' fit.
ctree_control(teststat = c("quad", "max"), testtype = c("Bonferroni", "Univariate", "Teststatistic"), mincriterion = 0.95, minsplit = 20L, minbucket = 7L, minprob = 0.01, stump = FALSE, maxsurrogate = 0L, mtry = Inf, maxdepth = Inf, multiway = FALSE, splittry = 2L, majority = FALSE, applyfun = NULL, cores = NULL)
- teststat: a character specifying the type of the test statistic to be applied.
- testtype: a character specifying how to compute the distribution of the test statistic.
- mincriterion: the value of the test statistic or 1 - p-value that must be exceeded in order to implement a split.
- minsplit: the minimum sum of weights in a node in order to be considered for splitting.
- minbucket: the minimum sum of weights in a terminal node.
- minprob: proportion of observations needed to establish a terminal node.
- stump: a logical determining whether a stump (a tree with three nodes only) is to be computed.
- maxsurrogate: number of surrogate splits to evaluate. Note that currently only surrogate splits in ordered covariables are implemented.
- mtry: number of input variables randomly sampled as candidates at each node for random forest like algorithms. The default mtry = Inf means that no random selection takes place.
- maxdepth: maximum depth of the tree. The default maxdepth = Inf means that no restrictions are applied to tree sizes.
- multiway: a logical indicating if multiway splits for all factor levels are implemented for unordered factors.
- splittry: number of variables that are inspected for admissible splits if the best split doesn't meet the sample size constraints.
- majority: if FALSE, observations which can't be classified to a daughter node because of missing information are randomly assigned (following the node distribution). If TRUE, they go with the majority (the default in
- applyfun: an optional lapply-style function with arguments function(X, FUN, ...). It is used for computing the variable selection criterion. The default is to use the basic lapply function unless the cores argument is specified (see below).
- cores: numeric. If set to an integer the applyfun is set to mclapply with the desired number of cores.
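As a minimal sketch of how these arguments are typically supplied: the example below assumes that the partykit package is installed and that ctree() accepts the resulting object through its control argument. The iris data and the particular values are illustrative choices, not recommendations.

```r
library(partykit)

# Illustrative settings only: a stricter threshold, larger nodes,
# and a shallow tree.
ctl <- ctree_control(
  mincriterion = 0.99,  # 1 - p-value must exceed 0.99 to implement a split
  minsplit = 40L,       # minimum sum of weights for a node to be considered
  minbucket = 15L,      # minimum sum of weights in a terminal node
  maxdepth = 2          # restrict the depth of the tree
)

tree <- ctree(Species ~ ., data = iris, control = ctl)
print(tree)
```

Keeping the control object in its own variable makes it easy to reuse the same settings across several fits.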
The arguments teststat, testtype and mincriterion determine how the global null hypothesis of independence between all input
variables and the response is tested (see
The variable with the most extreme p-value or test statistic is selected
for splitting. If this isn't possible due to sample size constraints
explained in the next paragraph, up to
splittry other variables
are inspected for possible splits.
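The selection step described above can be sketched in base R. This is not partykit's internal code: the p-values are made-up numbers, assumed to be already adjusted, and the sketch does not model whether fallback variables must themselves pass the mincriterion threshold.

```r
# Hypothetical adjusted p-values for four input variables (made-up numbers).
pvals <- c(age = 0.001, income = 0.020, region = 0.300, tenure = 0.004)
mincriterion <- 0.95
splittry <- 2L

# Work on the 1 - p-value scale, as mincriterion does.
criterion <- 1 - pvals

# The node is split only if the best criterion exceeds mincriterion.
do_split <- max(criterion) > mincriterion

# Candidate order: the most significant variable first, followed by up to
# `splittry` other variables in case no admissible split is found.
candidates <- names(sort(criterion, decreasing = TRUE))
candidates <- head(candidates, 1L + splittry)
print(candidates)
```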
A split is established when all of the following criteria are met:
1) the sum of the weights in the current node is larger than minsplit,
2) a fraction of the sum of weights of more than minprob will be contained in all daughter nodes,
3) the sum of the weights in all daughter nodes exceeds minbucket, and
4) the depth of the tree is smaller than maxdepth.
This avoids pathological splits deep down the tree.
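The four criteria can be written as a single check. The helper below is hypothetical and not part of partykit. It reads "all daughter nodes" as "each daughter node", and it takes the weight sums of the daughter nodes and the current depth as inputs.

```r
# Hypothetical helper, not part of partykit: checks the four criteria for
# a candidate split, given the sum of weights falling into each daughter.
split_admissible <- function(daughter_weights, depth,
                             minsplit = 20, minbucket = 7,
                             minprob = 0.01, maxdepth = Inf) {
  total <- sum(daughter_weights)
  total > minsplit &&                           # 1) current node is large enough
    all(daughter_weights / total > minprob) &&  # 2) each daughter holds enough of the weight
    all(daughter_weights > minbucket) &&        # 3) each daughter exceeds minbucket
    depth < maxdepth                            # 4) depth limit not yet reached
}

ok        <- split_admissible(c(60, 40), depth = 1)
too_small <- split_admissible(c(95, 5), depth = 1)
too_deep  <- split_admissible(c(60, 40), depth = 3, maxdepth = 3)
```

Here ok is TRUE, too_small is FALSE because one daughter has a weight sum of 5, which does not exceed minbucket = 7, and too_deep is FALSE because the depth limit has been reached.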
If stump = TRUE, a tree with at most two terminal nodes is computed.
A finite value mtry > 0 means that a random forest like `variable
selection', i.e., a random selection of mtry input variables, is
performed in each node.
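The effect of mtry can be illustrated with a small base R sketch. The helper select_inputs is hypothetical and not part of partykit. It assumes that a value of mtry at or above the number of inputs behaves like the default mtry = Inf, i.e., no random selection.

```r
# Hypothetical helper, not part of partykit: picks the candidate inputs
# for one node given mtry.
select_inputs <- function(inputs, mtry = Inf) {
  if (mtry >= length(inputs)) {
    return(inputs)             # no random selection takes place
  }
  sample(inputs, size = mtry)  # random subset, drawn without replacement
}

inputs <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
set.seed(1)
all_inputs <- select_inputs(inputs)            # default: every input is a candidate
some_inputs <- select_inputs(inputs, mtry = 2) # two randomly chosen candidates
```

In a forest-like algorithm, the draw would be repeated independently in every node.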
In each inner node,
maxsurrogate surrogate splits are computed
(regardless of any missing values in the learning sample). Factors
in test samples whose levels were empty in the learning sample
are treated as missing when computing predictions (in contrast to
ctree). Note also the different behaviour of
majority in the two implementations.