Various parameters that control aspects of the `ctree' fit.
ctree_control(teststat = c("quad", "max"),
  testtype = c("Bonferroni", "Univariate", "Teststatistic"),
  mincriterion = 0.95, minsplit = 20L, minbucket = 7L,
  minprob = 0.01, stump = FALSE, maxsurrogate = 0L, mtry = Inf,
  maxdepth = Inf, multiway = FALSE, splittry = 2L, majority = FALSE,
  applyfun = NULL, cores = NULL)

mtry = Inf means that no random selection of input variables takes
place; maxdepth = Inf means that no restriction is placed on tree
size.

majority: if FALSE, observations which cannot be classified to a
daughter node because of missing information are randomly assigned
(following the node distribution); if TRUE, they go with the majority
(the default in ctree).

applyfun: an optional lapply-style function with arguments
function(X, FUN, ...), used for computing the variable selection
criterion. The default is the basic lapply function unless the cores
argument is specified (see below); if cores is set to an integer,
applyfun is set to mclapply with the desired number of cores.

The arguments teststat, testtype and mincriterion
determine how the global null hypothesis of independence between all input
variables and the response is tested (see ctree).
The variable with the most extreme p-value or test statistic is
selected for splitting. If this is not possible due to the sample
size constraints explained in the next paragraph, up to splittry
other variables are inspected for possible splits.
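As a minimal sketch (assuming the partykit package is attached), the
testing behaviour can be adjusted, for example by switching to the
maximum-type test statistic with unadjusted univariate p-values and a
stricter significance requirement:

```r
library(partykit)

## Use the maximum-type test statistic, skip the Bonferroni
## adjustment, and require 1 - mincriterion = 0.01 as the
## significance level for establishing a split.
ctrl <- ctree_control(teststat = "max", testtype = "Univariate",
                      mincriterion = 0.99)

## Fit a conditional inference tree on the iris data with
## these settings.
ct <- ctree(Species ~ ., data = iris, control = ctrl)
ct
```

A stricter mincriterion typically yields smaller trees, since fewer
variables pass the global independence test in each node.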
A split is established when all of the following criteria are met:
1) the sum of the weights in the current node is larger than minsplit,
2) a fraction of the sum of the weights of more than minprob will be
   contained in all daughter nodes,
3) the sum of the weights in all daughter nodes exceeds minbucket, and
4) the depth of the tree is smaller than maxdepth.
This avoids pathological splits deep down the tree.
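A minimal sketch (assuming partykit) of tightening these sample size
constraints so that only sufficiently large nodes are split:

```r
library(partykit)

## Only split nodes holding at least 40 observations, and require
## at least 15 observations (and at least 10% of the node's weight)
## in every daughter node.
ctrl <- ctree_control(minsplit = 40L, minbucket = 15L, minprob = 0.1)
ct <- ctree(Species ~ ., data = iris, control = ctrl)

## Every terminal node respects the minbucket constraint.
table(predict(ct, type = "node"))
```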
When stump = TRUE, a tree with at most two terminal nodes is computed.
Setting mtry to a finite value means that a random-forest-like
`variable selection', i.e., a random selection of mtry input
variables, is performed in each node.
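The effect of stump and of a random-forest-like mtry can be sketched
as follows (assuming partykit; note that a finite mtry makes the fit
random, so a seed is set for reproducibility):

```r
library(partykit)

## A stump: at most one split, hence at most two terminal nodes.
st <- ctree(Species ~ ., data = iris,
            control = ctree_control(stump = TRUE))
width(st)  # number of terminal nodes, at most 2

## Random-forest-like variable selection: in every node only two
## randomly chosen input variables are considered for splitting.
set.seed(290875)
rt <- ctree(Species ~ ., data = iris,
            control = ctree_control(mtry = 2))
```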
In each inner node, maxsurrogate surrogate splits are computed
(regardless of any missing values in the learning sample). Factors
in test samples whose levels were empty in the learning sample
are treated as missing when computing predictions (in contrast
to ctree). Note also the different behaviour of
majority in the two implementations.
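Surrogate splits and the missing-value behaviour controlled by
majority can be combined; a sketch assuming partykit, using the
airquality data (whose predictors contain missing values):

```r
library(partykit)

## Compute up to two surrogate splits per inner node, and send
## observations with a missing split variable (and no usable
## surrogate) along with the majority rather than assigning them
## at random.
airq <- subset(airquality, !is.na(Ozone))
ctrl <- ctree_control(maxsurrogate = 2L, majority = TRUE)
ct <- ctree(Ozone ~ ., data = airq, control = ctrl)

## Predictions are produced for all observations, including those
## with missing predictor values.
predict(ct, newdata = airq)
```

With majority = TRUE the routing of incomplete observations is
deterministic, which makes repeated predictions reproducible.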