party (version 0.2-9)

Control Hyper Parameters: Control for Conditional Tree Models

Description

Various parameters that control aspects of the `ctree' fit.

Usage

ctree_control(teststattype = c("quadform", "maxabs"), 
              testtype = c("Bonferroni", "MonteCarlo", "Raw"), 
              mincriterion = 0.95, minsplit = 20, stump = FALSE, 
              nresample = 9999, maxsurrogate = 0, mtry = 0, 
              savesplitstats = TRUE)

Arguments

teststattype
a character specifying the type of the test statistic to be applied.
testtype
a character specifying how to compute the distribution of the test statistic.
mincriterion
the value of the test statistic or 1 - p-value that must be exceeded in order to implement a split.
minsplit
the minimum sum of the weights in a node.
stump
a logical determining whether a stump (a tree with three nodes only) is to be computed.
nresample
number of Monte-Carlo replications to use when the distribution of the test statistic is simulated.
maxsurrogate
number of surrogate splits to evaluate. Note that currently only surrogate splits in ordered covariables are implemented.
mtry
number of input variables randomly sampled as candidates at each node for random forest-like algorithms. The default mtry = 0 means that no random selection takes place.
savesplitstats
a logical determining whether the process of standardized two-sample statistics used for split point estimation is saved for each primary split.
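
For illustration, a control object combining several of these arguments might be constructed as follows (a sketch only; argument names follow the Usage section above, and the values are arbitrary):

library("party")

## quadratic-form test statistic, Monte-Carlo p-values, at most one
## surrogate split and a minimum node weight of 50
ctrl <- ctree_control(teststattype = "quadform", testtype = "MonteCarlo",
                      mincriterion = 0.95, minsplit = 50, maxsurrogate = 1)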

Value

Details

The arguments teststattype, testtype and mincriterion determine how the global null hypothesis of independence between all input variables and the response is tested (see ctree). The argument nresample is the number of Monte-Carlo replications to be used when testtype = "MonteCarlo".
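
For example, a Monte-Carlo based test might be requested as follows (a sketch only: it assumes, as in later versions of party, that ctree accepts the control object via its controls argument, and it uses the built-in airquality data, neither of which is documented here):

library("party")
## learning sample without missing responses
airq <- subset(airquality, !is.na(Ozone))

## p-values approximated by 9999 Monte-Carlo replications of the test statistic
ct <- ctree(Ozone ~ ., data = airq,
            controls = ctree_control(testtype = "MonteCarlo",
                                     nresample = 9999))
ct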

A split is established when the sum of the weights in both daughter nodes is larger than minsplit; this avoids pathological splits at the borders. When stump = TRUE, a tree with at most two terminal nodes is computed.
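
For example (continuing the sketch above, with party attached and airq as defined there):

## at most two terminal nodes
ct_stump <- ctree(Ozone ~ ., data = airq,
                  controls = ctree_control(stump = TRUE))

## splits only where both daughter nodes receive a weight sum above 50
ct_min <- ctree(Ozone ~ ., data = airq,
                controls = ctree_control(minsplit = 50))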

Setting mtry > 0 means that a random forest-like `variable selection', i.e., a random selection of mtry input variables, is performed in each node.
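
A sketch of such a random variable selection within a single tree (same assumptions as above: party attached, airq defined, control object passed via controls):

## two randomly selected candidate inputs per node
set.seed(290875)
ct_mtry <- ctree(Ozone ~ ., data = airq,
                 controls = ctree_control(mtry = 2))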

It might be informative to look at scatterplots of input variables against the standardized two-sample split statistics; those are available when savesplitstats = TRUE. Each node is then associated with a vector whose length is determined by the number of observations in the learning sample, and thus much more memory is required.
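
The memory effect can be made visible by comparing fitted trees with and without the saved statistics (a sketch using base R's object.size and the objects from above):

ct_save <- ctree(Ozone ~ ., data = airq,
                 controls = ctree_control(savesplitstats = TRUE))
ct_nosave <- ctree(Ozone ~ ., data = airq,
                   controls = ctree_control(savesplitstats = FALSE))
object.size(ct_save)   ## larger: one statistic per observation and node
object.size(ct_nosave)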