Function partykit::ctree is a reimplementation of (most of)
party::ctree employing the new party infrastructure
of the partykit package. Although the new code has already been
extensively tested, it is not yet as mature as the old code. If you notice
differences in the structure/predictions of the resulting trees, please
contact the package maintainers. See also below for some remarks about
the internals of the different implementations and how they should be
merged in future releases.
Conditional inference trees estimate a regression relationship by binary recursive
partitioning in a conditional inference framework. Roughly, the algorithm
works as follows: 1) Test the global null hypothesis of independence between
any of the input variables and the response (which may be multivariate as well).
Stop if this hypothesis cannot be rejected. Otherwise select the input
variable with the strongest association to the response. This
association is measured by a p-value corresponding to a test of the
partial null hypothesis of independence between a single input variable and the response.
2) Implement a binary split in the selected input variable.
3) Recursively repeat steps 1) and 2). The implementation utilizes a unified framework for conditional inference,
or permutation tests, developed by Strasser and Weber (1999). The stop
criterion in step 1) is either based on multiplicity adjusted p-values
(testtype = "Bonferroni" in ctree_control)
or on the univariate p-values (testtype = "Univariate"). In both cases, the
criterion is maximized, i.e., 1 - p-value is used. A split is implemented
when the criterion exceeds the value given by mincriterion as
specified in ctree_control. For example, when
mincriterion = 0.95, the p-value must be smaller than
0.05 in order to split this node. This statistical approach ensures that
a tree of the right size is grown without any pruning or cross-validation.
The selection of the input variable to split on
is based on the univariate p-values, which avoids a variable selection bias
towards input variables with many possible cutpoints.
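The stop and selection rules described above can be illustrated with a small base R sketch. The p-values and variable names (x1, x2, x3) are made up for illustration, and p.adjust is used as a textbook Bonferroni adjustment; the adjustment computed internally by ctree may differ in detail.

```r
# Hypothetical univariate p-values for three input variables (made-up numbers).
p_univariate <- c(x1 = 0.004, x2 = 0.030, x3 = 0.600)
mincriterion <- 0.95

# Textbook Bonferroni adjustment (assumption: the package's internal
# multiplicity adjustment may be computed differently).
p_adjusted <- p.adjust(p_univariate, method = "bonferroni")

# The criterion that is maximized is 1 - p-value.
criterion <- 1 - p_adjusted

# A split is implemented only if the largest criterion exceeds mincriterion.
do_split <- max(criterion) > mincriterion

# The variable to split on is selected from the univariate p-values.
split_var <- names(which.min(p_univariate))
```

With these numbers the smallest adjusted p-value is 0.012, so the criterion is 0.988 and the node would be split on x1. With testtype = "Univariate", the unadjusted p-values would be used in the stop criterion instead.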
Predictions can be computed using predict, which returns predicted means,
predicted classes or median predicted survival times, as well as
more information about the conditional
distribution of the response, e.g., class probabilities
or predicted Kaplan-Meier curves. For observations
with zero weights, predictions are computed from the fitted tree
when newdata = NULL.
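As a minimal sketch of fitting and predicting, the following example grows a classification tree on the built-in iris data. The choice of data set and the object names (ctrl, fit, new_obs) are assumptions made for illustration only; the example requires the partykit package to be installed.

```r
library(partykit)

# Stop criterion based on Bonferroni-adjusted p-values; split only if
# 1 - p-value exceeds 0.95.
ctrl <- ctree_control(testtype = "Bonferroni", mincriterion = 0.95)

# Fit a conditional inference tree for a categorical response.
fit <- ctree(Species ~ ., data = iris, control = ctrl)

# Predictions for the learning sample (newdata = NULL).
pred_class <- predict(fit, type = "response")  # predicted classes
pred_prob  <- predict(fit, type = "prob")      # class probabilities
pred_node  <- predict(fit, type = "node")      # terminal node identifiers

# Predictions for new observations.
new_obs <- iris[c(1, 51, 101), ]
pred_new <- predict(fit, newdata = new_obs, type = "response")
```

For a numeric response, type = "response" returns predicted means instead of classes.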
For a general description of the methodology see Hothorn, Hornik and
Zeileis (2006) and Hothorn, Hornik, van de Wiel and Zeileis (2006).
Implementation details and roadmap: As pointed out above, partykit::ctree
is a reimplementation of party::ctree. Not only
the R code has changed but also the underlying C code, which at the moment
does not support the xtrafo and ytrafo arguments for
efficiency reasons. The roadmap for future releases is the following:
(1) Make party depend on partykit. (2) Merge the R code into
a single ctree function while keeping the two underlying C functions
separate. (3) The new interface will always return an object of class
party but call the new and more efficient C code only if
xtrafo was not used (while ytrafo should be integrated into
the new C code).