PrInDTAllparts: Conditional inference trees (ctrees) based on consecutive parts of the full sample
Description
Fits ctrees to the full sample of the smaller class combined with consecutive parts of the larger class of the nesting variable 'nesvar'.
The variable 'nesvar' has to be part of the data frame 'datain'.
Interpretability is checked (see 'ctestv'); a probability threshold can be specified (see 'thres').
The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
Reference
Weihs, C., Buschfeld, S. 2021b. NesPrInDT: Nested undersampling in PrInDT. arXiv:2103.14931
Value
Balanced accuracy of the trees on the 'divt' consecutive parts of the sample.
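For orientation, the balanced accuracy of a two-class prediction is the mean of the two class-wise accuracies. The following is a minimal base-R sketch, not the package's own code: the helper name 'balanced_accuracy' is hypothetical, and the rule "predict the smaller class when its probability exceeds 'thres'" is an assumption about how the threshold is applied.

```r
# Hypothetical helper (not part of PrInDT): balanced accuracy from
# predicted probabilities of the smaller class.
balanced_accuracy <- function(truth, prob_small, small, thres = 0.5) {
  # Assumption: the smaller class is predicted when its probability exceeds 'thres'.
  pred_small <- prob_small > thres
  is_small   <- truth == small
  acc_small  <- sum(pred_small & is_small) / sum(is_small)     # accuracy on the smaller class
  acc_large  <- sum(!pred_small & !is_small) / sum(!is_small)  # accuracy on the larger class
  (acc_small + acc_large) / 2
}

# Toy data: class "a" is the smaller class.
truth  <- factor(c("a", "a", "b", "b", "b", "b"))
prob_a <- c(0.9, 0.4, 0.6, 0.2, 0.1, 0.3)
ba <- balanced_accuracy(truth, prob_a, small = "a", thres = 0.5)
ba  # (1/2 + 3/4) / 2 = 0.625
```

Lowering 'thres' makes predictions of the smaller class more frequent, which typically raises the accuracy on the smaller class at the cost of the larger one.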
Arguments
datain
Input data frame with the class factor variable 'classname' and the
influential variables, which need to be factors or numeric variables (transform logical and character variables to factors)
classname
Name of class variable (character)
ctestv
Vector of character strings of forbidden split results
(see function PrInDT for details).
If no restrictions exist, the default = NA is used.
conf.level
(1 - significance level) in function ctree (numerical, > 0 and <= 1); default = 0.95
thres
Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5
nesvar
Name of nesting variable (character)
divt
Number of parts of the nesting variable 'nesvar' for which models should be determined individually
minsplit
Minimum number of elements in a node for it to be split;
default = 20
minbucket
Minimum number of elements in a node;
default = 7
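How the arguments above interact can be sketched in a few lines of R. This is an illustration under stated assumptions, not the package's implementation: the toy data and the variable names 'speaker' and 'x' are made up, "consecutive parts" is read here as consecutive blocks of nesting-variable levels within the larger class, and the trees are fitted with partykit::ctree, which may differ from what PrInDT uses internally.

```r
# Toy data (hypothetical): "rare" is the smaller class, 'speaker' the nesting variable.
set.seed(1)
datain <- data.frame(
  class   = factor(c(rep("rare", 20), rep("common", 120))),
  speaker = factor(c(rep(paste0("s", 1:4), each = 5),
                     rep(paste0("s", 5:16), each = 10))),
  x       = c(rnorm(20, mean = 1), rnorm(120, mean = 0))
)
classname <- "class"
nesvar    <- "speaker"
divt      <- 4

# Separate the smaller class (always used in full) from the larger class.
counts  <- table(datain[[classname]])
small   <- names(counts)[which.min(counts)]
d_small <- datain[datain[[classname]] == small, ]
d_large <- datain[datain[[classname]] != small, ]

# Assumption: a "part" is a consecutive block of nesting-variable levels
# of the larger class; 'divt' such blocks are formed.
lev   <- unique(as.character(d_large[[nesvar]]))
block <- cut(seq_along(lev), breaks = divt, labels = FALSE)
parts <- lapply(seq_len(divt), function(k) {
  rbind(d_small, d_large[d_large[[nesvar]] %in% lev[block == k], ])
})
sapply(parts, nrow)  # 20 smaller-class rows plus 30 larger-class rows per part

# One tree per part; 'conf.level' is passed as 'mincriterion' of ctree_control.
if (requireNamespace("partykit", quietly = TRUE)) {
  ctrl  <- partykit::ctree_control(mincriterion = 0.95, minsplit = 20, minbucket = 7)
  trees <- lapply(parts, function(d) {
    partykit::ctree(class ~ x, data = droplevels(d), control = ctrl)
  })
}
```

A higher 'conf.level' or larger 'minsplit' and 'minbucket' values make splits harder to accept and therefore yield smaller trees.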
Details
Standard output can be produced by means of print(name) or just name, where 'name' is the output data
frame of the function.