PrInDTAllparts: Conditional inference trees (ctrees) based on consecutive parts of the full sample

Description

Builds ctrees on the full sample of the smaller class combined with consecutive parts of the larger class of the nesting variable 'nesvar'. The variable 'nesvar' has to be part of the data frame 'datain'.
Interpretability is checked (see 'ctestv'), and a probability threshold can be specified (see 'thres'). The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
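
The sampling scheme described here can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data frame `toy`, the helper `split_consecutive()`, and the assumption that the larger class is cut into `divt` contiguous blocks of near-equal size in row order are all hypothetical and only meant to convey the idea. For simplicity, the sketch also ignores how 'nesvar' enters the partitioning.

```r
# Toy data (hypothetical): class "a" is the smaller class, "b" the larger one.
toy <- data.frame(
  class = factor(c(rep("a", 4), rep("b", 12))),
  x     = seq_len(16)
)

# Hypothetical helper: cut a vector of row indices into `divt` consecutive blocks.
split_consecutive <- function(idx, divt) {
  block <- cut(seq_along(idx), breaks = divt, labels = FALSE)
  split(idx, block)
}

# Identify the smaller class and the row indices of both classes.
counts    <- table(toy$class)
small     <- names(counts)[which.min(counts)]
idx_small <- which(toy$class == small)
idx_large <- which(toy$class != small)

divt  <- 3
parts <- split_consecutive(idx_large, divt)

# Each sub-sample = all smaller-class rows + one consecutive part of the larger class.
subsamples <- lapply(parts, function(p) toy[c(idx_small, p), ])

# Number of rows per sub-sample (4 smaller-class rows + 4 larger-class rows each).
sapply(subsamples, nrow)
```

In this picture, one tree would be fitted per element of `subsamples`, so every tree sees the complete smaller class but only one part of the larger class.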

Reference
Weihs, C., Buschfeld, S. 2021b. NesPrInDT: Nested undersampling in PrInDT. arXiv:2103.14931

Usage

PrInDTAllparts(datain, classname, ctestv=NA, conf.level=0.95, thres=0.5,
       nesvar, divt, minsplit=NA, minbucket=NA)

Value

baAll: balanced accuracy of tree on full sample
nesvar: name of nesting variable
divt: number of consecutive parts of the sample
badiv: balanced accuracy of trees on 'divt' consecutive parts of the sample
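
As a reminder of what the balanced accuracies reported here measure, the following base R sketch computes balanced accuracy as the mean of the per-class recalls. The helper `balanced_accuracy()` and the toy vectors are hypothetical and not part of the package; the sketch uses the common textbook definition and is not taken from the package's code.

```r
# Hypothetical helper: balanced accuracy as the mean of the per-class recalls.
balanced_accuracy <- function(truth, pred) {
  truth <- factor(truth)
  pred  <- factor(pred, levels = levels(truth))
  conf  <- table(truth, pred)          # rows: observed, columns: predicted
  mean(diag(conf) / rowSums(conf))
}

# Toy example: 4 observations of class "a", 8 of class "b".
truth <- c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b")
pred  <- c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "a", "a")

balanced_accuracy(truth, pred)   # (3/4 + 6/8) / 2 = 0.75
```

Because each class contributes equally regardless of its size, this measure is not dominated by the larger class.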

Arguments

datain: Input data frame with class factor variable 'classname' and the
influential variables, which need to be factors or numeric (convert logical and character variables to factors)
classname: Name of class variable (character)
ctestv: Vector of character strings of forbidden split results
(see function PrInDT for details).
If no restrictions exist, the default NA is used.
conf.level: (1 - significance level) in function ctree (numerical, > 0 and <= 1); default = 0.95
thres: Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5
nesvar: Name of nesting variable (character)
divt: Number of parts of nesting variable nesvar for which models should be determined individually
minsplit: Minimum number of elements in a node for it to be split;
default = 20
minbucket: Minimum number of elements in a node;
default = 7
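
The role of 'thres' can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper `predict_with_threshold()`, the class labels, and the probabilities are hypothetical, and whether the package compares with a strict or a non-strict inequality is an assumption here.

```r
# Hypothetical predicted probabilities of the smaller class "a".
prob_small <- c(0.10, 0.35, 0.50, 0.65, 0.90)

# Hypothetical helper: predict the smaller class when its probability exceeds `thres`.
# The strict comparison (>) is an assumption, not confirmed by the package documentation.
predict_with_threshold <- function(prob, thres = 0.5, small = "a", large = "b") {
  factor(ifelse(prob > thres, small, large), levels = c(small, large))
}

# With the default threshold, two observations are assigned to the smaller class.
predict_with_threshold(prob_small, thres = 0.5)

# Lowering the threshold assigns more observations to the smaller class.
predict_with_threshold(prob_small, thres = 0.3)
```

Lowering the threshold below 0.5 favours the smaller class, which is one way to counteract class imbalance.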

Details

Standard output can be produced by means of print(name), or just name, where 'name' is the output data frame of the function.

Examples

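library(PrInDT)
# Load the example data and drop incomplete rows
data <- PrInDT::data_speaker
data <- na.omit(data)
# Nesting variable; trees are built on divt = 8 consecutive parts
nesvar <- "SPEAKER"
outNesAll <- PrInDTAllparts(data, "class", ctestv=NA, conf.level=0.95, thres=0.5, nesvar, divt=8)
# Print the standard output
outNesAll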
