PrInDTAll: Conditional inference tree (ctree) based on all observations

Description

Builds a ctree from all observations in 'datain'. Interpretability is checked (see 'ctestv'), and a probability threshold can be specified (see 'thres'). The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the tree.

Reference
Weihs, C., Buschfeld, S. 2021a. Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example. arXiv:2103.02336

In the case of repeated measurements ('indrep=1'), the values of the substructure variable have to be given in 'repvar'. Only one value of 'classname' is allowed for each value of 'repvar'. If, for a value of 'repvar', the number of predictions of its 'classname' value does not reach the percentage 'thr' of the observed occurrences of that value, a misclassification is detected.
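
The element-level rule described above can be illustrated with a minimal base R sketch. It does not call PrInDT; the data frame 'obs', its columns, and the toy values are made up for illustration, and treating "not reached" as "share strictly below 'thr'" is an assumption about the boundary case.

```r
# Toy data: one row per observation; 'child' plays the role of 'repvar'.
# All names and values are hypothetical.
obs <- data.frame(
  child     = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  observed  = c("zero", "zero", "zero", "zero", "real", "real", "real",
                "zero", "zero"),
  predicted = c("zero", "zero", "real", "zero", "zero", "zero", "real",
                "real", "zero"),
  stringsAsFactors = FALSE
)
thr <- 0.5

# Each element (value of 'repvar') must have exactly one observed class.
n_classes <- tapply(obs$observed, obs$child, function(x) length(unique(x)))
stopifnot(all(n_classes == 1))

# Share of observations per element whose prediction equals the observed class.
share_correct <- tapply(obs$predicted == obs$observed, obs$child, mean)

# Assumption: an element is misclassified if this share stays below 'thr'.
misclassified <- names(share_correct)[share_correct < thr]

print(share_correct)
print(misclassified)
```

In this toy example element "B" is predicted correctly in only one of three observations, so it is the only element flagged; element "C" sits exactly at the threshold and is not flagged under the assumption stated above.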

Usage

PrInDTAll(datain, classname, ctestv = NA, conf.level = 0.95, thres = 0.5,
          minsplit = NA, minbucket = NA, repvar = NA, indrep = 0, thr = 0.5)

Value

treeall

ctree based on all observations

baAll

balanced accuracy of 'treeall'

interpAll

criterion of interpretability of 'treeall' (TRUE / FALSE)

confAll

confusion matrix of 'treeall'

acc1AE

Accuracy of full sample tree on Elements of large class

acc2AE

Accuracy of full sample tree on Elements of small class

bamaxAE

balanced accuracy of full sample tree on Elements

namA1

Names of Elements of the large class misclassified by the full sample tree

namA2

Names of Elements of the small class misclassified by the full sample tree

lablarge

Label of large class

labsmall

Label of small class

thr

Threshold for repeated measurements
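
As a rough illustration of how the accuracy values listed above relate to a confusion matrix, the following base R sketch computes per-class accuracies and their mean (the balanced accuracy) for a two-class toy example. The vectors and labels are made up, and the orientation of the table (observed classes in rows) is an assumption that need not match 'confAll'.

```r
# Hypothetical observed and predicted classes (8 "large", 4 "small").
observed  <- factor(c(rep("large", 8), rep("small", 4)),
                    levels = c("large", "small"))
predicted <- factor(c(rep("large", 6), rep("small", 2),
                      "small", "small", "large", "large"),
                    levels = c("large", "small"))

# Confusion matrix with observed classes in rows (assumed orientation).
conf <- table(observed = observed, predicted = predicted)

# Accuracy within each class: correct predictions / class size.
acc_per_class <- diag(conf) / rowSums(conf)

# Balanced accuracy: mean of the per-class accuracies.
ba <- mean(acc_per_class)

# Overall accuracy for comparison; it is dominated by the large class.
acc_overall <- sum(diag(conf)) / sum(conf)

print(conf)
print(acc_per_class)
print(ba)
```

Balanced accuracy weights both classes equally, which is why it differs from the overall accuracy when the classes are of unequal size.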

Arguments

datain: Input data frame with class factor variable 'classname' and the influential variables, which need to be factors or numeric (transform logical and character variables to factors)
classname: Name of class variable (character)
ctestv: Vector of character strings of forbidden split results (see function PrInDT for details). If no restrictions exist, the default = NA is used.
conf.level: (1 - significance level) in function ctree (numerical, > 0 and <= 1); default = 0.95
thres: Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5
minsplit: Minimum number of elements in a node to be split; default = 20
minbucket: Minimum number of elements in a node; default = 7
repvar: Values of the variable defining the substructure in the case of repeated measurements; must have length dim(datain)[1], i.e. one value per row of 'datain'; default = NA
indrep: Indicator of repeated measurements ('indrep=1'); default = 0
thr: Threshold for element classification: minimum percentage of correct class entries; default = 0.5
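
To make the role of 'thres' more concrete, here is a minimal base R sketch of threshold-based prediction for the smaller class. It does not use PrInDT; the helper 'predict_with_threshold', the probabilities, and the class labels are hypothetical, and predicting the smaller class only when its probability strictly exceeds 'thres' is an assumption.

```r
# Hypothetical predicted probabilities of the smaller class for five observations.
p_small <- c(0.10, 0.35, 0.50, 0.62, 0.90)

# Hypothetical helper: predict the smaller class when its probability
# exceeds 'thres' (assumed strict inequality), else the larger class.
predict_with_threshold <- function(p, thres = 0.5,
                                   labsmall = "small", lablarge = "large") {
  stopifnot(thres >= 0, thres < 1)
  factor(ifelse(p > thres, labsmall, lablarge),
         levels = c(lablarge, labsmall))
}

pred_default <- predict_with_threshold(p_small)              # thres = 0.5
pred_low     <- predict_with_threshold(p_small, thres = 0.3) # lower threshold

print(table(pred_default))
print(table(pred_low))
```

Lowering the threshold assigns more observations to the smaller class, which typically raises the accuracy on the small class at the expense of the large class.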

Details

Standard output can be produced with print(name) or just name, and a plot with plot(name), where 'name' is the object returned by the function.

Examples


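library(PrInDT) # provides PrInDTAll and the example data 'data_zero'
datastrat <- PrInDT::data_zero
data <- na.omit(datastrat) # drop observations with missing values
ctestv <- rbind('ETH == {C2a,C1a}','MLU == {1, 3}') # forbidden split results
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
outAll <- PrInDTAll(data,"real",ctestv,conf.level)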
print(outAll) # print model based on all observations
plot(outAll) # plot model based on all observations
