PrInDTMulev: PrInDT analysis for a classification problem with multiple classes.

Description

PrInDT analysis for a classification problem with more than 2 classes. For each combination of one class vs. the other classes, a 2-class PrInDT analysis is carried out.
The undersampling percentage for the larger class ('percl' in PrInDT) is chosen so that the resulting subsample is comparable in size to the smaller class, all of whose observations are used ('percs' = 1 in PrInDT).
The class with the highest probability in the K (= number of classes) analyses is chosen for prediction.
Interpretability is checked (see 'ctestv'). The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
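
The one-vs-rest scheme described above can be sketched in base R. This is an illustration only, not the package's internal code: the helper name one_vs_rest_sizes, the toy class vector, the rule percl = (size of smaller class) / (size of larger class), and the probability matrix prob are all assumptions made for the example.

```r
# Toy class variable with K = 3 classes of unequal size (made-up data).
cls <- factor(rep(c("real", "zero1", "zero2"), times = c(60, 25, 15)))

# For each class k, form the 2-class problem "k vs. rest" and derive an
# undersampling percentage for the larger class so that its subsample is
# about as large as the smaller class (which is used completely).
# Hypothetical helper, not part of PrInDT.
one_vs_rest_sizes <- function(cls) {
  t(sapply(levels(cls), function(k) {
    n_k    <- sum(cls == k)
    n_rest <- sum(cls != k)
    c(n_small = min(n_k, n_rest),
      n_large = max(n_k, n_rest),
      percl   = min(n_k, n_rest) / max(n_k, n_rest))
  }))
}
sizes <- one_vs_rest_sizes(cls)
print(sizes)

# Combining the K analyses: each column holds the probability that the
# corresponding one-vs-rest model assigns to "its" class (hypothetical values).
prob <- rbind(c(real = 0.70, zero1 = 0.20, zero2 = 0.40),
              c(real = 0.10, zero1 = 0.55, zero2 = 0.50),
              c(real = 0.30, zero1 = 0.35, zero2 = 0.80))

# The predicted class is the one with the highest probability in each row.
pred <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
               levels = levels(cls))
print(pred)
```

In this sketch the rarer a class is, the smaller the share of the "rest" observations that enters each undersample.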

Usage

PrInDTMulev(datain, classname, ctestv=NA, N, conf.level=0.95, seedl=FALSE,
            minsplit=NA, minbucket=NA)

Value

class: levels of class variable
trees: trees for the levels of the class variable; refer to an individual tree as trees[[k]], k = 1, ..., no. of levels

balanced accuracy of combined predictions

conf: confusion matrix of combined predictions
ninterp: no. of non-interpretable trees
acc: balanced accuracies of best models for individual classes
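
As a worked illustration of how a balanced accuracy relates to a confusion matrix, the following base R sketch computes one from a made-up matrix. The numbers, the orientation (rows = true classes, columns = predictions), and the variable names are assumptions for the example; the package may arrange its 'conf' output differently.

```r
# Hypothetical confusion matrix: rows = true classes, columns = predictions.
conf <- matrix(c(50,  6,  4,
                  5, 18,  2,
                  3,  2, 10),
               nrow = 3, byrow = TRUE,
               dimnames = list(true = c("real", "zero1", "zero2"),
                               predicted = c("real", "zero1", "zero2")))

# Class-wise accuracy: correctly predicted / all observations of that class.
class_acc <- diag(conf) / rowSums(conf)

# Balanced accuracy: unweighted mean of the class-wise accuracies, so that
# small classes count as much as large ones.
bal_acc <- mean(class_acc)

print(round(class_acc, 3))
print(round(bal_acc, 3))
```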

Arguments

datain: Input data frame with the class factor variable 'classname' and the influential variables, which need to be factors or numericals (transform logicals and character variables to factors)
classname: Name of class variable (character)
ctestv: Vector of character strings of forbidden split results (see function PrInDT for details); if no restrictions exist, the default = NA is used
N: Number of repetitions (integer > 0)
conf.level: (1 - significance level) in function ctree (numerical, > 0 and <= 1); default = 0.95
seedl: Should the seed for random numbers be set (TRUE / FALSE)? default = FALSE
minsplit: Minimum number of elements in a node to be split; default = 20
minbucket: Minimum number of elements in a node; default = 7
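
The requirement on 'datain' (only factors and numericals) can be met with a short base R preparation step. This is a minimal sketch on a made-up data frame; the column names and values are assumptions for illustration.

```r
# Made-up data frame with a character class variable, a character and a
# logical predictor, and a numeric predictor.
raw <- data.frame(rel = c("real", "zero1", "zero2", "real"),
                  sex = c("f", "m", "f", "m"),
                  l1  = c(TRUE, FALSE, TRUE, TRUE),
                  age = c(21, 34, 28, 45),
                  stringsAsFactors = FALSE)

# Flag every logical or character column ...
to_factor <- vapply(raw, function(x) is.logical(x) || is.character(x),
                    logical(1))

# ... and convert only those columns to factors; numeric columns stay as is.
datain <- raw
datain[to_factor] <- lapply(raw[to_factor], factor)

str(datain)
```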

Details

Standard output can be produced by means of print(name) or just name, as well as plot(name), where 'name' is the object returned by the function.
The plot function will produce a series of more than one plot. If you use R on Windows, you might want to call windows(record=TRUE) before plot(name) to save the whole series of plots. In RStudio this functionality is provided automatically.
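
Because windows() is only available on Windows, a platform-independent way to keep a whole series of plots is to write them to a multi-page PDF. The sketch below uses placeholder plots and a temporary file name chosen for the example; it assumes that plot(name) draws to the active graphics device, in which case the loop could be replaced by a single plot(name) call.

```r
# Open a PDF device; with onefile = TRUE each new plot becomes a new page.
pdf_file <- file.path(tempdir(), "PrInDTMulev_plots.pdf")
pdf(pdf_file, onefile = TRUE)

for (k in 1:3) {
  # Placeholder plots standing in for the series produced by plot(name).
  plot(1:10, main = paste("placeholder plot", k))
}

# Close the device so that the file is written completely.
dev.off()

file.exists(pdf_file)
```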

Examples

datastrat <- PrInDT::data_zero
data <- na.omit(datastrat)
ctestv <- NA
data$rel[data$ETH %in% c("C1a","C1b","C1c") & data$real == "zero"] <- "zero1"
data$rel[data$ETH %in% c("C2a","C2b","C2c") & data$real == "zero"] <- "zero2"
data$rel[data$real == "realized"] <- "real"
data$rel <- as.factor(data$rel) # rel is new class variable
data$real <- NULL # remove old class variable
N <- 51
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
out <- PrInDTMulev(data,"rel",ctestv,N,conf.level) 
out # print best models based on subsamples
plot(out) # corresponding plots
