PrInDT (version 2.0.1)

PrInDTMulev: PrInDT analysis for a classification problem with multiple classes.

Description

PrInDT analysis for a classification problem with more than 2 classes. For each class, a 2-class PrInDT analysis of that class vs. all other classes is carried out.
The undersampling percentage for the larger class ('percl' in PrInDT) is chosen so that the resulting subsample size is comparable to the size of the smaller class, all of whose observations are used in undersampling ('percs' = 1 in PrInDT).
For prediction, the class with the highest probability across the K analyses (K = number of classes) is chosen.
Interpretability is checked (see 'ctestv'). The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
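The one-vs-rest scheme described above can be illustrated with a small base-R sketch. It assumes made-up class sizes and assumes that the K trees return, for each observation, the probability of their own class, collected column-wise in a matrix. The names 'class_sizes', 'n_small', 'n_large' and 'prob' are hypothetical, and the ratio used for 'percl' is one simple way to make the two groups comparable in size, not necessarily the exact rule PrInDT applies.

```r
# Hypothetical class sizes for K = 3 classes
class_sizes <- c(real = 120, zero1 = 40, zero2 = 30)

# One-vs-rest: for class k, the "rest" group holds all other observations
rest_sizes <- sum(class_sizes) - class_sizes

# All observations of the smaller group are kept (percs = 1); the larger
# group is undersampled to roughly the same size (assumed rule)
n_small <- pmin(class_sizes, rest_sizes)
n_large <- pmax(class_sizes, rest_sizes)
percl   <- n_small / n_large

# Hypothetical predicted probabilities: rows = observations,
# column k = probability of class k from the k-th one-vs-rest tree
prob <- matrix(c(0.70, 0.20, 0.10,
                 0.30, 0.55, 0.40,
                 0.20, 0.35, 0.60),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, names(class_sizes)))

# Predict the class whose tree reports the highest probability
pred <- colnames(prob)[max.col(prob, ties.method = "first")]
print(pred)
```

Here the three observations are assigned to "real", "zero1" and "zero2", respectively; ties are broken in favour of the first class only to keep the sketch deterministic.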

Usage

PrInDTMulev(datain, classname, ctestv = NA, N, conf.level = 0.95, seedl = FALSE,
            minsplit = NA, minbucket = NA)

Value

class

levels of class variable

trees

trees for the levels of the class variable; refer to an individual tree as trees[[k]], k = 1, ..., number of levels

ba

balanced accuracy of combined predictions

conf

confusion matrix of combined predictions

ninterp

number of non-interpretable trees

acc

balanced accuracies of best models for individual classes
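As a rough guide to reading 'ba' and 'conf', the sketch below computes a balanced accuracy from a made-up 3-class confusion matrix using the usual definition (the unweighted mean of the per-class accuracies). The page does not spell out the formula, and the orientation of the matrix (rows = observed, columns = predicted) is an assumption.

```r
# Hypothetical confusion matrix; rows = observed class, columns = predicted class
conf <- matrix(c(50,  5,  5,
                  4, 12,  4,
                  2,  3, 15),
               nrow = 3, byrow = TRUE,
               dimnames = list(observed  = c("real", "zero1", "zero2"),
                               predicted = c("real", "zero1", "zero2")))

# Per-class accuracy: correctly predicted / observed in that class
class_acc <- diag(conf) / rowSums(conf)

# Balanced accuracy: unweighted mean of the per-class accuracies
ba <- mean(class_acc)
print(round(ba, 3))
```

Unlike overall accuracy, this measure is not dominated by the largest class, which is why it suits the unbalanced classes this function targets.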

Arguments

datain

Input data frame containing the class factor variable 'classname' and the
influential variables, which must be factors or numeric (convert logical and character variables to factors)
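The conversion requested for 'datain' can be done in base R as sketched below. The data frame and its column names are made up for illustration; only character and logical columns are turned into factors, while numeric columns and existing factors are left unchanged.

```r
# Hypothetical input with a character and a logical predictor
df <- data.frame(
  class = factor(c("a", "b", "c", "a")),
  group = c("x", "y", "x", "y"),
  flag  = c(TRUE, FALSE, TRUE, TRUE),
  score = c(1.5, 2.0, 0.3, 4.1),
  stringsAsFactors = FALSE
)

# Identify character and logical columns
to_factor <- vapply(df, function(x) is.character(x) || is.logical(x), logical(1))

# Convert only those columns to factors
df[to_factor] <- lapply(df[to_factor], factor)
str(df)
```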

classname

Name of class variable (character)

ctestv

Vector of character strings describing forbidden split results
(see function PrInDT for details).
If no restrictions exist, the default NA is used.

N

Number of repetitions (integer > 0)

conf.level

(1 - significance level) in function ctree, i.e. its 'mincriterion' (numerical, > 0 and <= 1);
default = 0.95

seedl

Should the seed for random numbers be set (TRUE / FALSE)?
default = FALSE

minsplit

Minimum number of elements in a node for it to be split;
default = 20

minbucket

Minimum number of elements in a node;
default = 7

Details

Standard output can be produced with print(name) or simply name, and graphical output with plot(name), where 'name' is the object returned by the function.
plot(name) produces a series of plots. In R on Windows, you may want to call windows(record = TRUE) before plot(name) so that the whole series of plots is recorded. RStudio provides this functionality automatically.
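Since windows() is only available on Windows, a platform-independent way to keep the whole series is to write it to a multi-page PDF file with the base functions pdf() and dev.off(). In the sketch below, the loop of simple base plots only stands in for plot(name), and the temporary file name is arbitrary.

```r
# Temporary file for the multi-page PDF (any writable path would do)
pdf_file <- tempfile(fileext = ".pdf")

pdf(pdf_file)   # open a PDF device; each new plot starts a new page
for (k in 1:3) {
  # Stand-in for plot(name), which draws several plots in sequence
  plot(1:10, (1:10)^k, main = paste("Plot", k))
}
invisible(dev.off())   # close the device so the file is written to disk
```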

Examples

datastrat <- PrInDT::data_zero
data <- na.omit(datastrat)
ctestv <- NA
data$rel[data$ETH %in% c("C1a","C1b","C1c") & data$real == "zero"] <- "zero1"
data$rel[data$ETH %in% c("C2a","C2b","C2c") & data$real == "zero"] <- "zero2"
data$rel[data$real == "realized"] <- "real"
data$rel <- as.factor(data$rel) # rel is new class variable
data$real <- NULL # remove old class variable
N <- 51
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
out <- PrInDTMulev(data,"rel",ctestv,N,conf.level) 
out # print best models based on subsamples
plot(out) # corresponding plots
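The relabelling step of the example can be tried on a small made-up data frame before running the full analysis. The ETH and real values below are hypothetical and not taken from data_zero. Initialising rel with NA and checking for remaining NA values is a defensive choice: it flags any row that none of the rules covers.

```r
# Made-up data with the two columns used for relabelling in the example
toy <- data.frame(
  ETH  = c("C1a", "C2b", "C1c", "C2a", "C1b"),
  real = c("zero", "zero", "realized", "zero", "realized"),
  stringsAsFactors = FALSE
)

# Start with NA so that rows not covered by any rule remain visible
toy$rel <- NA_character_
toy$rel[toy$ETH %in% c("C1a", "C1b", "C1c") & toy$real == "zero"] <- "zero1"
toy$rel[toy$ETH %in% c("C2a", "C2b", "C2c") & toy$real == "zero"] <- "zero2"
toy$rel[toy$real == "realized"] <- "real"

stopifnot(!anyNA(toy$rel))     # every row received a class label
toy$rel <- as.factor(toy$rel)  # rel is the new class variable
toy$real <- NULL               # remove the old class variable
table(toy$rel)
```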
