PrInDT (version 2.0.1)

Prediction and Interpretation in Decision Trees for Classification and Regression

Description

Optimization of conditional inference trees from the package 'party' for classification and regression. For optimization, the model space is searched for the best tree on the full sample by means of repeated subsampling. Restrictions are allowed so that only trees are accepted which do not include pre-specified uninterpretable split results (cf. Weihs & Buschfeld, 2021a).

The function PrInDT() represents the basic resampling loop for 2-class classification (cf. Weihs & Buschfeld, 2021a). The function RePrInDT() (repeated PrInDT()) allows for repeated applications of PrInDT() for different percentages of the observations of the large and the small classes (cf. Weihs & Buschfeld, 2021c). The function NesPrInDT() (nested PrInDT()) allows for an extra layer of subsampling for a specific factor variable (cf. Weihs & Buschfeld, 2021b). The functions PrInDTMulev() and PrInDTMulab() deal with multilevel and multilabel classification. In addition to these PrInDT() variants for classification, the function PrInDTreg() has been developed for regression problems. Finally, the function PostPrInDT() allows for a posterior analysis of the distribution of a specified variable in the terminal nodes of a given tree.

In version 2, structured sampling is additionally implemented in the functions PrInDTCstruc() and PrInDTRstruc(). These functions can also analyze repeated-measurements data. Moreover, multilabel 2-stage versions of classification and regression trees are implemented in the functions C2SPrInDT() and R2SPrInDT(), and interdependent multilabel models in the functions SimCPrInDT() and SimRPrInDT(). Finally, the functions Mix2SPrInDT() and SimMixPrInDT() handle mixtures of classification and regression models. Most of these extensions of PrInDT are described in Buschfeld & Weihs (2025Fc).

References:
-- Buschfeld, S., Weihs, C. (2025Fc) "Optimizing decision trees for the analysis of World Englishes and sociolinguistic data", Cambridge Elements.
-- Weihs, C., Buschfeld, S. (2021a) "Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example".
-- Weihs, C., Buschfeld, S. (2021b) "NesPrInDT: Nested undersampling in PrInDT".
-- Weihs, C., Buschfeld, S. (2021c) "Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles".
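
The resampling idea in the Description (repeated subsampling, evaluation of each tree on the full sample, rejection of trees with uninterpretable splits) can be illustrated with a minimal sketch. This is not the package's implementation. It assumes that 'party' is installed, that "best" means highest balanced accuracy on the full sample, and that an uninterpretable split can be detected by searching the printed tree for a text pattern. All helper and argument names (balanced_accuracy, undersample, best_tree_by_undersampling, percl, percs, forbidden) are hypothetical and chosen for illustration only.

```r
# Illustrative sketch only; not the PrInDT implementation.
library(party)  # provides ctree() for conditional inference trees

# Balanced accuracy: mean of the per-class recall values.
balanced_accuracy <- function(truth, pred) {
  recalls <- sapply(levels(truth), function(lv) mean(pred[truth == lv] == lv))
  mean(recalls)
}

# Keep a fraction `percl` of the larger class and `percs` of the smaller class.
undersample <- function(data, classname, percl, percs = 1) {
  counts <- table(data[[classname]])
  large <- names(counts)[which.max(counts)]
  idx_large <- which(data[[classname]] == large)
  idx_small <- which(data[[classname]] != large)
  n_large <- round(percl * length(idx_large))
  n_small <- round(percs * length(idx_small))
  keep <- c(idx_large[sample.int(length(idx_large), n_large)],
            idx_small[sample.int(length(idx_small), n_small)])
  data[keep, , drop = FALSE]
}

# Fit N trees on undersamples; keep the accepted tree that scores best
# on the full sample. `forbidden` holds text patterns (assumption) that
# mark a split as uninterpretable when they occur in the printed tree.
best_tree_by_undersampling <- function(data, classname, N, percl, percs = 1,
                                       forbidden = character(0)) {
  form <- as.formula(paste(classname, "~ ."))
  best <- list(tree = NULL, score = -Inf)
  for (i in seq_len(N)) {
    sub <- undersample(data, classname, percl, percs)
    tree <- ctree(form, data = sub)
    printed <- capture.output(print(tree))
    rejected <- any(vapply(forbidden,
                           function(p) any(grepl(p, printed, fixed = TRUE)),
                           logical(1)))
    if (rejected) next
    score <- balanced_accuracy(data[[classname]], predict(tree, newdata = data))
    if (score > best$score) best <- list(tree = tree, score = score)
  }
  best
}

# Usage on a two-class version of iris (100 "other" vs. 50 "setosa").
set.seed(1)
d <- iris
d$Species <- factor(ifelse(d$Species == "setosa", "setosa", "other"))
res <- best_tree_by_undersampling(d, "Species", N = 5, percl = 0.5)
res$score
```

In this sketch, a tree is scored on all observations rather than only on the subsample it was fitted to, which mirrors the "best tree on the full sample" criterion named in the Description.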

Install

install.packages('PrInDT')

Monthly Downloads

255

Version

2.0.1

License

GPL-2

Maintainer

Claus Weihs

Last Published

August 25th, 2025

Functions in PrInDT (2.0.1)

PrInDTCstruc
Structured subsampling for classification

PrInDTAll
Conditional inference tree (ctree) based on all observations

PostPrInDT
Posterior analysis of conditional inference trees: distribution of a specified variable in the terminal nodes

PrInDTMulab
Multiple label classification based on resampling by PrInDT

Mix2SPrInDT
Two-stage estimation for classification-regression mixtures

NesPrInDT
Nested PrInDT with additional undersampling of a factor with two unbalanced levels

C2SPrInDT
Two-stage estimation for classification

OptPrInDT
Optimisation of undersampling percentages for classification

PrInDT
The basic undersampling loop for classification

PrInDTAllparts
Conditional inference trees (ctrees) based on consecutive parts of the full sample

PrInDTMulevAll
Conditional inference tree (ctree) for multiple classes on all observations

R2SPrInDT
Two-stage estimation for regression

PrInDTreg
Regression tree resampling by the PrInDT method

PrInDTMulabAll
Multiple label classification based on all observations

PrInDTMulev
PrInDT analysis for a classification problem with multiple classes

SimCPrInDT
Interdependent estimation for classification

SimMixPrInDT
Interdependent estimation for classification-regression mixtures

RePrInDT
Repeated PrInDT for specified percentage combinations

PrInDTregAll
Regression tree based on all observations

participant_zero
Participants of subject pronoun study

PrInDTRstruc
Structured subsampling for regression

SimRPrInDT
Interdependent estimation for regression

data_land
Landscape analysis

data_zero
Subject pronouns

data_speaker
Subject pronouns and a predictor with one very frequent level

data_vowel
Vowel length