llm.cv: Runs v-fold cross validation with LLM

Description

In v-fold cross validation, the data are divided into v subsets of approximately equal size. In each round, one of the v subsets is held out while the remainder of the data is used to create a logitleafmodel object, and predictions are generated for the held-out subset. The process is repeated v times, so that every observation is held out exactly once.
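The fold logic described above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, `glm()` stands in for the logit leaf model, and the names `fold_id`, `held_out`, and `pred` are hypothetical and not part of the package.

```r
# Sketch of v-fold cross validation in base R.
# Assumption: glm() is a stand-in learner here, NOT the logit leaf model itself.
set.seed(42)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))          # simulated predictors
Y <- rbinom(n, 1, plogis(0.8 * X$x1 - 0.5 * X$x2))     # simulated binary outcome

v <- 5
# Assign every row to one of v folds of (approximately) equal size
fold_id <- sample(rep(seq_len(v), length.out = n))

pred <- rep(NA_real_, n)
for (k in seq_len(v)) {
  held_out <- fold_id == k
  # Fit on the v - 1 remaining folds
  train <- cbind(X[!held_out, , drop = FALSE], Y = Y[!held_out])
  fit <- glm(Y ~ ., data = train, family = binomial())
  # Predict class membership probabilities for the held-out fold only
  pred[held_out] <- predict(fit, newdata = X[held_out, , drop = FALSE],
                            type = "response")
}
```

After the loop, every observation has exactly one out-of-fold prediction, which is the property that makes the pooled predictions usable for an honest performance estimate.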

Usage

llm.cv(X, Y, cv, threshold_pruning = 0.25, nbr_obs_leaf = 100)

Arguments

X

Dataframe containing numerical independent variables.

Y

Numerical vector of dependent variable. Currently only binary classification is supported.

cv

An integer specifying the number of folds in the cross-validation.

threshold_pruning

Confidence threshold used for pruning. Default 0.25.

nbr_obs_leaf

The minimum number of observations in a leaf node. Default 100.

Value

An object of class llm.cv, which is a list with the following components:

foldpred

a data frame with, per fold, predicted class membership probabilities for the left-out observations.

pred

a data frame with predicted class membership probabilities.

foldclass

a data frame with, per fold, predicted classes for the left-out observations.

class

a data frame with the predicted classes.

conf

the confusion matrix, which compares the actual and predicted class memberships based on the class component.
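As a rough illustration of how a confusion matrix of this kind relates to predicted probabilities, the sketch below builds one with base R's `table()`. The vectors `actual` and `prob_pos` are made-up values, and the 0.5 cutoff is an assumption for the example; this page does not state how `llm.cv` derives classes from probabilities.

```r
# Hypothetical observed classes and out-of-fold probabilities (not real output)
actual   <- factor(c("neg", "pos", "pos", "neg", "pos", "neg"),
                   levels = c("neg", "pos"))
prob_pos <- c(0.20, 0.85, 0.40, 0.10, 0.70, 0.60)

# Assumption: classify as "pos" when the probability is at least 0.5
predicted <- factor(ifelse(prob_pos >= 0.5, "pos", "neg"),
                    levels = c("neg", "pos"))

# Cross-tabulate actual versus predicted classes
conf <- table(Actual = actual, Predicted = predicted)

# Share of observations on the diagonal, i.e. classified correctly
accuracy <- sum(diag(conf)) / sum(conf)
```

Fixing the factor levels on both vectors keeps the table square even when one class is never predicted, so the diagonal always corresponds to correct classifications.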

References

Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.

Examples

# NOT RUN {
## Load the PimaIndiansDiabetes dataset from the mlbench package.
## The whole example is guarded so it is skipped when mlbench is not installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
  data("PimaIndiansDiabetes")
  ## Run 5-fold cross validation with the LLM (column 9 is the outcome, diabetes)
  Pima.llm <- llm.cv(X = PimaIndiansDiabetes[, -c(9)],
                     Y = PimaIndiansDiabetes$diabetes, cv = 5,
                     threshold_pruning = 0.25, nbr_obs_leaf = 100)
}
# }
