cvms (version 0.2.0)

baseline: Create baseline evaluations

Description

Create a baseline evaluation of a test set.

When family is gaussian: fits baseline models (y ~ 1) on n random subsets of train_data and evaluates each model on test_data. Also evaluates a model fitted on all rows in train_data.

When family is binomial: evaluates n sets of random predictions against the dependent variable, along with a set of all 0 predictions and a set of all 1 predictions.

When family is multinomial: creates one-vs-all (binomial) baseline evaluations for n sets of random predictions against the dependent variable, along with one set of "all class x" predictions per class (i.e. every observation is predicted to be class x).

baseline() is under development! Large changes may occur.

Usage

baseline(test_data, dependent_col, train_data = NULL, n = 100,
  family = "binomial", positive = 2, cutoff = 0.5,
  random_generator_fn = runif, random_effects = NULL,
  min_training_rows = 5, min_training_rows_left_out = 3,
  parallel = FALSE)

Arguments

test_data

Data Frame.

dependent_col

Name of dependent variable in the supplied test and training sets.

train_data

Data Frame. Only used when family == "gaussian".

n

Number of random samplings to perform.

For gaussian: The number of random samplings of train_data to fit baseline models on.

For binomial and multinomial: The number of sets of random predictions to evaluate.

family

Name of family. (Character)

Currently supports "gaussian", "binomial" and "multinomial".

positive

Level from dependent variable to predict. Either as a character or as a level index (1 or 2, corresponding to the alphabetical order of the levels).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as "dog" comes after "cat" alphabetically (see the quick check below).

Used when calculating confusion matrix metrics and creating ROC curves.

N.B. Only affects evaluation metrics, not the returned predictions.

N.B. Binomial only. (Character or Integer)
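
To see why "dog" would be index 2, note that R sorts factor levels alphabetically. A quick check (plain R, not cvms-specific):

levels(factor(c("dog", "cat")))  # "cat" "dog" -> "dog" is level 2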

cutoff

Threshold for predicted classes. (Numeric)

N.B. Binomial only

random_generator_fn

Function for generating random numbers when family is "multinomial". The softmax function is applied to the generated numbers to transform them into probabilities.

The first argument must be the number of random numbers to generate, as no other arguments are supplied.

To test the effect of using different functions, see multiclass_probability_tibble, or the sketch below.

N.B. Multinomial only
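
As an illustration, the softmax transformation can be written in a few lines (a minimal sketch; the commented multiclass_probability_tibble() arguments are assumptions based on that function's documentation):

softmax <- function(x) exp(x) / sum(exp(x))
softmax(runif(3))  # three probabilities summing to 1

# Compare the effect of different generators, e.g.:
# multiclass_probability_tibble(num_classes = 3, num_observations = 5,
#                               FUN = rnorm, apply_softmax = TRUE)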

random_effects

Random effects structure for Gaussian baseline model. (Character)

E.g. with "(1|ID)", the model becomes "y ~ 1 + (1|ID)".

N.B. Gaussian only

min_training_rows

Minimum number of rows in the random subsets of train_data.

Gaussian only. (Integer)

min_training_rows_left_out

Minimum number of rows left out of the random subsets of train_data.

I.e. a subset will maximally have the size:

max_rows_in_subset = nrow(train_data) - min_training_rows_left_out.

Gaussian only. (Integer)

parallel

Whether to run the n evaluations in parallel. (Logical)

Remember to register a parallel backend first. E.g. with doParallel::registerDoParallel.

Value

List containing:

  1. a tibble with summarized results (called summarized_metrics)

  2. a tibble with random evaluations (random_evaluations)

  3. a tibble with the summarized class level results (summarized_class_level_results) (Multinomial only)
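
The elements can be accessed by name. A minimal sketch, assuming a binomial baseline and the test_set created in the Examples below:

bsl <- baseline(test_data = test_set, dependent_col = "diagnosis",
                n = 10, family = "binomial")
bsl$summarized_metrics
bsl$random_evaluations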

----------------------------------------------------------------

Gaussian Results

----------------------------------------------------------------

The Summarized Results tibble contains:

Average RMSE, MAE, r2m, r2c, AIC, AICc, and BIC.

The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_rows is the evaluation when the baseline model is trained on all rows in train_data.

The Training Rows column contains the aggregated number of rows used from train_data when fitting the baseline models.
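
For instance, the all-rows evaluation can be extracted from the summarized metrics (a sketch, assuming train_set and test_set as created in the Examples below):

gb <- baseline(test_data = test_set, train_data = train_set,
               dependent_col = "score", n = 10, family = "gaussian")
dplyr::filter(gb$summarized_metrics, Measure == "All_rows")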

....................................................................

The Random Evaluations tibble contains:

The non-aggregated metrics.

A nested tibble with the predictions and targets.

A nested tibble with the coefficients of the baseline models.

Number of training rows used when fitting the baseline model on the training set.

Specified family.

Name of dependent variable.

Name of fixed effect (bias term only).

Random effects structure (if specified).
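
Continuing the gb sketch above, a single evaluation's nested tibbles can be pulled out directly (the column names Predictions and Coefficients are assumptions about the nested columns listed here):

gb$random_evaluations$Predictions[[1]]   # predictions and targets
gb$random_evaluations$Coefficients[[1]]  # baseline model coefficients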

----------------------------------------------------------------

Binomial Results

----------------------------------------------------------------

Based on the generated test set predictions, a confusion matrix and ROC curve are used to get the following:

ROC:

AUC, Lower CI, and Upper CI

Confusion Matrix:

Balanced Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).
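
For reference, two of these metrics written out as standard textbook formulas (a sketch with hypothetical helper names; cvms computes them via the packages listed under Details):

balanced_accuracy <- function(sensitivity, specificity) {
  (sensitivity + specificity) / 2
}
mcc_manual <- function(TP, FP, TN, FN) {
  (TP * TN - FP * FN) /
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}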

....................................................................

The Summarized Results tibble contains:

The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_0 is the evaluation when all predictions are 0. The row where Measure == All_1 is the evaluation when all predictions are 1.

The aggregated metrics.
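
The all-0 and all-1 evaluations can be extracted the same way as the Gaussian all-rows evaluation above (a sketch, reusing bsl from the Value sketch):

dplyr::filter(bsl$summarized_metrics, Measure %in% c("All_0", "All_1"))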

....................................................................

The Random Evaluations tibble contains:

The non-aggregated metrics.

A nested tibble with the predictions and targets.

A nested tibble with the sensitivities and specificities from the ROC curve.

A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.

Specified family.

Name of dependent variable.

----------------------------------------------------------------

Multinomial Results

----------------------------------------------------------------

Based on the generated test set predictions, one-vs-all (binomial) evaluations are performed and aggregated to get the same metrics as in the binomial results, with the addition of Overall Accuracy in the summarized results.
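
To make "one-vs-all" concrete: each class is in turn treated as the positive class in a binary problem. A minimal sketch with toy targets:

targets <- rep(paste0("class_", 1:3), each = 2)
as.integer(targets == "class_1")  # 1 1 0 0 0 0
as.integer(targets == "class_2")  # 0 0 1 1 0 0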

....................................................................

The Summarized Results tibble contains:

Summary of the random evaluations.

How: First, the one-vs-all binomial evaluations are aggregated by repetition (ignoring NAs), and then, these aggregations are summarized. Besides the metrics from the binomial evaluations (see Binomial Results above), it also includes the Overall Accuracy metric.

The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, and IQR rows describe the repetition evaluations, similar to the Random Evaluations tibble, except that NAs are ignored when aggregating (they are counted in the NAs row instead). The Max, Min, NAs, and INFs rows are extracted from the Summarized Class Level Results tibble to get the overall values. Note that NAs and INFs are only counted in the one-vs-all evaluations.

The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.

....................................................................

The Summarized Class Level Results tibble contains:

The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest on the tibble to inspect the results.

How: The one-vs-all evaluations are summarized by class.

The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.

....................................................................

The Random Evaluations tibble contains:

The repetition results with the same metrics as the Summarized Results tibble.

How: The one-vs-all evaluations are aggregated by repetition. NAs are not ignored, meaning that any NA from a one-vs-all evaluation will lead to an NA result for that repetition.

Also includes:

A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested ROC curves and Confusion Matrices, and the Support column, which counts how many observations from the class are in the test set.

A nested tibble with the predictions and targets.

A nested tibble with the multiclass confusion matrix.

Specified family.

Name of dependent variable.

Details

Packages used:

Models

Gaussian: stats::lm

Results

Gaussian:

r2m : MuMIn::r.squaredGLMM

r2c : MuMIn::r.squaredGLMM

AIC : stats::AIC

AICc : AICcmodavg::AICc

BIC : stats::BIC

Binomial and Multinomial:

Confusion matrix and related metrics: caret::confusionMatrix

ROC and related metrics: pROC::roc

MCC: mltools::mcc
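
For reference, standalone examples of two of these calls (standard usage of caret and pROC, unrelated to cvms internals):

preds   <- factor(c(0, 1, 1, 0, 1))
targets <- factor(c(0, 1, 0, 0, 1))
caret::confusionMatrix(data = preds, reference = targets, positive = "1")
pROC::roc(response = c(0, 1, 0, 1), predictor = c(0.2, 0.8, 0.4, 0.6))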

Examples

# NOT RUN {
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>%
library(tibble)

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(1)

# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]

# Create baseline evaluations
# Note: usually n=100 is a good setting

# Gaussian
baseline(test_data = test_set, train_data = train_set,
         dependent_col = "score", random_effects = "(1|session)",
         n = 2, family = "gaussian")

# Binomial
baseline(test_data = test_set, dependent_col = "diagnosis",
         n = 2, family = "binomial")

# Multinomial

# Create some data with multiple classes
multiclass_data <- tibble(
    "target" = rep(paste0("class_", 1:5), each = 10)) %>%
    dplyr::sample_n(35)

baseline(test_data = multiclass_data,
         dependent_col = "target",
         n = 4, family = "multinomial")

# Parallelize evaluations

# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Binomial
baseline(test_data = test_set, dependent_col = "diagnosis",
         n = 4, family = "binomial", parallel = TRUE)

# Gaussian
baseline(test_data = test_set, train_data = train_set,
         dependent_col = "score", random_effects = "(1|session)",
         n = 4, family = "gaussian", parallel = TRUE)

# Multinomial
(mb <- baseline(test_data = multiclass_data,
               dependent_col = "target",
               n = 4, family = "multinomial",
               parallel = TRUE))

# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
 dplyr::filter(Class == "class_2") %>%
 tidyr::unnest(Results)

# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)

rcertain <- function(n){
    (runif(n, min = 1, max = 100)^1.4)/100
}
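
# A quick check of why this gives "certain" predictions: the
# generated numbers span a wide range, so after softmax one class
# typically gets a probability near 1
softmax <- function(x) exp(x) / sum(exp(x))
round(softmax(rcertain(3)), 3)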

baseline(test_data = multiclass_data,
         dependent_col = "target",
         n = 4, family = "multinomial",
         parallel = TRUE,
         random_generator_fn = rcertain)

# }
