
cvms (version 0.2.0)

evaluate: Evaluate your model's performance

Description

Evaluate your model's predictions on a set of evaluation metrics.

Create ID-aggregated evaluations by multiple methods.

Currently supports linear regression, binary classification and multiclass classification (see type).

evaluate() is under development! Large changes may occur.

Usage

evaluate(data, target_col, prediction_cols, type = "gaussian",
  id_col = NULL, id_method = "mean", models = NULL,
  apply_softmax = TRUE, cutoff = 0.5, positive = 2,
  metrics = list(), include_predictions = TRUE, parallel = FALSE)

Arguments

data

Data frame with predictions, targets and (optionally) an ID column. Can be grouped with group_by.

Multinomial

When type is "multinomial", the predictions should be passed as one column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column. E.g.:

class_1  class_2  class_3  target
  0.269    0.528    0.203  class_2
  0.368    0.322    0.310  class_3
  0.375    0.371    0.254  class_2
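
Such a data frame could, for instance, be constructed like this (a minimal sketch using the values from the table above):

predictions <- tibble::tibble(
  "class_1" = c(0.269, 0.368, 0.375),
  "class_2" = c(0.528, 0.322, 0.371),
  "class_3" = c(0.203, 0.310, 0.254),
  "target" = c("class_2", "class_3", "class_2")
)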

Binomial

When type is "binomial", the predictions should be passed as one column with the probability of class being the second class alphabetically (1 if classes are 0 and 1). E.g.:

prediction  target
     0.769       1
     0.368       1
     0.375       0

Gaussian

When type is "gaussian", the predictions should be passed as one column with the predicted values. E.g.:

prediction  target
      28.9    30.2
      33.2    27.1
      23.4    21.3

target_col

Name of the column with the true classes/values in data.

When type is "multinomial", this column should contain the class names, not their indices.

prediction_cols

Name(s) of column(s) with the predictions.

When evaluating a classification task, the column(s) should contain the predicted probabilities.

type

Type of evaluation to perform:

"gaussian" for linear regression.

"binomial" for binary classification.

"multinomial" for multiclass classification.

id_col

Name of ID column to aggregate predictions by.

N.B. Current methods assume that the target class/value is constant within the IDs.

N.B. When aggregating by ID, some metrics (such as those from model objects) are excluded.

id_method

Method to use when aggregating predictions by ID. Either "mean" or "majority".

When type is "gaussian", only the "mean" method is available.

mean

The average prediction (value or probability) is calculated per ID and evaluated. This method assumes that the target class/value is constant within the IDs.

majority

The most predicted class per ID is found and evaluated. In case of a tie, the tied classes share the probability (e.g. P = 0.5 each when two classes tie). This method assumes that the target class/value is constant within the IDs.
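
As a conceptual sketch (not the internal implementation), the "mean" id_method corresponds to something like this dplyr aggregation, using the column names from the Examples section below:

library(dplyr)
data %>%
  group_by(participant) %>%
  summarise(
    prediction = mean(binomial_predictions),  # average probability per ID
    diagnosis = first(diagnosis)              # target is constant within ID
  )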

models

Unnamed list of fitted model(s) for calculating R^2 metrics and information criterion metrics. May only work for some types of models.

When only passing one model, remember to pass it in a list (e.g. list(m)).

N.B. When data is grouped, provide one model per group in the same order as the groups.

N.B. When aggregating by ID (i.e. when id_col is not NULL), it's not currently possible to pass model objects, as these would not be aggregated by the IDs.

N.B. Currently, Gaussian only.

apply_softmax

Whether to apply the softmax function to the prediction columns when type is "multinomial".

N.B. Multinomial models only.
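
As a sketch, the softmax rescales a row of prediction columns into probabilities that sum to 1:

softmax <- function(x) exp(x) / sum(exp(x))
round(softmax(c(1.2, 0.3, -0.5)), 3)
#> [1] 0.629 0.256 0.115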

cutoff

Threshold for predicted classes. (Numeric)

N.B. Binomial models only.
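
As a sketch, with the default cutoff of 0.5 and the probabilities from the binomial example above, the predicted classes are derived roughly like this:

probabilities <- c(0.769, 0.368, 0.375)
ifelse(probabilities > 0.5, 1, 0)  # class 1 when above the cutoff
#> [1] 1 0 0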

positive

Level of the dependent variable to predict (the "positive" class). Given either as the class name (character) or its alphabetical level index (1 or 2).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically, "dog" comes after "cat".

Used when calculating confusion matrix metrics and creating ROC curves.

N.B. Only affects the evaluation metrics.

N.B. Binomial models only.

metrics

List for enabling/disabling metrics.

E.g. list("RMSE" = FALSE) would remove RMSE from the results, and list("Accuracy" = TRUE) would add the regular accuracy metric to the classification results. Default values (TRUE/FALSE) will be used for the remaining metrics available.

Also accepts the string "all".

N.B. Currently, disabled metrics are still computed.
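
As a usage sketch (assuming the binomial example data from the Examples section below):

evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         type = "binomial",
         metrics = list("Accuracy" = TRUE,  # enable Accuracy
                        "F1" = FALSE))      # disable F1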

include_predictions

Whether to include the predictions in the output as a nested tibble. (Logical)

parallel

Whether to run evaluations in parallel, when data is grouped with group_by.
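
A sketch of evaluating grouped data (the grouping column "fold" is hypothetical):

library(dplyr)
data %>%
  group_by(fold) %>%  # hypothetical grouping column
  evaluate(target_col = "diagnosis",
           prediction_cols = "binomial_predictions",
           type = "binomial",
           parallel = TRUE)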

Details

Packages used:

Gaussian:

r2m : MuMIn::r.squaredGLMM

r2c : MuMIn::r.squaredGLMM

AIC : stats::AIC

AICc : AICcmodavg::AICc

BIC : stats::BIC

Binomial and Multinomial:

Confusion matrix and related metrics: caret::confusionMatrix

ROC and related metrics: pROC::roc

MCC: mltools::mcc

----------------------------------------------------------------

Gaussian Results

----------------------------------------------------------------

A single tibble containing the following metrics by default:

Average RMSE, MAE, r2m, r2c, AIC, AICc, and BIC.

N.B. Some of the metrics are only returned if model objects were passed, and are NA if they could not be extracted from the passed model objects.

Also includes:

A nested tibble with the Predictions and targets.

A nested tibble with the model Coefficients.

----------------------------------------------------------------

Binomial Results

----------------------------------------------------------------

A single tibble with the following evaluation metrics, based on a confusion matrix and a ROC curve fitted to the predictions:

ROC:

AUC, Lower CI, and Upper CI

Confusion Matrix:

Balanced Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).

Other available metrics (disabled by default, see metrics): Accuracy.

Also includes:

A nested tibble with the predictions and targets.

A nested tibble with the sensitivities and specificities from the ROC curve.

A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.

----------------------------------------------------------------

Multinomial Results

----------------------------------------------------------------

A list with two tibbles:

Class Level Results

The Class Level Results tibble contains the results of the one-vs-all binomial evaluations. It contains the same metrics as the binomial results described above.

Also includes:

A nested tibble with the Predictions and targets used for the one-vs-all evaluation.

A nested tibble with the sensitivities and specificities from the ROC curve.

A nested tibble with the Confusion Matrix from the one-vs-all evaluation. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class. In this case, 1 is the current class and 0 represents all the other classes together.

Results

The Results tibble contains the overall/macro metrics. The metrics that share their name with the metrics in the Class Level Results tibble are averages of those metrics (note: NAs are not removed before averaging). In addition to these, it also includes the Overall Accuracy metric and the Support metric, which is simply a count of each class in the target column.

Other available metrics (disabled by default, see metrics): Accuracy, Weighted Balanced Accuracy, Weighted Accuracy, Weighted F1, Weighted Sensitivity, Weighted Specificity, Weighted Pos Pred Value, Weighted Neg Pred Value, Weighted AUC, Weighted Lower CI, Weighted Upper CI, Weighted Kappa, Weighted MCC, Weighted Detection Rate, Weighted Detection Prevalence, and Weighted Prevalence.

Note that the "Weighted" metrics are weighted averages, weighted by the Support.
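
Conceptually (this is not the internal implementation), a weighted metric corresponds to:

# 'class_level_results' is a hypothetical tibble with one row per class
weighted.mean(class_level_results[["F1"]],
              w = class_level_results[["Support"]])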

Also includes:

A nested tibble with the Predictions and targets.

A nested tibble with the multiclass Confusion Matrix.

Examples

# NOT RUN {
# Attach packages
library(cvms)
library(dplyr)

# Load data
data <- participant.scores

# Fit models
gaussian_model <- lm(age ~ diagnosis, data = data)
binomial_model <- glm(diagnosis ~ score, data = data,
                      family = "binomial")

# Add predictions
data[["gaussian_predictions"]] <- predict(gaussian_model, data,
                                          type = "response",
                                          allow.new.levels = TRUE)
data[["binomial_predictions"]] <- predict(binomial_model, data,
                                          allow.new.levels = TRUE)

# Gaussian evaluation
evaluate(data = data, target_col = "age",
         prediction_cols = "gaussian_predictions",
         models = list(gaussian_model),
         type = "gaussian")

# Binomial evaluation
evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         type = "binomial")

# Multinomial

# Create a dataset
data_mc <- multiclass_probability_tibble(
    num_classes = 3, num_observations = 30,
    apply_softmax = TRUE, FUN = runif,
    class_name = "class_")

# Add targets
class_names <- paste0("class_", c(1,2,3))
data_mc[["target"]] <- sample(x = class_names,
                              size = 30, replace = TRUE)

# Multinomial evaluation
evaluate(data = data_mc, target_col = "target",
         prediction_cols = class_names,
         type = "multinomial")

# ID evaluation

# Gaussian ID evaluation
# Note that 'age' is the same for all observations
# of a participant
evaluate(data = data, target_col = "age",
         prediction_cols = "gaussian_predictions",
         id_col = "participant",
         type = "gaussian")

# Binomial ID evaluation
evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         id_col = "participant",
         id_method = "mean", # alternatively: "majority"
         type = "binomial")

# Multinomial ID evaluation

# Add IDs and new targets (must be constant within IDs)
data_mc[["target"]] <- NULL
data_mc[["id"]] <- rep(1:6, each = 5)
id_classes <- tibble::tibble(
    "id" = 1:6,
    target = sample(x = class_names, size = 6, replace = TRUE)
)
data_mc <- data_mc %>%
    dplyr::left_join(id_classes, by = "id")

# Perform ID evaluation
evaluate(data = data_mc, target_col = "target",
         prediction_cols = class_names,
         id_col = "id",
         id_method = "mean", # alternatively: "majority"
         type = "multinomial")

# Training and evaluating a multinomial model with nnet

# Create a data frame with some predictors and a target column
class_names <- paste0("class_", 1:4)
data_for_nnet <- multiclass_probability_tibble(
    num_classes = 3, # Here, number of predictors
    num_observations = 30,
    apply_softmax = FALSE,
    FUN = rnorm,
    class_name = "predictor_") %>%
    dplyr::mutate(class = sample(
        class_names,
        size = 30,
        replace = TRUE))

# Train multinomial model using the nnet package
mn_model <- nnet::multinom(
    "class ~ predictor_1 + predictor_2 + predictor_3",
    data = data_for_nnet)

# Predict the targets in the dataset
# (we would usually use a test set instead)
predictions <- predict(mn_model, data_for_nnet,
                       type = "probs") %>%
    dplyr::as_tibble()

# Add the targets
predictions[["target"]] <- data_for_nnet[["class"]]

# Evaluate predictions
evaluate(data = predictions, target_col = "target",
         prediction_cols = class_names,
         type = "multinomial")
# }
