mean_squared_error: Accuracy Measures for Ordered Probability Predictions

Description

Accuracy measures for evaluating ordered probability predictions.

Usage

mean_squared_error(y, predictions, use.true = FALSE)
mean_ranked_score(y, predictions, use.true = FALSE)
classification_error(y, predictions)

Value

A single number: the MSE, the RPS, or the classification error of the predictions, depending on the function called.

Arguments

y: Either the observed outcome vector or a matrix of true probabilities.
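predictions: Either a matrix of predicted class probabilities (for mean_squared_error and mean_ranked_score) or a vector of predicted class labels (for classification_error). See the Details section.
use.true: If TRUE, y is treated as a matrix of true class probabilities rather than as a vector of observed outcomes.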

Author

Riccardo Di Francesco

Details

MSE and RPS

When calling mean_squared_error or mean_ranked_score, predictions must be a matrix of predicted class probabilities, with as many rows as observations in y and as many columns as classes of y.

If use.true == FALSE, the mean squared error (MSE) and the mean ranked probability score (RPS) are computed as follows:

$$MSE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M (1 (Y_i = m) - \hat{p}_m (x))^2$$

$$RPS = \frac{1}{n} \sum_{i = 1}^n \frac{1}{M - 1} \sum_{m = 1}^M (1 (Y_i \leq m) - \hat{p}_m^* (x))^2$$
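The two formulas above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's internal code: the helper names mse_sketch and rps_sketch are hypothetical, y is assumed to take the integer values 1, ..., M, and the matrix p_hat is toy data made up for the example.

```r
# Hypothetical helpers for illustration; not part of the package.
mse_sketch <- function(y, p_hat) {
  M <- ncol(p_hat)
  # n x M matrix of indicators 1(Y_i = m).
  ind <- outer(y, seq_len(M), "==") * 1
  mean(rowSums((ind - p_hat)^2))
}

rps_sketch <- function(y, p_hat) {
  M <- ncol(p_hat)
  # n x M matrix of indicators 1(Y_i <= m).
  cum_ind <- outer(y, seq_len(M), "<=") * 1
  # Row-wise cumulative sums give the predicted cumulative probabilities.
  cum_hat <- t(apply(p_hat, 1, cumsum))
  mean(rowSums((cum_ind - cum_hat)^2) / (M - 1))
}

# Toy data (assumed): three observations, three ordered classes.
y <- c(1, 3, 2)
p_hat <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6),
               c(0.2, 0.5, 0.3))

mse_val <- mse_sketch(y, p_hat)
rps_val <- rps_sketch(y, p_hat)
```

For this toy input the sketch gives an MSE of 0.26 and an RPS of about 0.067. The RPS is smaller because the term for m = M is always zero (both cumulative quantities equal one) and the sum is scaled by 1 / (M - 1).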

If use.true == TRUE, the MSE and the RPS are computed as follows (useful for simulation studies):

$$MSE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M (p_m (x) - \hat{p}_m (x))^2$$

$$RPS = \frac{1}{n} \sum_{i = 1}^n \frac{1}{M - 1} \sum_{m = 1}^M (p_m^* (x) - \hat{p}_m^* (x))^2$$

where:

$$p_m (x) = P(Y_i = m | X_i = x)$$

$$p_m^* (x) = P(Y_i \leq m | X_i = x)$$
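When the true probabilities are known, as in a simulation, the same quantities compare two probability matrices directly. The following base R sketch illustrates this case; the helper names mse_true_sketch and rps_true_sketch are hypothetical, and p_true and p_hat are made-up toy matrices, so this should be read as an illustration of the formulas rather than the package's implementation.

```r
# Hypothetical helpers for illustration; not part of the package.
mse_true_sketch <- function(p_true, p_hat) {
  # Squared distance between true and predicted class probabilities.
  mean(rowSums((p_true - p_hat)^2))
}

rps_true_sketch <- function(p_true, p_hat) {
  M <- ncol(p_hat)
  # Row-wise cumulative sums give P(Y_i <= m | X_i = x) and its estimate.
  cum_true <- t(apply(p_true, 1, cumsum))
  cum_hat <- t(apply(p_hat, 1, cumsum))
  mean(rowSums((cum_true - cum_hat)^2) / (M - 1))
}

# Toy data (assumed): two observations, three ordered classes.
p_true <- rbind(c(0.6, 0.3, 0.1),
                c(0.2, 0.3, 0.5))
p_hat <- rbind(c(0.5, 0.3, 0.2),
               c(0.2, 0.4, 0.4))

mse_true_val <- mse_true_sketch(p_true, p_hat)
rps_true_val <- rps_true_sketch(p_true, p_hat)
```

For these toy matrices the sketch gives an MSE of 0.02 and an RPS of 0.0075.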

Classification error

When calling classification_error, predictions must be a vector of predicted class labels.

Classification error is computed as follows:

$$CE = \frac{1}{n} \sum_{i = 1}^n 1 (Y_i \neq \hat{Y}_i)$$

where Y_i are the observed class labels and \hat{Y}_i are the predicted class labels.
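The classification error is simply the share of mismatched labels, which the following base R sketch illustrates. The helper name ce_sketch is hypothetical and the data are made up. Deriving labels from a probability matrix with max.col is an assumption for the example only; the rule the package uses to produce its classification output is not stated here.

```r
# Hypothetical helper for illustration; not part of the package.
ce_sketch <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  # Share of observations whose predicted label differs from the observed one.
  mean(y != y_hat)
}

# Toy data (assumed): three observations, three ordered classes.
y <- c(1, 3, 3)
p_hat <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6),
               c(0.2, 0.5, 0.3))

# Assumption: label each observation with its most probable class.
y_hat <- max.col(p_hat, ties.method = "first")

ce_val <- ce_sketch(y, y_hat)
```

Here the predicted labels are 1, 3, 2, so one of three observations is misclassified and the sketch returns 1/3.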

Examples


## Load data from orf package.
set.seed(1986)

library(orf)
data(odata)
odata <- odata[1:200, ] # Subset to reduce elapsed time.

y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

## Training-test split.
train_idx <- sample(seq_along(y), floor(length(y) * 0.5))

y_tr <- y[train_idx]
X_tr <- X[train_idx, ]

y_test <- y[-train_idx]
X_test <- X[-train_idx, ]

## Fit morf on training sample.
forests <- morf(y_tr, X_tr)

## Accuracy measures on test sample.
predictions <- predict(forests, X_test)

mean_squared_error(y_test, predictions$probabilities)
mean_ranked_score(y_test, predictions$probabilities)
classification_error(y_test, predictions$classification)
