
SLmetrics (version 0.3-4)

shannon.entropy.matrix: Shannon Entropy

Description

A generic S3 function to compute the Shannon entropy for a classification model. This function dispatches to the S3 methods of shannon.entropy() and performs no input validation. If you supply NA values or malformed input (e.g., for two-vector methods, length(x) != length(y)), the underlying C++ code may trigger undefined behavior and crash your R session.
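
For a single probability distribution \(p = (p_1, \dots, p_k)\), the Shannon entropy is

\(H(p) = -\sum_{i=1}^{k} p_i \log p_i\)

with the convention \(0 \log 0 = 0\). This page does not state which logarithm base SLmetrics uses; the natural logarithm is the usual default, and the sanity check in the Examples section assumes it.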

Defensive measures

Because shannon.entropy() operates on raw pointers, pointer-level faults (e.g. from NA values or otherwise malformed input) occur before any R-level error handling. Wrapping calls in try() or tryCatch() will not prevent R-session crashes.

To guard against this, wrap shannon.entropy() in a "safe" validator that checks for NA values and a well-formed numeric matrix before dispatch, for example:

safe_shannon.entropy <- function(pk, ...) {
  stopifnot(
    is.matrix(pk),
    is.numeric(pk),
    !anyNA(pk)
  )
  shannon.entropy(pk, ...)
}

Apply the same pattern to any custom metric functions to ensure input sanity before calling the underlying C++ code.
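
Since stopifnot() signals an ordinary R condition, failures of the validator can be caught with tryCatch(), unlike the pointer-level crashes described above. A minimal sketch (bad_pk is purely illustrative):

## a deliberately malformed input: one cell is NA
bad_pk <- matrix(runif(10), nrow = 2)
bad_pk[1, 1] <- NA

## the validator aborts with a catchable R-level error
## before the C++ backend is ever reached
tryCatch(
  safe_shannon.entropy(bad_pk),
  error = function(e) message("rejected: ", conditionMessage(e))
)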

Usage

# S3 method for matrix
shannon.entropy(pk, dim = 0L, normalize = FALSE, ...)

Value

A <double> value or vector (illustrated in the sketch after this list):

  • A single <double> value (length 1) if dim == 0.

  • A <double> vector with length equal to the number of rows if dim == 1 (row-wise).

  • A <double> vector with length equal to the number of columns if dim == 2 (column-wise).
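
A quick illustration of the three return shapes, assuming the dimension mapping documented above (the uniform 200 x 5 matrix is purely illustrative):

pk <- matrix(1 / 5, nrow = 200, ncol = 5)  # each row sums to 1

length(SLmetrics::shannon.entropy(pk, dim = 0L))  # 1: total entropy
length(SLmetrics::shannon.entropy(pk, dim = 1L))  # 200: one value per row
length(SLmetrics::shannon.entropy(pk, dim = 2L))  # 5: one value per column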

Arguments

pk

An \(n \times k\) <double>-matrix of observed probabilities. The \(i\)-th row should sum to 1 (i.e., form a valid probability distribution over the \(k\) classes). The first column corresponds to the first factor level of the actual class labels, the second column to the second factor level, and so on.

dim

An <integer> value of length 1 (Default: 0). Defines the dimension along which to calculate the entropy (0: total, 1: row-wise, 2: column-wise).

normalize

A <logical>-value (default: FALSE, matching the Usage section above). If TRUE, the mean entropy across all observations is returned; otherwise, the sum of entropies is returned. See the illustrative snippet after this section.

...

Arguments passed into other methods.
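
If normalize behaves as documented, the mean and the sum should differ by exactly the number of observations (rows). A small probe of that relationship (the variable names and uniform matrix are illustrative):

pk <- matrix(1 / 4, nrow = 50, ncol = 4)  # 50 uniform distributions over 4 classes

h_sum  <- SLmetrics::shannon.entropy(pk, normalize = FALSE)  # sum of entropies
h_mean <- SLmetrics::shannon.entropy(pk, normalize = TRUE)   # mean entropy

all.equal(h_sum, h_mean * nrow(pk))  # TRUE under the stated semantics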

References

MacKay, David J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.

Kramer, Oliver. "Scikit-Learn." Machine Learning for Evolution Strategies (2016): 45-53.

Virtanen, Pauli, et al. "SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python." Nature Methods 17.3 (2020): 261-272.

See Also

Other Classification: accuracy(), auc.pr.curve(), auc.roc.curve(), baccuracy(), brier.score(), ckappa(), cmatrix(), cross.entropy(), dor(), fbeta(), fdr(), fer(), fmi(), fpr(), hammingloss(), jaccard(), logloss(), mcc(), nlr(), npv(), plr(), pr.curve(), precision(), recall(), relative.entropy(), roc.curve(), specificity(), zerooneloss()

Other Supervised Learning: accuracy(), auc.pr.curve(), auc.roc.curve(), baccuracy(), brier.score(), ccc(), ckappa(), cmatrix(), cross.entropy(), deviance.gamma(), deviance.poisson(), deviance.tweedie(), dor(), fbeta(), fdr(), fer(), fmi(), fpr(), gmse(), hammingloss(), huberloss(), jaccard(), logloss(), maape(), mae(), mape(), mcc(), mpe(), mse(), nlr(), npv(), pinball(), plr(), pr.curve(), precision(), rae(), recall(), relative.entropy(), rmse(), rmsle(), roc.curve(), rrmse(), rrse(), rsq(), smape(), specificity(), zerooneloss()

Other Entropy: cross.entropy(), logloss(), relative.entropy()

Examples

## generate valid probability
## distributions
rand.sum <- function(n) {
  x <- sort(runif(n - 1))
  c(x, 1) - c(0, x)
}

## observed probabilities
set.seed(1903)
pk <- t(replicate(200, rand.sum(5)))

## entropy
SLmetrics::shannon.entropy(
  pk = pk
)
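
As a sanity check, the row-wise entropies can be compared with a direct evaluation of \(-\sum_k p_k \log p_k\); this assumes shannon.entropy() uses the natural logarithm:

## row-wise entropy from SLmetrics
h_rows <- SLmetrics::shannon.entropy(pk, dim = 1L)

## manual reference computation (natural log assumed)
h_manual <- -rowSums(pk * log(pk))

all.equal(as.numeric(h_rows), as.numeric(h_manual))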



