
SLmetrics (version 0.3-4)

auc.roc.curve: Area under the Receiver Operator Characteristics Curve

Description

A generic S3 function that computes the area under the receiver operator characteristics curve for a classification model. The function dispatches to the S3 methods of auc.roc.curve() and performs no input validation. If you supply NA values or vectors of unequal length (e.g. length(x) != length(y)), the underlying C++ code may trigger undefined behavior and crash your R session.

Defensive measures

Because auc.roc.curve() operates on raw pointers, pointer-level faults (e.g. from NA or mismatched length) occur before any R-level error handling. Wrapping calls in try() or tryCatch() will not prevent R-session crashes.

To guard against this, wrap auc.roc.curve() in a "safe" validator that checks for NA values and matching length, for example:

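safe_auc.roc.curve <- function(x, y, ...) {
  stopifnot(
    !anyNA(x), !anyNA(y),
    ## NROW() counts observations for vectors and matrices alike,
    ## so an n x k response matrix is compared on its n rows
    NROW(x) == NROW(y)
  )
  auc.roc.curve(x, y, ...)
}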

Apply the same pattern to any custom metric functions to ensure input sanity before calling the underlying C++ code.
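One way to apply the pattern to several metrics at once is a small wrapper factory. The sketch below is only an illustration: make_safe() and toy_metric() are hypothetical names that are not part of SLmetrics, and the toy metric is plain R so the example runs without the package.

```r
## Hypothetical helper (not part of SLmetrics): wrap any metric
## function so its inputs are validated in R before native code runs.
make_safe <- function(metric) {
  force(metric)
  function(actual, response, ...) {
    stopifnot(
      !anyNA(actual), !anyNA(response),
      ## NROW() treats a vector as a one-column matrix, so this
      ## check covers vector and matrix responses alike
      NROW(actual) == NROW(response)
    )
    metric(actual, response, ...)
  }
}

## Stand-in metric written in plain R, used only to demonstrate the wrapper
toy_metric <- function(actual, response, ...) mean(response[, 1])

safe_toy <- make_safe(toy_metric)

probs <- cbind(c(0.2, 0.7, 0.4), c(0.8, 0.3, 0.6))

## Valid input passes through to the metric
ok <- safe_toy(actual = factor(c("a", "b", "a")), response = probs)

## Invalid input raises an ordinary R error that tryCatch() can handle
bad <- tryCatch(
  safe_toy(actual = factor(c("a", NA, "a")), response = probs),
  error = function(e) "rejected"
)
```

Because the check fails in R, before any compiled code is reached, the error can be caught with tryCatch() in the usual way.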

Visualizing area under the receiver operator characteristics curve

Use roc.curve() to construct the data.frame, then pass it to plot() to visualize the area under the curve.

Efficient multi-metric evaluation

To avoid sorting the same probability matrix multiple times (once per class or curve), you can precompute a single set of sort indices and pass it via the indices argument. This reduces the overall cost from O(K·N log N) to O(N log N + K·N).

## presort response
## probabilities
indices <- preorder(response, decreasing = TRUE)

## evaluate area under the receiver operator characteristics curve
auc.roc.curve(actual, response, indices = indices)
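The idea of sorting once and reusing the result can be sketched in base R. This is an illustration under stated assumptions, not the SLmetrics implementation: order() stands in for preorder(), the simulated data is arbitrary, and curve_counts(), roc_auc() and avg_precision() are hypothetical helpers.

```r
set.seed(1)
n <- 200
k <- 3
actual <- factor(sample(letters[1:k], n, replace = TRUE))
raw <- matrix(runif(n * k), n, k)
response <- raw / rowSums(raw)  # each row sums to 1

## Sort once: column j holds the row indices that order
## response[, j] from the highest to the lowest probability
indices <- apply(response, 2, order, decreasing = TRUE)

## Cumulative true and false positives for class j, in score order.
## No sorting happens here, only an O(N) pass over the presorted rows.
curve_counts <- function(j) {
  hit <- as.integer(actual)[indices[, j]] == j
  list(tp = cumsum(hit), fp = cumsum(!hit))
}

## Two different metrics that reuse the same presorted indices
roc_auc <- function(j) {
  cc  <- curve_counts(j)
  tpr <- c(0, cc$tp / max(cc$tp))
  fpr <- c(0, cc$fp / max(cc$fp))
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

avg_precision <- function(j) {
  cc  <- curve_counts(j)
  hit <- diff(c(0, cc$tp)) == 1
  mean((cc$tp / (cc$tp + cc$fp))[hit])
}

auc_by_class <- vapply(seq_len(k), roc_auc, numeric(1))
ap_by_class  <- vapply(seq_len(k), avg_precision, numeric(1))
```

Each additional metric then costs one linear pass per class instead of another sort.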

Usage

## Generic S3 method
## for unweighted Area under the Receiver Operator Characteristics Curve
auc.roc.curve(...)

## Generic S3 method
## for weighted Area under the Receiver Operator Characteristics Curve
weighted.auc.roc.curve(...)

Value

If estimator is given as

  • 0: a named <double>-vector of length k

  • 1: a <double> value (Micro averaged metric)

  • 2: a <double> value (Macro averaged metric)

Arguments

...

Arguments passed on to auc.roc.curve.factor, weighted.auc.roc.curve.factor

actual

A vector of length \(n\) with \(k\) levels. Can be an integer or a factor.

response

A \(n \times k\) <double>-matrix of predicted probabilities. The \(i\)-th row should sum to 1 (i.e., a valid probability distribution over the \(k\) classes). The first column corresponds to the first factor level in actual, the second column to the second factor level, and so on.

method

A <double> value (default: \(0\)). Defines the method used to calculate the area under the curve: \(0\) uses the trapezoid method, \(1\) uses the step method.
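The difference between the two integration rules can be seen on a handful of curve points. The sketch below is plain R and makes assumptions: auc_trapezoid() and auc_step() are hypothetical helpers, the points are made up, and the step rule shown uses the left endpoint of each interval, which may not be the convention SLmetrics uses internally.

```r
## (fpr, tpr) points of a small curve, ordered by increasing fpr
fpr <- c(0, 0.25, 0.5, 1)
tpr <- c(0, 0.5, 0.75, 1)

## Trapezoid rule: interval width times the mean of the two endpoint heights
auc_trapezoid <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

## Step rule (left-endpoint convention assumed here):
## interval width times the height at the start of the interval
auc_step <- function(x, y) {
  sum(diff(x) * head(y, -1))
}

auc_trapezoid(fpr, tpr)  # 0.65625
auc_step(fpr, tpr)       # 0.5
```

The two rules agree whenever the curve is a pure staircase and differ on diagonal segments, which arise from tied scores or a coarse threshold grid.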

indices

An optional \(n \times k\) matrix of <integer> values of sorted response probability indices.

estimator

An <integer>-value of length \(1\) (default: \(0\)).

  • 0 - a named <double>-vector of length k (class-wise)

  • 1 - a <double> value (Micro averaged metric)

  • 2 - a <double> value (Macro averaged metric)
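The three estimator settings can be illustrated with a base-R sketch. It relies on common one-vs-rest definitions (class-wise scores, their unweighted mean for the macro average, and pooled one-hot labels for the micro average), which are assumptions here and may differ in detail from what SLmetrics computes. binary_auc() is a hypothetical helper and the data is simulated.

```r
set.seed(42)
n <- 150
k <- 3
actual <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

## Simulated probabilities with some signal: boost the true class
raw <- matrix(runif(n * k), n, k)
idx <- cbind(seq_len(n), as.integer(actual))
raw[idx] <- raw[idx] + 1
response <- raw / rowSums(raw)

## Rank-based (Mann-Whitney) AUC for a logical label and a score
binary_auc <- function(label, score) {
  r  <- rank(score)
  n1 <- sum(label)
  n0 <- sum(!label)
  (sum(r[label]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## One-hot encoding of actual: column j is TRUE where the class is level j
onehot <- outer(as.integer(actual), seq_len(k), "==")

## estimator = 0 analogue: one value per class, named by factor level
classwise <- vapply(
  seq_len(k),
  function(j) binary_auc(onehot[, j], response[, j]),
  numeric(1)
)
names(classwise) <- levels(actual)

## estimator = 1 analogue: pool all (label, score) pairs, then score once
micro <- binary_auc(as.vector(onehot), as.vector(response))

## estimator = 2 analogue: unweighted mean of the class-wise values
macro <- mean(classwise)
```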

w

A <double> vector of sample weights.

References

James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. No. 1. New York: Springer, 2013.

Hastie, Trevor. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." (2009).

Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.

See Also

Other Classification: accuracy(), auc.pr.curve(), baccuracy(), brier.score(), ckappa(), cmatrix(), cross.entropy(), dor(), fbeta(), fdr(), fer(), fmi(), fpr(), hammingloss(), jaccard(), logloss(), mcc(), nlr(), npv(), plr(), pr.curve(), precision(), recall(), relative.entropy(), roc.curve(), shannon.entropy(), specificity(), zerooneloss()

Other Supervised Learning: accuracy(), auc.pr.curve(), baccuracy(), brier.score(), ccc(), ckappa(), cmatrix(), cross.entropy(), deviance.gamma(), deviance.poisson(), deviance.tweedie(), dor(), fbeta(), fdr(), fer(), fmi(), fpr(), gmse(), hammingloss(), huberloss(), jaccard(), logloss(), maape(), mae(), mape(), mcc(), mpe(), mse(), nlr(), npv(), pinball(), plr(), pr.curve(), precision(), rae(), recall(), relative.entropy(), rmse(), rmsle(), roc.curve(), rrmse(), rrse(), rsq(), shannon.entropy(), smape(), specificity(), zerooneloss()

Examples

## Classes and
## seed
set.seed(1903)
classes <- c("Kebab", "Falafel")

## Generate actual classes
## and response probabilities
actual_classes <- factor(
    x = sample(
      x = classes, 
      size = 1e2, 
      replace = TRUE, 
      prob = c(0.7, 0.3)
    )
)

response_probabilities <- ifelse(
    actual_classes == "Kebab", 
    rbeta(sum(actual_classes == "Kebab"), 2, 5), 
    rbeta(sum(actual_classes == "Falafel"), 5, 2)
)

## Construct response
## matrix
probability_matrix <- cbind(
    response_probabilities,
    1 - response_probabilities
)

## Calculate area under the receiver operator characteristics curve

SLmetrics::auc.roc.curve(
    actual   = actual_classes, 
    response = probability_matrix
)
