Returns the area under the ROC curve (AUC), computed by comparing the predicted scores with the actual binary values. Tied predictions are handled by calculating the optimistic AUC (positive cases sorted first, resulting in a higher AUC) and the pessimistic AUC (positive cases sorted last, resulting in a lower AUC) and then returning the average of the two. For the ROC, a "tie" means at least one pair of pred values that are identical while their corresponding values of actual differ. (If the values of actual are the same for identical predictions, they are unproblematic and are not considered "ties".)
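The tie handling described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: auc_sorted() is a hypothetical helper, the data are made up, and the AUC is computed as the share of positive/negative pairs in which the positive is ranked first.

```r
# Toy data (made up for illustration): three observations share pred = 0.5
# but differ in actual, so they form a tie.
actual <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
pred   <- c(0.9, 0.5, 0.5, 0.5, 0.2, 0.1)

# Hypothetical helper, not part of the package: AUC under one tie-breaking rule.
auc_sorted <- function(actual, pred, positives_first) {
  # Sort by descending prediction; within tied predictions, put positives
  # first (optimistic) or last (pessimistic). FALSE sorts before TRUE.
  tie_break <- if (positives_first) !actual else actual
  ord <- order(-pred, tie_break)
  y <- actual[ord]
  # For each negative, count the positives ranked ahead of it, then
  # divide by the number of positive/negative pairs.
  sum(cumsum(y)[!y]) / (sum(y) * sum(!y))
}

auc_opt  <- auc_sorted(actual, pred, positives_first = TRUE)   # 7/9
auc_pess <- auc_sorted(actual, pred, positives_first = FALSE)  # 5/9
auc_mean <- mean(c(auc_opt, auc_pess))                         # 2/3

# A tie exists when some prediction value maps to more than one actual value.
has_ties <- any(tapply(actual, pred, function(a) length(unique(a)) > 1))
```

Averaging the two extremes gives each tied positive/negative pair a weight of one half, which is why the mean here lands exactly between the optimistic and pessimistic values.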
aucroc(
actual,
pred,
na.rm = FALSE,
positive = NULL,
sample_size = 10000,
seed = 0
)
List with the following elements:
roc_opt: tibble with optimistic ROC data. "Optimistic" means that when predictions are tied, the TRUE/positive actual values are ordered before the FALSE/negative ones.
roc_pess: tibble with pessimistic ROC data. "Pessimistic" means that when predictions are tied, the FALSE/negative actual values are ordered before the TRUE/positive ones. Note that this difference is not merely in the sort order: when there are ties, the way that true positives, true negatives, etc. are counted is different for optimistic and pessimistic approaches. If there are no tied predictions, then roc_opt and roc_pess are identical.
auc_opt: area under the ROC curve for optimistic ROC.
auc_pess: area under the ROC curve for pessimistic ROC.
auc: mean of auc_opt and auc_pess. If there are no tied predictions, then auc_opt, auc_pess, and auc are identical.
ties: TRUE if there are two or more tied predictions; FALSE if there are no ties.
actual: any atomic vector. Actual label values from a dataset. They must be binary; that is, there must be exactly two distinct values (other than missing values, which are allowed). The "true" or "positive" class is determined by coercing actual to logical TRUE and FALSE following the rules of as.logical(). If this is not the intended meaning of "positive", then specify which of the two values should be considered TRUE with the argument positive.
pred: numeric vector. Predictions corresponding to each respective element in actual. Any numeric values (not only probabilities) are permissible.
na.rm: logical(1). TRUE if missing values should be removed; FALSE if they should be retained. If TRUE and any element of either actual or pred is missing, its paired element is also removed.
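The pairwise removal described for na.rm = TRUE can be sketched in base R as follows. This is an illustration of the documented behavior with made-up data, not the package's own code; keep, actual_clean, and pred_clean are names chosen for this example.

```r
# Toy data (made up): one missing label and one missing prediction.
actual <- c(TRUE, NA, FALSE, TRUE)
pred   <- c(0.8, 0.4, NA, 0.6)

# Keep only positions where both members of the pair are present.
keep <- !is.na(actual) & !is.na(pred)

actual_clean <- actual[keep]  # TRUE TRUE
pred_clean   <- pred[keep]    # 0.8 0.6
```

Filtering both vectors with the same index keeps every remaining label aligned with its prediction.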
positive: any single atomic value. The value of actual that is considered TRUE; any other value of actual is considered FALSE. For example, if 2 means TRUE and 1 means FALSE, then set positive = 2.
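The 2-versus-1 example shows why positive is sometimes needed: as.logical() treats every non-zero number as TRUE, so default coercion would not separate the two classes. The sketch below uses made-up data, and the comparison actual == 2 is only an assumption about how positive could be applied, not the package's confirmed implementation.

```r
# Toy labels (made up) coded as 2 = positive, 1 = negative.
actual <- c(2, 1, 1, 2, NA)

# Default coercion: every non-zero number becomes TRUE.
by_default <- as.logical(actual)  # TRUE TRUE TRUE TRUE NA

# Assumed equivalent of positive = 2: compare each label with that value.
by_positive <- actual == 2        # TRUE FALSE FALSE TRUE NA
```

Missing values stay NA under both approaches, so they can still be handled through na.rm.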
sample_size: single positive integer. To keep the computation relatively rapid, when actual and pred have more than sample_size elements, a random sample of sample_size elements of actual and pred is selected and the ROC and AUC are calculated on this sample. To disable random sampling for long inputs, set sample_size = NA.
seed: numeric(1). Random seed used only if length(actual) > sample_size.
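The interplay of sample_size and seed can be sketched in base R. This is a minimal illustration of the documented behavior with made-up data; how the package actually draws the sample, and whether it restores the random number generator state afterwards, is not stated in this page, so the use of set.seed() and sample.int() here is an assumption.

```r
# Toy inputs (made up) that are longer than the default sample_size.
n      <- 25000
actual <- rep(c(TRUE, FALSE), length.out = n)
pred   <- seq_len(n) / n

sample_size <- 10000
seed        <- 0

# Sample only when sampling is enabled and the input is long enough.
if (!is.na(sample_size) && length(actual) > sample_size) {
  set.seed(seed)
  idx <- sample.int(length(actual), sample_size)
  # Use the same indices for both vectors so the pairs stay aligned.
  actual <- actual[idx]
  pred   <- pred[idx]
}

length(actual)  # 10000
```

With sample_size = NA the condition is FALSE and the full vectors are used unchanged.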
set.seed(0)
# Generate some simulated "actual" data
a <- sample(c(TRUE, FALSE), 50, replace = TRUE)
# Generate some simulated predictions
p <- runif(50) |> round(2)
p[c(7, 8, 22, 35, 40, 41)] <- 0.5
# Calculate AUCROC with its components
ar <- aucroc(a, p)
ar$auc