
yardstick (version 0.0.1)

roc_auc: Metrics Based on Class Probabilities

Description

These functions compute the area under the receiver operating characteristic (ROC) curve (roc_auc), the area under the precision-recall curve (pr_auc), or the multinomial log loss (mnLogLoss).

Usage

roc_auc(data, ...)

# S3 method for data.frame
roc_auc(data, truth, ..., options = list(), na.rm = TRUE)

pr_auc(data, ...)

# S3 method for data.frame
pr_auc(data, truth, ..., na.rm = TRUE)

mnLogLoss(data, ...)

# S3 method for data.frame
mnLogLoss(data, truth, ..., na.rm = TRUE, sum = FALSE)

Arguments

data

A data frame with the relevant columns.

...

A set of unquoted column names or one or more dplyr selector functions to choose which variables contain the class probabilities. See the examples below. For roc_auc and pr_auc, only one column is required. If more are given, the functions will try to match the column names to the appropriate factor level of truth. If this doesn't work, an error is thrown. For mnLogLoss, there should be as many columns as factor levels of truth. The columns are assumed to be in the same order as the factor levels.

truth

The column identifier for the true class results (which must be a factor). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

options

Options to pass to roc() such as direction or smooth. These options should not include response, predictor, or levels.

na.rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

sum

A logical. Should the sum of the likelihood contributions be returned (instead of the mean value)?
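
The difference between the mean and the sum of the likelihood contributions can be sketched in base R. The helper below, mn_log_loss_sketch, is hypothetical and is not part of yardstick. It assumes one probability column per factor level, in the same order as the levels, and it uses the conventional negative sign for log loss, which may differ from the sign that mnLogLoss returns in this version of the package.

```r
# Hypothetical helper (not yardstick code): multinomial log loss in base R.
# `truth` is a factor; `probs` is a matrix or data frame with one column per
# factor level, assumed to be in the same order as levels(truth).
mn_log_loss_sketch <- function(truth, probs, sum = FALSE) {
  probs <- as.matrix(probs)
  stopifnot(
    is.factor(truth),
    ncol(probs) == nlevels(truth),
    nrow(probs) == length(truth)
  )
  # Probability that each row assigns to its observed class
  p_obs <- probs[cbind(seq_along(truth), as.integer(truth))]
  # Per-row contribution (assumed sign convention: negative log-likelihood)
  contrib <- -log(p_obs)
  if (sum) base::sum(contrib) else mean(contrib)
}

truth <- factor(c("a", "b", "a"), levels = c("a", "b"))
probs <- cbind(a = c(0.8, 0.3, 0.6), b = c(0.2, 0.7, 0.4))

mn_log_loss_sketch(truth, probs)              # mean of the contributions
mn_log_loss_sketch(truth, probs, sum = TRUE)  # sum of the contributions
```

With three rows, the summed value is three times the mean, which is the only difference the sum argument makes in this sketch.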

Value

For roc_auc and pr_auc, a number between 0 and 1 (or NA). For mnLogLoss, a number or NA.
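
To see why the ROC AUC falls between 0 and 1, it helps to compute it as the fraction of (event, non-event) pairs in which the event receives the higher probability. The base-R sketch below uses the rank-sum identity for that fraction. The helper roc_auc_sketch is hypothetical; it does not call roc(), ignores the options argument, and does not handle NA values.

```r
# Hypothetical helper (not yardstick code): ROC AUC via the rank-sum identity.
# `truth` is a factor, `prob` is the predicted probability of the `event` level.
roc_auc_sketch <- function(truth, prob, event = levels(truth)[1]) {
  stopifnot(is.factor(truth), length(truth) == length(prob))
  is_event <- truth == event
  n_pos <- sum(is_event)
  n_neg <- sum(!is_event)
  # AUC is undefined when one of the two classes is absent
  if (n_pos == 0 || n_neg == 0) return(NA_real_)
  r <- rank(prob)  # ties receive average ranks
  (sum(r[is_event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

truth <- factor(c("yes", "yes", "no", "no"), levels = c("yes", "no"))
prob_yes <- c(0.9, 0.4, 0.6, 0.1)

# Three of the four (event, non-event) pairs are ordered correctly
roc_auc_sketch(truth, prob_yes)  # 0.75
```

A value of 1 means every event outranks every non-event, and 0 means the ordering is fully reversed.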

Details

There is no common convention on which factor level should automatically be considered the "relevant" or "positive" result. In yardstick, the default is to use the first level. This behavior is controlled by a global option, yardstick.event_first, which is set to TRUE when the package is loaded. Set it to FALSE if the last level of the factor should be treated as the level of interest.
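
The effect of the option can be sketched as follows. The helper event_level_sketch is hypothetical and only illustrates the rule described above; it is not the package's internal code. Only the option name yardstick.event_first comes from this page.

```r
# Hypothetical helper (not yardstick code): pick the level of interest
# according to the yardstick.event_first global option.
event_level_sketch <- function(truth) {
  stopifnot(is.factor(truth))
  lv <- levels(truth)
  # Fall back to TRUE, the documented default, if the option is unset
  event_first <- getOption("yardstick.event_first", default = TRUE)
  if (isTRUE(event_first)) lv[1] else lv[length(lv)]
}

truth <- factor(c("Class1", "Class2", "Class1"), levels = c("Class1", "Class2"))

old <- options(yardstick.event_first = TRUE)
event_level_sketch(truth)  # "Class1"

options(yardstick.event_first = FALSE)
event_level_sketch(truth)  # "Class2"

options(old)  # restore the previous setting
```

An alternative that leaves the global option alone is to reorder the factor levels so that the level of interest comes first.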

See Also

conf_mat(), summary.conf_mat(), recall(), mcc()

Examples

# NOT RUN {
library(yardstick)
library(tidyselect)

data("two_class_example")
prob_cols <- levels(two_class_example$truth)

roc_auc(two_class_example, truth = truth, Class1)
# warning is issued here because 2 columns are selected:
roc_auc(two_class_example, truth, starts_with("Class"))

# passing options via a list and _not_ `...`
roc_auc(two_class_example, truth = "truth", Class1,
        options = list(smooth = TRUE))

pr_auc(two_class_example, truth, prob_cols)

mnLogLoss(two_class_example, truth, starts_with("Class"))
# or
mnLogLoss(two_class_example, truth, !! prob_cols)
# }
