yardstick (version 0.0.1)

conf_mat: Confusion Matrix for Categorical Data

Description

Calculates a cross-tabulation of observed and predicted classes.

For conf_mat() objects, the tidy() method collapses the cell counts into a data frame with one row per cell, which is easier to manipulate.
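The following base R sketch shows the kind of cross-tabulation described above. It uses hypothetical toy factors and builds the table directly with base::table(), with the same dnn labels as the conf_mat() default. It illustrates the idea and is not conf_mat()'s own implementation.

```r
# Hypothetical observed and predicted classes (toy data for illustration)
lvls <- c("A", "B", "C")
truth    <- factor(c("A", "A", "B", "C", "C", "B"), levels = lvls)
estimate <- factor(c("A", "B", "B", "C", "A", "B"), levels = lvls)

# Cross-tabulate: rows are the predictions and columns are the true
# classes, matching the default dnn = c("Prediction", "Truth")
cm <- table(estimate, truth, dnn = c("Prediction", "Truth"))
print(cm)

# Correct predictions lie on the diagonal of the table
n_correct <- sum(diag(cm))
```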

Usage

conf_mat(data, ...)

# S3 method for data.frame
conf_mat(data, truth, estimate, dnn = c("Prediction", "Truth"), ...)

# S3 method for table
conf_mat(data, ...)

# S3 method for conf_mat
tidy(x, ...)

Arguments

data

A data frame or a base::table().

...

Options to pass to base::table() (not including dnn). This argument is not currently used for the tidy method.

truth

The column identifier for the true class results (a factor). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

estimate

The column identifier for the predicted class results (also a factor). As with truth, this can be specified in different ways, but the primary method is to use an unquoted variable name.

dnn

A character vector of dimnames for the table.

x

An object of class conf_mat.

Value

conf_mat() produces an object of class conf_mat, which contains the table and other objects. tidy.conf_mat() generates a tibble with columns name (the cell identifier) and value (the cell count).
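The following is a base R sketch of flattening a table into one name/value row per cell, as the tidy method's output is described. It assumes a hypothetical label format cell_<row>_<col> (the exact names produced by tidy.conf_mat() may differ) and returns a plain data.frame rather than a tibble.

```r
# Hypothetical 2 x 2 table of counts (rows = Prediction, columns = Truth)
lvls <- c("yes", "no")
cm <- as.table(matrix(c(10, 2, 3, 15), nrow = 2,
                      dimnames = list(Prediction = lvls, Truth = lvls)))

# as.vector() reads the table column by column, so the row index cycles
# fastest; the cell_<row>_<col> labels are an assumption for illustration
n <- nrow(cm)
cells <- data.frame(
  name  = paste("cell",
                rep(seq_len(n), times = n),  # row index
                rep(seq_len(n), each = n),   # column index
                sep = "_"),
  value = as.vector(cm),
  stringsAsFactors = FALSE
)
print(cells)
```

Reading the table in column-major order keeps each label in step with as.vector(), so no explicit loop over rows and columns is needed.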

Details

The function requires that the truth and estimate factors have exactly the same levels.
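Because the two factors must share exactly the same levels, predictions that never include some class may need to be re-leveled against the truth first. A base R sketch with hypothetical data:

```r
# Hypothetical data: class "C" occurs in the truth but is never predicted,
# so factor() infers only the levels "A" and "B" for the predictions
truth    <- factor(c("A", "B", "C", "A"))
estimate <- factor(c("A", "B", "B", "A"))
same_before <- identical(levels(truth), levels(estimate))

# Re-level the predictions so both factors share exactly the same
# levels, in the same order
estimate <- factor(as.character(estimate), levels = levels(truth))
same_after <- identical(levels(truth), levels(estimate))

# The cross-tabulation is now square; the never-predicted class "C"
# appears as a row of zeros
cm <- table(estimate, truth, dnn = c("Prediction", "Truth"))
print(cm)
```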

Examples

library(dplyr)
library(yardstick)
data("hpc_cv")

# The confusion matrix from a single assessment set (i.e. fold)
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  conf_mat(obs, pred)

# Now compute the average confusion matrix across all folds in
# terms of the proportion of the data contained in each cell. 
# First get the raw cell counts per fold using the `tidy` method
cells_per_resample <- hpc_cv %>%
  group_by(Resample) %>%
  do(tidy(conf_mat(., obs, pred)))

# Get the totals per resample
counts_per_resample <- hpc_cv %>%
  group_by(Resample) %>%
  summarize(total = n()) %>%
  left_join(cells_per_resample, by = "Resample") %>%
  # Compute the proportions
  mutate(prop = value/total) %>%
  group_by(name) %>%
  # Average
  summarize(prop = mean(prop)) 

counts_per_resample

# Now reshape these into a matrix with one row and one column per class
mean_cmat <- matrix(counts_per_resample$prop, byrow = TRUE,
                    ncol = nlevels(hpc_cv$obs))
rownames(mean_cmat) <- levels(hpc_cv$obs)
colnames(mean_cmat) <- levels(hpc_cv$obs)

round(mean_cmat, 3)
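The averaging in the example above can also be sketched in base R without dplyr or the tidy method. The version below uses small hypothetical resampled data (not hpc_cv) and a list of per-fold proportion tables. It averages, across folds, the proportion of each fold's data that falls in each cell.

```r
# Hypothetical resampled results: three folds of four rows, two classes
lvls <- c("yes", "no")
folds <- data.frame(
  Resample = rep(c("Fold1", "Fold2", "Fold3"), each = 4),
  obs  = factor(c("yes", "yes", "no", "no",
                  "yes", "no", "no", "no",
                  "yes", "yes", "yes", "no"), levels = lvls),
  pred = factor(c("yes", "no", "no", "no",
                  "yes", "no", "yes", "no",
                  "yes", "yes", "no", "no"), levels = lvls),
  stringsAsFactors = FALSE
)

# One proportion table per fold: each cell count divided by the fold total
props <- lapply(split(folds, folds$Resample), function(d) {
  prop.table(table(d$pred, d$obs, dnn = c("Prediction", "Truth")))
})

# Element-wise mean of the per-fold proportion tables
mean_cmat <- Reduce(`+`, props) / length(props)
round(mean_cmat, 3)
```

Because the tables are never flattened, the cell positions keep their dimnames, and the result does not depend on how cell names sort.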
