qwraps2 (version 0.4.2)

confusion_matrix: Confusion Matrices (Contingency Tables)

Description

Construction of confusion matrices, accuracy, sensitivity, specificity, and confidence intervals (Wilson's method and, optionally, bootstrapping).

Usage

confusion_matrix(x, ...)

# S3 method for default
confusion_matrix(x, y, positive, boot = FALSE, boot_samples = 1000L, alpha = 0.05, ...)

# S3 method for formula
confusion_matrix(formula, data = parent.frame(), positive, boot = FALSE, boot_samples = 1000L, alpha = 0.05, ...)

is.confusion_matrix(x)

# S3 method for confusion_matrix
print(x, ...)

Arguments

x

prediction condition vector: a two-level factor, or a variable that can be converted to one.

...

not currently used

y

True Condition vector with the same possible values as x.

positive

the level of x and y which is the positive outcome. If missing, the first level of factor(y) is used as the positive level.

boot

boolean, should bootstrapped confidence intervals for the sensitivity and specificity be computed? Defaults to FALSE.

boot_samples

number of bootstrap samples to generate; defaults to 1000L. Ignored if boot == FALSE.

alpha

100(1 - alpha)% confidence intervals for the sensitivity and specificity. Ignored if boot == FALSE.

formula

column (known) ~ row (test) for building the confusion matrix

data

environment containing the variables listed in the formula
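
As a minimal sketch of how positive, boot, boot_samples, and alpha fit together (the 0/1 vectors below are made up for illustration; only the default method is shown):

set.seed(42)
test  <- rbinom(100, 1, 0.5)
truth <- rbinom(100, 1, 0.5)

# Wilson intervals only (the default) versus bootstrapped intervals for
# sensitivity and specificity; alpha = 0.05 gives 95% intervals.
cm_wilson <- confusion_matrix(test, truth, positive = "1")
cm_boot   <- confusion_matrix(test, truth, positive = "1",
                              boot = TRUE, boot_samples = 2000L, alpha = 0.05)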

Value

The sensitivity and specificity functions return numeric values. confusion_matrix returns a list with elements:

  • tab the confusion matrix,

  • stats a matrix of summary statistics and confidence intervals.
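
For instance, the returned list can be inspected like this (a sketch assuming the con_mat object constructed in Example 1 below):

is.confusion_matrix(con_mat)  # TRUE
con_mat$tab                   # the confusion matrix (contingency table)
con_mat$stats                 # summary statistics and confidence intervals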

Details

Sensitivity and Specificity: For the sensitivity and specificity functions we expect the 2-by-2 confusion matrix (contingency table) to be of the form:

                            True Condition
                            +     -
  Predicted Condition +     TP    FP
  Predicted Condition -     FN    TN

where

  • FN: False Negative,

  • FP: False Positive,

  • TN: True Negative, and

  • TP: True Positive.

Recall:

  • sensitivity = TP / (TP + FN)

  • specificity = TN / (TN + FP)

  • positive predictive value (PPV) = TP / (TP + FP)

  • negative predictive value (NPV) = TN / (TN + FN)
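
A quick worked check of these formulas, using the cell counts that the test/truth vectors in Example 1 below yield with positive = "1" (TP = 20, FP = 33, FN = 10, TN = 37):

TP <- 20; FP <- 33; FN <- 10; TN <- 37

TP / (TP + FN)  # sensitivity ~ 0.667
TN / (TN + FP)  # specificity ~ 0.529
TP / (TP + FP)  # PPV         ~ 0.377
TN / (TN + FN)  # NPV         ~ 0.787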

Examples

################################################################################
## Example 1
test  <- c(rep(1, 53), rep(0, 47))
truth <- c(rep(1, 20), rep(0, 33), rep(1, 10), rep(0, 37))
con_mat <- confusion_matrix(test, truth, positive = "1")
str(con_mat)
con_mat

################################################################################
## Example 2: based on an example from the wikipedia page:
# https://en.wikipedia.org/wiki/Confusion_matrix

animals <-
  data.frame(Predicted = c(rep("Cat",    5 + 2 +  0),
                           rep("Dog",    3 + 3 +  2),
                           rep("Rabbit", 0 + 1 + 11)),
             Actual    = c(rep(c("Cat", "Dog", "Rabbit"), times = c(5, 2,  0)),
                           rep(c("Cat", "Dog", "Rabbit"), times = c(3, 3,  2)),
                           rep(c("Cat", "Dog", "Rabbit"), times = c(0, 1, 11))),
             stringsAsFactors = FALSE)

table(animals)

cats <- apply(animals, 1:2, function(x) ifelse(x == "Cat", "Cat", "Non-Cat"))

# Default calls, note the difference based on what is set as the 'positive'
# value.
confusion_matrix(cats[, "Predicted"], cats[, "Actual"], positive = "Cat")
confusion_matrix(cats[, "Predicted"], cats[, "Actual"], positive = "Non-Cat")

# Using a Formula
confusion_matrix(I(Actual == "Cat") ~ I(Predicted == "Cat"),
                 data = as.data.frame(animals),
                 positive = "TRUE")

################################################################################
## Example 3
russell <-
  data.frame(Pred  = c(rep(0, 2295), rep(0, 118), rep(1, 1529), rep(1, 229)),
             Truth = c(rep(0, 2295), rep(1, 118), rep(0, 1529), rep(1, 229)))

# The values for Sensitivity, Specificity, PPV, and NPV are dependent on the
# "positive" level.  By default, the first level of y is used.
confusion_matrix(x = russell$Pred, y = russell$Truth, positive = "0")
confusion_matrix(x = russell$Pred, y = russell$Truth, positive = "1")
