
genderizeR (version 2.1.1)

classificationErrors: Calculating classification errors and other prediction indicators

Description

classificationErrors builds a confusion matrix from vectors of manually coded and predicted gender and returns the classification errors calculated from that matrix.

Usage

classificationErrors(labels, predictions)

Arguments

labels

A vector of true (manually coded) labels. It should contain only the following values: c("female", "male", "unknown", "noname"). "noname" also covers names given as initials only.

predictions

A vector of predicted genders. It should contain only the following values: c("female", "male", NA). Use NA where it was not possible to predict a gender.
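
The value sets above can be checked before calling the function. The following is a minimal sketch only: check_gender_vectors is a hypothetical helper that is not part of genderizeR, and the equal-length requirement is an assumption based on the two vectors being cross-tabulated item by item.

```r
# Hypothetical helper (not part of genderizeR): validates inputs against
# the value sets listed in the Arguments section.
check_gender_vectors <- function(labels, predictions) {
  allowed_labels <- c("female", "male", "unknown", "noname")
  allowed_predictions <- c("female", "male", NA)
  stopifnot(
    # Assumption: one prediction per manually coded label.
    length(labels) == length(predictions),
    all(labels %in% allowed_labels),
    # %in% matches NA against NA, so missing predictions pass the check.
    all(predictions %in% allowed_predictions)
  )
  invisible(TRUE)
}

labels <- c("female", "male", "noname", "unknown")
predictions <- c("female", NA, NA, "male")
check_gender_vectors(labels, predictions)
```

The helper stops with an error if either vector contains a value outside its documented set, for example "f" instead of "female".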

Value

A list of gender prediction efficiency indicators:

confMatrix

Full confusion matrix.

errorTotal

Total classification error calculated on the matrix.

errorFullFirstNames

Classification error calculated with the "noname" category excluded.

errorCoded

Classification error calculated with both the "noname" and "unknown" categories excluded.

errorCodedWithoutNA

Classification error calculated only on "female" and "male" categories from both predictions and labels.

naTotal

Total proportion of items with unpredicted gender.

naFullFirstNames

Proportion of items with unpredicted gender, calculated with the "noname" category excluded.

naCoded

Proportion of items with unpredicted gender, calculated with both the "noname" and "unknown" categories excluded.

errorGenderBias

Gender bias of the classification errors: the number of "male" items classified as "female" minus the number of "female" items classified as "male", divided by the number of items whose label and prediction are both either "female" or "male".
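
To make the definitions above concrete, the sketch below recomputes each indicator by hand from the confusion matrix printed in the Examples section. It is a reconstruction from that printed output, not the package's source code. In particular, which cells count as "correct" for errorTotal and errorFullFirstNames is an inference: the printed values are reproduced only under the assumptions marked in the comments. The "<NA>" column name is a plain string chosen here for indexing.

```r
# Confusion matrix copied from the example output
# (rows: manually coded labels, columns: predictions; the empty <NA> row is omitted).
m <- matrix(
  c( 6,  6,  8,
     6, 10, 10,
    12,  6, 17,
     5,  7,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    labels      = c("female", "male", "noname", "unknown"),
    predictions = c("female", "male", "<NA>")
  )
)

coded <- c("female", "male")                 # rows with a coded gender
full  <- c("female", "male", "unknown")      # rows excluding "noname"

# Items whose coded gender was predicted correctly.
hits <- m["female", "female"] + m["male", "male"]

# Assumption inferred from the printed 0.67: "noname" items with an NA
# prediction are counted as correct here.
errorTotal <- 1 - (hits + m["noname", "<NA>"]) / sum(m)

# Assumption inferred from the printed 0.6461538: "unknown" items with an NA
# prediction are counted as correct here.
errorFullFirstNames <- 1 - (hits + m["unknown", "<NA>"]) / sum(m[full, ])

# Only "female" and "male" labels; NA predictions count as errors.
errorCoded <- 1 - hits / sum(m[coded, ])

# Only "female" and "male" in both labels and predictions.
errorCodedWithoutNA <- 1 - hits / sum(m[coded, coded])

# Proportions of unpredicted items over the same three sets of rows.
naTotal          <- sum(m[, "<NA>"]) / sum(m)
naFullFirstNames <- sum(m[full, "<NA>"]) / sum(m[full, ])
naCoded          <- sum(m[coded, "<NA>"]) / sum(m[coded, ])

# "male" classified as "female" minus "female" classified as "male".
errorGenderBias <- (m["male", "female"] - m["female", "male"]) / sum(m[coded, coded])
```

Under these assumptions the eight values agree with the printed example output, including errorGenderBias = 0, because the two kinds of misclassification (6 and 6) cancel out.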

Examples

# NOT RUN {
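# Use the R 3.5.0 random number settings so the output below is reproducible
suppressWarnings(RNGversion("3.5.0"))
set.seed(23)
# Simulated manually coded labels and predictions (NA = gender not predicted)
labels <- sample(c("female", "male", "unknown", "noname"), 100, replace = TRUE)
predictions <- sample(c("female", "male", NA), 100, replace = TRUE)
classificationErrors(labels, predictions)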

# $confMatrix
#          predictions
# labels    female male <NA>
#   female       6    6    8
#   male         6   10   10
#   noname      12    6   17
#   unknown      5    7    7
#   <NA>         0    0    0
# 
# $errorTotal
# [1] 0.67
# 
# $errorFullFirstNames
# [1] 0.6461538
# 
# $errorCoded
# [1] 0.6521739
# 
# $errorCodedWithoutNA
# [1] 0.4285714
# 
# $naTotal
# [1] 0.42
# 
# $naFullFirstNames
# [1] 0.3846154
# 
# $naCoded
# [1] 0.3913043
# 
# $errorGenderBias
# [1] 0 

# }
