genderizeBootstrapError: Gender prediction errors on bootstrap samples

Description

genderizeBootstrapError calculates the Apparent Error Rate, the Leave-One-Out bootstrap error rate, and the .632+ error rate from Efron and Tibshirani (1997). The code is a modified version of several functions from the sortinghat package by John A. Ramey.
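As a rough illustration of how the three error rates relate, the sketch below combines an apparent error and a leave-one-out bootstrap error into a .632+ style estimate using the weighted form of the estimator. The function name error_632plus and the gamma argument (the no-information error rate, which the caller must supply) are hypothetical and are not part of genderizeR or sortinghat. Clamping the relative overfitting rate to [0, 1] is a simplification of the paper's edge-case handling, so the package's own results may differ.

```r
# Minimal sketch of a .632+ style combination (assumption: not the package's code).
#   apparent : error rate on the training data itself
#   loo_boot : leave-one-out bootstrap error rate
#   gamma    : no-information error rate, supplied by the caller
error_632plus <- function(apparent, loo_boot, gamma) {
  # Relative overfitting rate, clamped to [0, 1]; set to 0 when the
  # bootstrap error or the no-information rate does not exceed the apparent error.
  if (gamma > apparent && loo_boot > apparent) {
    r <- min((loo_boot - apparent) / (gamma - apparent), 1)
  } else {
    r <- 0
  }

  # Weight on the bootstrap error: 0.632 when r = 0, rising to 1 when r = 1.
  w <- 0.632 / (1 - 0.368 * r)

  (1 - w) * apparent + w * loo_boot
}

# Example call with made-up error rates
error_632plus(apparent = 0.1, loo_boot = 0.3, gamma = 0.5)
```

With no overfitting (r = 0) the result reduces to the plain .632 estimate, 0.368 * apparent + 0.632 * loo_boot. With maximal overfitting (r = 1) it equals the leave-one-out bootstrap error.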

Usage

genderizeBootstrapError(x, y, givenNamesDB, probs, counts,
  num_bootstraps = 50, parallel = FALSE)

Arguments

x

A text vector that we want to genderize

y

A text vector of true gender labels ('female' or 'male') for the x vector

givenNamesDB

A dataset with gender data (for example, the output of the findGivenNames function)

probs

A numeric vector of probability thresholds. Used to subset the givenNamesDB dataset

counts

A numeric vector of count thresholds. Used to subset the givenNamesDB dataset

num_bootstraps

Number of bootstrap samples. Default is 50.

parallel

Passed to the genderizeTrain function. If TRUE, errors are computed using the parallel package and the available cores. Default is FALSE.
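To make the role of probs and counts more concrete, the sketch below builds every combination of the two vectors and counts how many rows of a toy names database survive each pair of thresholds. The data frame names_db is invented for illustration. Its column names (name, gender, probability, count) and the use of >= for filtering are assumptions, not confirmed details of how genderizeTrain subsets givenNamesDB.

```r
# Hypothetical toy database; the column names are an assumption about
# the structure of a givenNamesDB dataset.
names_db <- data.frame(
  name        = c("alex", "kale", "robin", "terry"),
  gender      = c("male", "male", "female", "male"),
  probability = c(0.87, 0.55, 0.62, 0.91),
  count       = c(5000, 3, 40, 1200),
  stringsAsFactors = FALSE
)

probs  <- c(0.5, 0.7, 0.9)
counts <- c(1, 100)

# Each (prob, count) pair is one candidate threshold setting
grid <- expand.grid(prob = probs, count = counts)

# Number of database rows that pass both thresholds for each pair
grid$n_kept <- mapply(
  function(p, k) sum(names_db$probability >= p & names_db$count >= k),
  grid$prob, grid$count
)

grid
```

Stricter thresholds keep fewer names, so longer probs and counts vectors mean more combinations to evaluate on every bootstrap sample.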

Value

A list of bootstrap errors:

apparent

Apparent Error Rate

loo_boot

LOO-Boot Error Rate

errorRate632plus

.632+ Error Rate

Examples


# NOT RUN {
x <- c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', rep('Robin', 20))

y <- c(rep('female', 6), rep('male', 20))

givenNamesDB <- findGivenNames(x)
pred <- genderize(x, givenNamesDB)
classificationErrors(labels = y, predictions = pred$gender)

probs <- seq(from = 0.5, to = 0.9, by = 0.05)
counts <- c(1)

set.seed(23)
genderizeBootstrapError(x = x, y = y, 
                         givenNamesDB = givenNamesDB, 
                         probs = probs, counts = counts, 
                         num_bootstraps = 20, 
                         parallel = TRUE)


# $apparent
# [1] 0.9615385

# $loo_boot
# [1] 0.965812

# $errorRate632plus
# [1] 0.964225


# }