genderizeR (version 1.0.0)

genderizeBootstrapError: Gender prediction errors on bootstrap samples

Description

genderizeBootstrapError calculates the Apparent Error Rate, the Leave-One-Out bootstrap error rate and the .632+ error rate from Efron and Tibshirani (1997). The code is a modified version of several functions from the sortinghat package by John A. Ramey.

Usage

genderizeBootstrapError(x, y, givenNamesDB, probs, counts,
  num_bootstraps = 50, parallel = FALSE)

Arguments

x
A text vector that we want to genderize
y
A text vector of true gender labels for the x vector
givenNamesDB
A dataset with gender data (could be an output of findGivenNames function)
probs
A numeric vector of different probability values. Used for subsetting the givenNamesDB dataset
counts
A numeric vector of different count values. Used for subsetting the givenNamesDB dataset
num_bootstraps
Number of bootstrap samples. Default is 50.
parallel
Passed to the genderizeTrain function. If TRUE, errors are computed using the parallel package and the available cores. It is designed to work on Windows machines. Default is FALSE.

Value

A list of bootstrap errors:

  • apparent: Apparent Error Rate
  • loo_boot: LOO-Boot Error Rate
  • errorRate632plus: .632+ Error Rate
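
To make the relationship between the three returned values more concrete, here is a minimal sketch of how a .632+ estimate can be combined from an apparent error rate and a leave-one-out bootstrap error rate, following the general form in Efron and Tibshirani (1997). The helper err_632plus and its arguments are hypothetical and not part of genderizeR or sortinghat. The edge-case handling (capping the bootstrap error at the no-information rate and setting the overfitting rate to 0 when it is undefined) is a simplifying assumption that may differ from what either package does, so the sketch is not expected to reproduce their output exactly.

```r
# Hypothetical helper (not part of genderizeR): combines an apparent error
# rate and a leave-one-out bootstrap error rate into a .632+ style estimate.
err_632plus <- function(apparent, loo_boot, y, predicted) {
  # No-information error rate: the error expected if labels and predictions
  # were independent, computed from their marginal class proportions.
  classes <- union(y, predicted)
  p <- table(factor(y, levels = classes)) / length(y)
  q <- table(factor(predicted, levels = classes)) / length(predicted)
  gamma_hat <- sum(p * (1 - q))

  # Assumption: cap the bootstrap error at the no-information rate.
  loo_capped <- min(loo_boot, gamma_hat)

  # Relative overfitting rate, set to 0 when it is not well defined.
  if (loo_capped > apparent && gamma_hat > apparent) {
    R <- (loo_capped - apparent) / (gamma_hat - apparent)
  } else {
    R <- 0
  }

  # The weight grows from 0.632 (no overfitting) towards 1 (R = 1).
  w <- 0.632 / (1 - 0.368 * R)
  (1 - w) * apparent + w * loo_capped
}

# Toy inputs: balanced labels and predictions give a no-information rate of 0.5.
y_true <- c("female", "female", "male", "male")
y_pred <- c("female", "male", "female", "male")
estimate <- err_632plus(apparent = 0.1, loo_boot = 0.3,
                        y = y_true, predicted = y_pred)
```

With R equal to 0 the weight is 0.632, so the result reduces to the plain .632 estimate, 0.368 * apparent + 0.632 * loo_boot. As R approaches 1 the estimate moves towards the bootstrap error rate.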

See Also

In the sortinghat package: errorest_apparent, errorest_loo_boot, errorest_632plus

Examples

x = c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', rep('Robin', 20))
y = c('female', 'female', 'female', 'female', 'female', 'female', rep('male', 20))
givenNamesDB = findGivenNames(x)
classificatonErrors(labels = y,predictions = y)
probs = seq(from =  0.5, to = 0.9, by = 0.05)
counts = c(1)
set.seed(23)
genderizeBootstrapError(x = x, y = y, givenNamesDB = givenNamesDB,
                        probs = probs, counts = counts,
                        num_bootstraps = 20, parallel = TRUE)
# $apparent
# [1] 0.9230769
# $loo_boot
# [1] 0.9401709
# $errorRate632plus
# [1] 0.9336006
