genderizeBootstrapError: Gender prediction errors on bootstrap samples

Description

genderizeBootstrapError calculates the Apparent Error Rate, the Leave-One-Out bootstrap error rate, and the .632+ error rate from Efron and Tibshirani (1997). The code is a modified version of several functions from the sortinghat package by John A. Ramey.
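For orientation, the .632+ estimator as defined by Efron and Tibshirani (1997) can be written in terms of the other two returned quantities. This is a sketch of the published definition, not a statement of how this function implements it. The symbols are introduced here for illustration: \overline{err} is the apparent error rate, \widehat{Err}^{(1)} is the leave-one-out bootstrap error rate, \hat{\gamma} is the no-information error rate, \hat{p}_l is the observed proportion of labels in class l, and \hat{q}_l is the proportion of predictions assigned to class l.

```latex
\begin{aligned}
\hat{\gamma} &= \sum_{l} \hat{p}_l \,(1 - \hat{q}_l), \\
\hat{R} &= \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}
               {\hat{\gamma} - \overline{\mathrm{err}}}, \\
\hat{w} &= \frac{0.632}{1 - 0.368\,\hat{R}}, \\
\widehat{\mathrm{Err}}^{(.632+)} &= (1 - \hat{w})\,\overline{\mathrm{err}}
                                   + \hat{w}\,\widehat{\mathrm{Err}}^{(1)}.
\end{aligned}
```

When the relative overfitting rate \hat{R} is 0, the weight is 0.632 and the estimate reduces to the plain .632 estimator; when \hat{R} is 1, the weight is 1 and the estimate equals the leave-one-out bootstrap error. The paper additionally truncates \hat{R} and \widehat{Err}^{(1)} in edge cases; whether this function does so is not stated on this page.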

Usage

genderizeBootstrapError(x, y, givenNamesDB, probs, counts,
  num_bootstraps = 50, parallel = FALSE)

Arguments

x

A text vector that we want to genderize

y

A text vector of true gender labels for the x vector

givenNamesDB

A dataset with gender data (could be an output of findGivenNames function)

probs

A numeric vector of probability thresholds. Used for subsetting the givenNamesDB dataset.

counts

A numeric vector of count thresholds. Used for subsetting the givenNamesDB dataset.

num_bootstraps

Number of bootstrap samples. Default is 50.

parallel

Passed to the genderizeTrain function. If TRUE, errors are computed using the parallel package and the available cores. It is designed to work on Windows machines. Default is FALSE.

Value

A list of bootstrap errors:
apparent: Apparent Error Rate
loo_boot: LOO-Boot Error Rate
errorRate632plus: .632+ Error Rate

Examples

x = c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', rep('Robin', 20))
y = c('female', 'female', 'female', 'female', 'female', 'female', rep('male', 20))
givenNamesDB = findGivenNames(x)
classificatonErrors(labels = y, predictions = y)
probs = seq(from =  0.5, to = 0.9, by = 0.05)
counts = c(1)
set.seed(23)
genderizeBootstrapError(x = x, y = y, givenNamesDB = givenNamesDB,
  probs = probs, counts = counts, num_bootstraps = 20, parallel = TRUE)
# $apparent
# [1] 0.9230769
# $loo_boot
# [1] 0.9401709
# $errorRate632plus
# [1] 0.9336006
