
genderizeBootstrapError calculates the Apparent Error Rate, the Leave-One-Out bootstrap error rate, and the .632+ error rate from Efron and Tibshirani (1997).
The code is a modified version of several functions from the sortinghat package by John A. Ramey.
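To make the first two quantities concrete, here is a minimal base R sketch. It assumes a toy "classifier" that always predicts the majority label of its training sample; the helper `majority_label` and all variable names are hypothetical and are not part of genderizeR or sortinghat.

```r
# Toy labels: 6 'female' and 20 'male', as in the example on this page.
set.seed(1)
y <- c(rep("female", 6), rep("male", 20))
n <- length(y)
num_bootstraps <- 50

# Hypothetical stand-in classifier: predicts the most frequent training label.
majority_label <- function(labels) names(which.max(table(labels)))

# Apparent error rate: train and evaluate on the same data.
apparent <- mean(majority_label(y) != y)

# Leave-one-out bootstrap: each bootstrap model is scored only on
# the observations that were not drawn into its training sample.
err_sum <- numeric(n)
err_n <- numeric(n)
for (b in seq_len(num_bootstraps)) {
  idx <- sample.int(n, n, replace = TRUE)
  out <- setdiff(seq_len(n), idx)
  pred <- majority_label(y[idx])
  err_sum[out] <- err_sum[out] + (pred != y[out])
  err_n[out] <- err_n[out] + 1
}

# Average per observation first, then over observations left out at least once.
left_out <- err_n > 0
loo_boot <- mean(err_sum[left_out] / err_n[left_out])
```

The apparent error tends to be optimistic because the model has already seen the data it is scored on, which is why it is combined with the out-of-sample bootstrap error.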
genderizeBootstrapError(x, y, givenNamesDB, probs, counts,
num_bootstraps = 50, parallel = FALSE)
x: A text vector to genderize.
y: A text vector of true gender labels ('female' or 'male') for the x vector.
givenNamesDB: A dataset with gender data (for example, the output of the findGivenNames function).
probs: A numeric vector of probability values, used to subset the givenNamesDB dataset.
counts: A numeric vector of count values, used to subset the givenNamesDB dataset.
num_bootstraps: Number of bootstrap samples. Default is 50.
parallel: Passed to the genderizeTrain function. If TRUE, errors are computed using the parallel package and the available cores. Default is FALSE.
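As a rough illustration of how probs and counts could act as thresholds, the sketch below builds every (probability, count) pair and applies one pair to a toy table. Both the pairing scheme and the column names of `toy_db` are assumptions made for illustration; the page does not state how genderizeTrain combines the two vectors.

```r
# Threshold values taken from the example on this page.
probs <- seq(from = 0.5, to = 0.9, by = 0.05)
counts <- c(1)

# Assumption: every probability threshold is paired with every count threshold.
param_grid <- expand.grid(prob = probs, count = counts)

# Hypothetical stand-in for a givenNamesDB dataset (invented values).
toy_db <- data.frame(
  name = c("alex", "robin", "terry"),
  gender = c("male", "male", "male"),
  probability = c(0.87, 0.63, 0.76),
  count = c(5000, 3000, 2000),
  stringsAsFactors = FALSE
)

# Keep only the names that meet one (prob, count) pair of thresholds.
keep <- toy_db$probability >= 0.75 & toy_db$count >= 1
toy_subset <- toy_db[keep, ]
```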
A list of bootstrap errors:
apparent: Apparent Error Rate
loo_boot: LOO-Boot Error Rate
errorRate632plus: .632+ Error Rate
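The three values are related: the .632+ estimate is a weighted average of the apparent and LOO-Boot error rates, with a weight that grows as overfitting increases. The sketch below shows one common formulation in the spirit of Efron and Tibshirani (1997). The function `error_632plus` and its `gamma` argument (the no-information error rate) are hypothetical names, and whether the package applies the same clipping is not stated on this page.

```r
# Hypothetical helper, not part of genderizeR or sortinghat.
# apparent: error rate on the training data itself
# loo_boot: leave-one-out bootstrap error rate
# gamma:    no-information error rate (expected error if labels and
#           predictions were independent)
error_632plus <- function(apparent, loo_boot, gamma) {
  # Clip the LOO bootstrap error at the no-information rate.
  loo_clipped <- min(loo_boot, gamma)
  # Relative overfitting rate; set to 0 when it is not well defined.
  if (loo_boot > apparent && gamma > apparent) {
    r <- (loo_clipped - apparent) / (gamma - apparent)
  } else {
    r <- 0
  }
  # The weight moves from 0.632 (no overfitting) to 1 (maximal overfitting).
  w <- 0.632 / (1 - 0.368 * r)
  (1 - w) * apparent + w * loo_clipped
}

# Illustrative inputs only (not results from the package).
error_632plus(apparent = 0.10, loo_boot = 0.20, gamma = 0.50)
```

When the relative overfitting rate is 0, the weight is 0.632 and the result reduces to the plain .632 estimator, 0.368 * apparent + 0.632 * loo_boot.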
In the sortinghat
package.
# NOT RUN {
library(genderizeR)

# Given names to classify and their true gender labels
x <- c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', rep('Robin', 20))
y <- c(rep('female', 6), rep('male', 20))

# Build the gender database for the names, then predict and score
givenNamesDB <- findGivenNames(x)
pred <- genderize(x, givenNamesDB)
classificationErrors(labels = y, predictions = pred$gender)

# Probability and count values used to subset givenNamesDB
probs <- seq(from = 0.5, to = 0.9, by = 0.05)
counts <- c(1)

set.seed(23)
genderizeBootstrapError(x = x, y = y,
                        givenNamesDB = givenNamesDB,
                        probs = probs, counts = counts,
                        num_bootstraps = 20,
                        parallel = TRUE)
# $apparent
# [1] 0.9615385
# $loo_boot
# [1] 0.965812
# $errorRate632plus
# [1] 0.964225
# }