genderizeR (version 1.0.0)

genderizeTrain: Training genderize function

Description

genderizeTrain predicts gender and checks different combinations of the 'probability' and 'count' parameters.
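The following minimal base-R sketch shows what checking parameter combinations can look like. It is not the package's implementation. The toy name table, its column names (`name`, `gender`, `probability`, `count`), the inclusive `>=` thresholds and the helper `predict_gender()` are assumptions made for illustration. Only one error rate is computed per combination.

```r
# Hypothetical sketch of a probability/count grid search (not genderizeR code).
# Toy lookup table with made-up values; the column names are assumptions.
namesDB <- data.frame(
  name        = c("alex", "john", "robin", "terry"),
  gender      = c("male", "male", "female", "male"),
  probability = c(0.90, 0.99, 0.55, 0.70),
  count       = c(500, 9000, 40, 5),
  stringsAsFactors = FALSE
)
x <- c("Alex", "John", "Robin", "Terry")
y <- c("male", "male", "male", "male")

# Predict gender using only rows that pass both thresholds; NA if no match
predict_gender <- function(x, db, prob, cnt) {
  kept <- db[db$probability >= prob & db$count >= cnt, ]
  kept$gender[match(tolower(x), kept$name)]
}

# One row per (probability, count) combination
grid <- expand.grid(prob = c(0.5, 0.8), count = c(1, 10))

# Share of names that are misclassified or left unpredicted
grid$error <- mapply(function(p, k) {
  pred <- predict_gender(x, db = namesDB, prob = p, cnt = k)
  mean(is.na(pred) | pred != y)
}, grid$prob, grid$count)

print(grid)
```

Stricter thresholds leave more names unpredicted. The lowest error here comes from the loosest combination, but with real data the balance between wrong and missing predictions can shift.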

Usage

genderizeTrain(x, y, givenNamesDB, probs, counts, parallel = FALSE)

Arguments

x
A character vector of names (or text) to genderize
y
A character vector of true gender labels for the x vector
givenNamesDB
A dataset with gender data (for example, the output of the findGivenNames function)
probs
A numeric vector of different probability values, used to subset the givenNamesDB dataset
counts
A numeric vector of different count values, used to subset the givenNamesDB dataset
parallel
If TRUE, errors are computed in parallel using the parallel package and the available cores. It is designed to work on Windows machines. Default is FALSE.
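The sketch below shows how threshold subsetting might behave. It assumes a row of givenNamesDB is kept when its probability and count are at least the given values. The data, the column names and the helper `subset_db()` are hypothetical. The printed table shows the trade-off the parameter grid explores: stricter thresholds leave fewer names that can be predicted.

```r
# Hypothetical sketch: how probability/count thresholds shrink a name database.
# Made-up values; the column names and ">=" comparisons are assumptions.
namesDB <- data.frame(
  name        = c("alex", "john", "robin", "terry", "kale"),
  probability = c(0.90, 0.99, 0.55, 0.70, 0.85),
  count       = c(500, 9000, 40, 5, 3),
  stringsAsFactors = FALSE
)

# Keep only rows that meet both minimum thresholds
subset_db <- function(db, prob, cnt) {
  db[db$probability >= prob & db$count >= cnt, , drop = FALSE]
}

probs  <- c(0.5, 0.8, 0.95)
counts <- c(1, 10)

# Number of names still available for each (probability, count) pair
kept <- outer(probs, counts,
              Vectorize(function(p, k) nrow(subset_db(namesDB, p, k))))
dimnames(kept) <- list(prob = probs, count = counts)
print(kept)
```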

Value

A data frame with all combinations of parameters and, for each combination, the following prediction indicators:
  • errorCoded: classification error for predicted and unpredicted gender
  • errorCodedWithoutNA: classification error for predicted gender only
  • naCoded: proportion of items with manually coded gender and with unpredicted gender
  • errorGenderBias: net gender bias error
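The indicators above can be read as ratios over a table of true versus predicted gender. The function below is one plausible sketch, not the package's code. `gender_metrics()` is a hypothetical helper and unpredicted gender is represented as `NA`. The denominators and the sign of the bias term are assumptions. Here the bias term is males predicted as female minus females predicted as male, divided by all predicted items.

```r
# Hypothetical sketch of the four indicators listed above; the formulas
# (especially the sign of the bias term) are assumptions, not package code.
gender_metrics <- function(truth, pred) {
  predicted <- !is.na(pred)
  wrong     <- predicted & pred != truth
  # Assumed sign: males coded as female minus females coded as male
  m_as_f <- sum(predicted & truth == "male" & pred == "female")
  f_as_m <- sum(predicted & truth == "female" & pred == "male")
  c(
    errorCoded          = mean(wrong | !predicted),
    errorCodedWithoutNA = sum(wrong) / sum(predicted),
    naCoded             = mean(!predicted),
    errorGenderBias     = (m_as_f - f_as_m) / sum(predicted)
  )
}

truth <- c("male", "male", "female", "female", "male")
pred  <- c("male", "female", "female", NA, NA)
round(gender_metrics(truth, pred), 3)
# errorCoded 0.6, errorCodedWithoutNA 0.333, naCoded 0.4, errorGenderBias 0.333
```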

See Also

Implementation of parallel mclapply on Windows machines by Nathan VanHoudnos http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R
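`parallel::mclapply()` relies on forking, which Windows does not support, and that is why a workaround like the one cited above is needed. A portable alternative is a socket cluster with `parLapply()`, sketched below. The toy data, column names and the helper `row_error()` are hypothetical. The extra arguments avoid the name `x`, because `parLapply()` forwards them to an internal `clusterApply()` call that has its own formal `x`.

```r
# Hypothetical sketch: evaluating grid rows in parallel on any platform,
# including Windows, using a PSOCK (socket) cluster instead of forking.
library(parallel)

# Toy inputs with made-up values; column names are assumptions
namesDB <- data.frame(name = c("alex", "john", "robin"),
                      gender = c("male", "male", "female"),
                      probability = c(0.90, 0.99, 0.55),
                      count = c(500, 9000, 40),
                      stringsAsFactors = FALSE)
nm    <- c("Alex", "John", "Robin", "Terry")
truth <- rep("male", length(nm))
grid  <- expand.grid(prob = c(0.5, 0.8), count = c(1, 1000))

# Error for one grid row: share of names misclassified or unpredicted
row_error <- function(i, grid, db, nm, truth) {
  kept <- db[db$probability >= grid$prob[i] & db$count >= grid$count[i], ]
  pred <- kept$gender[match(tolower(nm), kept$name)]
  mean(is.na(pred) | pred != truth)
}

# Workers are fresh R sessions, so all data is passed as arguments
cl <- makeCluster(2)
errs <- tryCatch(
  parLapply(cl, seq_len(nrow(grid)), row_error,
            grid = grid, db = namesDB, nm = nm, truth = truth),
  finally = stopCluster(cl)
)
grid$error <- unlist(errs)
print(grid)
```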

Examples

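library(genderizeR)

# Names to genderize and their true labels (all coded as male here)
x = c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', 'John', 'Tom')
y = rep('male', length(x))
# Look up the names via the genderize.io API (requires an internet connection)
givenNamesDB = findGivenNames(x)
# Candidate probability and count thresholds to try
probs = seq(from = 0.5, to = 0.9, by = 0.05)
counts = c(1, 10)
# Evaluate every probability/count combination
genderizeTrain(x = x, y = y, givenNamesDB = givenNamesDB,
               probs = probs, counts = counts)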
