
genderizeR (version 2.0.0)

genderizeTrain: Training genderize function

Description

genderizeTrain predicts gender for the items in x and computes prediction errors for every combination of the supplied probability and count parameters.
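The ten rows in the example output suggest that the combinations form a full grid, with every value in probs crossed with every value in counts. Below is a minimal base-R sketch of such a grid. It assumes this Cartesian-product structure, and the object name grid is illustrative only, not part of genderizeR.

```r
# Assumption: the parameter grid is the Cartesian product of probs and counts,
# matching the row order of the example output. This is a sketch, not
# genderizeTrain's internal code.
probs  <- seq(from = 0.5, to = 0.9, by = 0.1)
counts <- c(1, 10)

# expand.grid varies its first argument fastest, giving rows
# (0.5, 1), (0.6, 1), ..., (0.9, 1), (0.5, 10), ..., (0.9, 10)
grid <- expand.grid(prob = probs, count = counts)
nrow(grid)  # 10 combinations: 5 probabilities x 2 counts
```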

Usage

genderizeTrain(x, y, givenNamesDB, probs, counts, parallel = FALSE,
  cores = NULL)

Arguments

x

A text vector that we want to genderize.

y

A text vector of true gender labels for the x vector.

givenNamesDB

A dataset with gender data (for example, the output of the findGivenNames function).

probs

A numeric vector of probability values used to subset the givenNamesDB dataset.

counts

A numeric vector of count values used to subset the givenNamesDB dataset.
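Subsetting by probs and counts can be pictured as a threshold filter on givenNamesDB followed by a name lookup. The sketch below rests on several assumptions. It assumes that givenNamesDB has the columns name, gender, probability and count, and that a name is kept when both values reach the thresholds (whether the package compares with >= or > is not stated here). The helpers subsetGivenNames and predictGender and the toy data are hypothetical and made up for illustration only.

```r
# Hypothetical stand-in for givenNamesDB. The column names are an assumption
# about the findGivenNames output; the values are made up for illustration.
givenNamesDB <- data.frame(
  name        = c("alex", "john", "robin", "terry"),
  gender      = c("male", "male", "female", "male"),
  probability = c(0.88, 0.99, 0.55, 0.75),
  count       = c(5000, 20000, 900, 3),
  stringsAsFactors = FALSE
)

# Keep only names whose probability and count reach both thresholds
# (hypothetical helper; the >= comparison is an assumption).
subsetGivenNames <- function(db, prob, count) {
  db[db$probability >= prob & db$count >= count, ]
}

# Look up each (lowercased) name in the subset; names filtered out
# get NA, i.e. their gender stays unpredicted (hypothetical helper).
predictGender <- function(x, db) {
  db$gender[match(tolower(x), db$name)]
}

x <- c("Alex", "John", "Robin", "Terry")
predictGender(x, subsetGivenNames(givenNamesDB, prob = 0.8, count = 10))
# returns c("male", "male", NA, NA)
```

Raising either threshold removes more names from the lookup. That lowers the chance of a wrong prediction but leaves more items unpredicted, which is the trade-off the training grid explores.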

parallel

If TRUE, errors are computed in parallel using the parallel package and the available cores. Default is FALSE.

cores

An integer giving the number of cores to use for parallel processing, or NULL (default). If parallel is TRUE and cores is NULL, then the number of available cores is detected automatically.
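The cores rule described above can be sketched with the base parallel package. resolveCores is a hypothetical helper, not a genderizeR function. The squaring task is a placeholder for evaluating one parameter combination. Because mclapply forks processes, it cannot use more than one core on Windows. That is why the See Also section points to a Windows workaround, and this sketch simply falls back to one core there.

```r
library(parallel)

# Hypothetical helper mirroring the documented rule: with parallel = TRUE and
# cores = NULL, use the number of cores detected on the machine.
resolveCores <- function(parallel = FALSE, cores = NULL) {
  if (!parallel) return(1L)
  if (is.null(cores)) {
    cores <- detectCores()
    if (is.na(cores)) cores <- 1L  # detectCores() can return NA
  }
  as.integer(cores)
}

nCores <- resolveCores(parallel = TRUE)
# mclapply does not support mc.cores > 1 on Windows, so fall back to one core
if (.Platform$OS.type == "windows") nCores <- 1L

# Placeholder work: one task per parameter combination (10 in the example)
res <- mclapply(1:10, function(i) i^2, mc.cores = nCores)
```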

Value

A data frame with all combinations of parameters and, for each combination, a computed set of prediction indicators:

errorCoded

classification error for both predicted and unpredicted gender

errorCodedWithoutNA

classification error for predicted gender only

naCoded

proportion of items with manually coded gender for which no gender was predicted

errorGenderBias

net gender bias error
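The four indicators can be sketched from a vector of true labels and a vector of predictions in which NA marks an unpredicted gender. This reconstruction rests on assumptions. It was chosen so that it reproduces the example output, such as row 1, where one of eight male names is predicted as female. It is not genderizeR's own code, and predictionIndicators is a hypothetical name. The denominator of errorGenderBias (the number of predicted items) and its sign convention (positive when more males are predicted as female than females as male) are also assumptions.

```r
# Hypothetical sketch of the four indicators; assumes every label is coded.
# labels: true genders; predictions: predicted genders, NA = unpredicted.
predictionIndicators <- function(labels, predictions) {
  predicted <- !is.na(predictions)
  # A wrong prediction: a gender was predicted and it differs from the label
  wrong <- predicted & predictions != labels
  # Males predicted as female, and females predicted as male
  mf <- sum(predicted & labels == "male" & predictions == "female")
  fm <- sum(predicted & labels == "female" & predictions == "male")
  data.frame(
    errorCoded          = (sum(wrong) + sum(!predicted)) / length(labels),
    errorCodedWithoutNA = sum(wrong) / sum(predicted),
    naCoded             = sum(!predicted) / length(labels),
    errorGenderBias     = (mf - fm) / sum(predicted)
  )
}

# Hypothetical predictions reproducing row 1: eight male names,
# one predicted as female, none unpredicted
labels <- rep("male", 8)
predictionIndicators(labels, c("female", rep("male", 7)))
# errorCoded 0.125, errorCodedWithoutNA 0.125, naCoded 0, errorGenderBias 0.125
```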

See Also

Implementation of parallel mclapply on Windows machines by Nathan VanHoudnos http://edustatistics.org/nathanvan/setup/mclapply.hack.R

Examples

# NOT RUN {
x = c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', 'John', 'Tom')
y = rep('male', length(x))

givenNamesDB = findGivenNames(x)
probs = seq(from =  0.5, to = 0.9, by = 0.1)
counts = c(1, 10)

genderizeTrain(x = x, y = y, 
               givenNamesDB = givenNamesDB, 
               probs = probs, counts = counts, 
               parallel = TRUE) 

#     prob count errorCoded errorCodedWithoutNA naCoded errorGenderBias
#  1:  0.5     1      0.125               0.125   0.000           0.125
#  2:  0.6     1      0.125               0.000   0.125           0.000
#  3:  0.7     1      0.125               0.000   0.125           0.000
#  4:  0.8     1      0.375               0.000   0.375           0.000
#  5:  0.9     1      0.500               0.000   0.500           0.000
#  6:  0.5    10      0.125               0.125   0.000           0.125
#  7:  0.6    10      0.125               0.000   0.125           0.000
#  8:  0.7    10      0.125               0.000   0.125           0.000
#  9:  0.8    10      0.375               0.000   0.375           0.000
# 10:  0.9    10      0.500               0.000   0.500           0.000

# }
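One way to use the returned data frame is to pick the parameter combination with the lowest errorCoded, breaking ties by the smallest absolute errorGenderBias. This selection rule is only an illustration, not a criterion prescribed by the package. The sketch re-enters the example output by hand.

```r
# The ten rows printed in the example output, re-entered by hand
res <- data.frame(
  prob                = rep(seq(0.5, 0.9, by = 0.1), times = 2),
  count               = rep(c(1, 10), each = 5),
  errorCoded          = rep(c(0.125, 0.125, 0.125, 0.375, 0.500), times = 2),
  errorCodedWithoutNA = rep(c(0.125, 0, 0, 0, 0), times = 2),
  naCoded             = rep(c(0, 0.125, 0.125, 0.375, 0.500), times = 2),
  errorGenderBias     = rep(c(0.125, 0, 0, 0, 0), times = 2)
)

# Illustrative rule: lowest total error first, then lowest absolute bias.
# order() is stable, so the first of several tied rows is kept.
best <- res[order(res$errorCoded, abs(res$errorGenderBias)), ][1, ]
best  # prob 0.6, count 1
```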
