For each character string in the vector x, genderize uses the output of the findGivenNames function and returns a gender prediction for the whole character string, based on possible first-name terms located inside that string.
genderize(x, genderDB, blacklist = NULL, progress = TRUE)
x: A vector of text strings.
genderDB: A data.table output of the findGivenNames function for the vector x.
blacklist: Terms to exclude from gender checking (default NULL, no terms excluded).
progress: If TRUE (default), a progress bar is displayed in the console.
A data table with the text string, the term found in genderDB that is finally used as a given name to predict gender, the predicted gender, and the number of potential gender indicators ("1" if only one term from the text string is found in genderDB).
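The matching idea described above can be sketched in base R. This is a simplified illustration, not the package's implementation: `toy_db` and `toy_genderize` are hypothetical names, the lookup table is made up, and picking the first matching row when several terms match is an arbitrary choice made only for this sketch.

```r
# Hypothetical lookup table standing in for the findGivenNames output.
# Column names and values are illustrative assumptions.
toy_db <- data.frame(
  name   = c("winston", "maria", "med"),
  gender = c("male", "female", "male"),
  stringsAsFactors = FALSE
)

# Simplified illustration only: split a string into lowercase terms,
# drop blacklisted terms, and look up the remaining terms in the table.
toy_genderize <- function(text, db, blacklist = NULL) {
  terms <- unique(tolower(unlist(strsplit(text, "[^[:alpha:]]+"))))
  terms <- terms[nzchar(terms)]                  # discard empty tokens
  terms <- setdiff(terms, tolower(blacklist))    # apply the blacklist
  hits  <- db[db$name %in% terms, , drop = FALSE]
  data.frame(
    text             = text,
    # Arbitrary choice for this sketch: take the first matching row.
    givenName        = if (nrow(hits) > 0) hits$name[1] else NA_character_,
    gender           = if (nrow(hits) > 0) hits$gender[1] else NA_character_,
    genderIndicators = nrow(hits),               # number of matching terms
    stringsAsFactors = FALSE
  )
}

toy_genderize("Maria Sklodowska-Curie", toy_db)
toy_genderize("Prof. Dr. med. Winston Smith", toy_db)
toy_genderize("Prof. Dr. med. Winston Smith", toy_db, blacklist = "med")
```

In the second call two terms match the table, so genderIndicators is 2. Blacklisting "med" in the third call leaves a single indicator.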
# NOT RUN {
x = c("Winston J. Durant, ASHP past president, dies at 84",
"Gold Badge of Honour of the DGAI Prof. Dr. med. Norbert R. Roewer Wuerzburg",
"The contribution of professor Yu.S. Martynov (1921-2008) to Russian neurology",
"JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM",
"Maria Sklodowska-Curie")
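# Find candidate given names among the terms in x
givenNames = findGivenNames(x)
# Keep only terms whose count exceeds 40
givenNames = givenNames[count>40]
# Predict a gender for each string using the filtered name table
genderize(x, genderDB=givenNames, blacklist=NULL)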
# text
# 1: Winston J. Durant, ASHP past president, dies at 84
# 2: Gold Badge of Honour of the DGAI Prof. Dr. med. Norbert R. Roewer Wuerzburg
# 3: The contribution of professor Yu.S. Martynov (1921-2008) to Russian neurology
# 4: JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM
# 5: Maria Sklodowska-Curie
#    givenName gender genderIndicators
# 1:   winston   male                1
# 2:       med   male                2
# 3:        NA     NA                0
# 4:       jan   male                1
# 5:     maria female                1
# }
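In the output above, row 2 picks "med" from "Dr. med." as the given name and row 3 finds no name at all. One possible way to flag such rows for manual review is sketched below. The result columns are copied by hand into a plain data.frame so that the snippet runs without the package, and the rule `genderIndicators != 1` is an assumption made for illustration, not guidance from the package.

```r
# Result columns copied by hand from the example output above.
res <- data.frame(
  givenName        = c("winston", "med", NA, "jan", "maria"),
  gender           = c("male", "male", NA, "male", "female"),
  genderIndicators = c(1, 2, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Assumed review rule: flag rows with no indicator or more than one.
needs_review <- res[res$genderIndicators != 1, ]
needs_review
```

Terms that turn out to be false matches, such as "med" here, are candidates for the blacklist argument.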