gender v0.5.1

by Lincoln Mullen

Predict Gender from Names Using Historical Data

Encodes gender based on names and dates of birth using historical datasets. By using these datasets instead of lists of male and female names, this package is able to more accurately guess the gender of a name, and it is able to report the probability that a name was male or female.

Readme

gender: Predict Gender from Names Using Historical Data

Data sets, historical or otherwise, often contain a list of first names but seldom identify those names by gender. Most techniques for finding gender programmatically rely on lists of male and female names. However, the gender associated with names can vary over time. Any data set that covers the normal span of a human life will require a historical method to find gender from names. This R package uses historical datasets from the U.S. Social Security Administration, the U.S. Census Bureau (via IPUMS USA), and the North Atlantic Population Project to provide predictions of gender for first names for particular countries and time periods.
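The idea of a time-dependent prediction can be sketched in a few lines of base R. This is only an illustration of the general approach, not the package's implementation: the `births` table, its counts, and the `proportion_female()` helper are all hypothetical, and the numbers are made up rather than taken from any of the datasets named above.

```r
# Hypothetical counts of births by name, year, and recorded sex.
# These numbers are invented for illustration; they are not SSA data.
births <- data.frame(
  name = rep("Leslie", 4),
  year = c(1900, 1900, 2000, 2000),
  sex  = c("F", "M", "F", "M"),
  n    = c(100, 900, 800, 200),
  stringsAsFactors = FALSE
)

# Share of births recorded as female for one name within a range of years.
proportion_female <- function(data, first_name, year_min, year_max) {
  in_range <- data$name == first_name &
    data$year >= year_min & data$year <= year_max
  rows <- data[in_range, ]
  sum(rows$n[rows$sex == "F"]) / sum(rows$n)
}

# The same name gives different answers for different periods.
early <- proportion_female(births, "Leslie", 1900, 1900)
late  <- proportion_female(births, "Leslie", 2000, 2000)
print(c(early = early, late = late))
```

A fixed list of male and female names would have to assign one label to the name in this toy table, whereas the year-aware proportion changes with the period queried.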

Installation

You can install this package from CRAN:

install.packages("gender")

The first time you use the package, you will be prompted to install the accompanying genderdata package. Alternatively, you can install genderdata yourself from the rOpenSci package repository:

install.packages("genderdata", type = "source",
                 repos = "http://packages.ropensci.org")

If you prefer, you can install the development versions of both packages from the rOpenSci package repository:

install.packages(c("gender", "genderdata"),
                 repos = "http://packages.ropensci.org",
                 type = "source")

Using the package

The gender() function takes a character vector of names and a year or range of years, and uses one of several historical datasets to predict the gender associated with each name. Here we predict the gender of the names Madison and Hillary in 1930 and again for the years 2000 to 2010, using Social Security data.

library(gender)
gender(c("Madison", "Hillary"), years = 1930, method = "ssa")
#> Source: local data frame [2 x 6]
#> 
#>      name proportion_male proportion_female gender year_min year_max
#>     (chr)           (dbl)             (dbl)  (chr)    (dbl)    (dbl)
#> 1 Hillary               1                 0   male     1930     1930
#> 2 Madison               1                 0   male     1930     1930
gender(c("Madison", "Hillary"), years = c(2000, 2010), method = "ssa")
#> Source: local data frame [2 x 6]
#> 
#>      name proportion_male proportion_female gender year_min year_max
#>     (chr)           (dbl)             (dbl)  (chr)    (dbl)    (dbl)
#> 1 Hillary          0.0055            0.9945 female     2000     2010
#> 2 Madison          0.0046            0.9954 female     2000     2010
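
Because the result reports proportions rather than only a label, predictions can be filtered by how lopsided a name's usage was. The sketch below rebuilds the 2000 to 2010 output shown above as a plain data frame, so it runs without genderdata, and flags predictions that clear a cutoff. The 0.9 threshold and the `strongest` and `confident` columns are assumptions for illustration, not package defaults.

```r
# Values copied from the 2000-2010 output above, rebuilt as a plain data frame.
result <- data.frame(
  name              = c("Hillary", "Madison"),
  proportion_male   = c(0.0055, 0.0046),
  proportion_female = c(0.9945, 0.9954),
  gender            = c("female", "female"),
  stringsAsFactors  = FALSE
)

# Hypothetical cutoff: keep a prediction only if one gender accounts for
# at least 90% of the recorded uses of the name.
threshold <- 0.9

# Larger of the two proportions for each name.
result$strongest <- pmax(result$proportion_male, result$proportion_female)
result$confident <- result$strongest >= threshold

print(result[, c("name", "gender", "strongest", "confident")])
```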

See the package vignette or read it online at CRAN for a fuller introduction and suggestions on how to use the gender() function efficiently with large datasets.

vignette(topic = "predicting-gender", package = "gender")
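
For larger datasets, one plausible way to cut down on repeated lookups is to call gender() once per distinct birth year, passing each name only once. The following is a minimal sketch under that assumption, not the approach the vignette prescribes: the `people` data frame, the `predict_by_year()` helper, and its `predict_fun` parameter are hypothetical names, and only the gender() arguments shown earlier (names, `years`, `method`) are used. The package also lists a gender_df() function for working with data frames; see its help page for the supported interface.

```r
# Hypothetical input: first names with birth years (illustrative values only).
people <- data.frame(
  name       = c("Madison", "Hillary", "Madison", "Hillary"),
  birth_year = c(1930, 1930, 2005, 2005),
  stringsAsFactors = FALSE
)

# Call the predictor once per distinct birth year with the unique names
# for that year. `predict_fun` defaults to gender::gender(); it is a
# parameter so the helper can be tried without genderdata installed.
predict_by_year <- function(data, predict_fun = gender::gender,
                            method = "ssa") {
  names_by_year <- split(data$name, data$birth_year)
  results <- lapply(names(names_by_year), function(y) {
    as.data.frame(
      predict_fun(unique(names_by_year[[y]]),
                  years = as.numeric(y), method = method)
    )
  })
  do.call(rbind, results)
}

# Only run the real lookup when both packages are available.
if (requireNamespace("gender", quietly = TRUE) &&
    requireNamespace("genderdata", quietly = TRUE)) {
  predictions <- predict_by_year(people)
  # Join predictions back to the original rows by name and year.
  merged <- merge(people, predictions,
                  by.x = c("name", "birth_year"),
                  by.y = c("name", "year_min"))
  print(merged)
}
```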

To read the documentation for the datasets, install the genderdata package, then list the datasets it includes.

library(genderdata)
data(package = "genderdata")

Citation

If you use this package, I would appreciate a citation. You can see up-to-date citation information with citation("gender"). You can cite either the package or the accompanying journal article.

Lincoln Mullen (2015). gender: Predict Gender from Names Using Historical Data. R package version 0.5.0.9000. https://github.com/ropensci/gender

Cameron Blevins and Lincoln Mullen, "Jane, John ... Leslie? A Historical Method for Algorithmic Gender Prediction," Digital Humanities Quarterly (forthcoming 2015).


Functions in gender

Name                        Description
check_genderdata_package    Check whether to install data for gender function and install if necessary
install_genderdata_package  Install the genderdata package after checking with the user
gender                      Predict gender from first names using historical data
gender_df                   Use gender prediction with data frames
gender-package              Gender: predict gender by name from historical data

Details

Type Package
Date 2015-09-03
URL https://github.com/ropensci/gender
Additional_repositories http://packages.ropensci.org
LazyData yes
License MIT + file LICENSE
VignetteBuilder knitr
BugReports https://github.com/ropensci/gender/issues
NeedsCompilation no
Packaged 2016-12-05 14:58:18 UTC; hornik
Repository CRAN
Date/Publication 2016-12-05 18:28:47
