gender (version 0.4.2)

gender: Gender: predict gender by name from historical data

Description

Gender: predict gender from names using historical data

This function predicts the gender of a first name given a year or range of years in which the person was born. The prediction can use one of several data sets, each suited to different time periods or geographical regions. See the package vignette for suggestions on using this function with multiple names and for a discussion of which data set is most suitable for your research question. Certain methods require the genderdata data package; if it is not already installed, you will be prompted to install it.
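Under Value, this page states that passing several names returns a list of lists. A common next step is to flatten that result into a data frame. The minimal sketch below assumes each inner list carries the components listed under Value. Instead of calling gender(), it uses hand-made placeholder values, so it runs without the package or its data.

```r
# Placeholder results shaped like the list of lists described under "Value".
# These values are made up for illustration; they are not real predictions.
results <- list(
  list(name = "name_a", proportion_male = 0.2, proportion_female = 0.8,
       gender = "female", year_min = 1932, year_max = 2012),
  list(name = "name_b", proportion_male = 0.5, proportion_female = 0.5,
       gender = "either", year_min = 1932, year_max = 2012)
)

# Turn each inner list into a one-row data frame, then stack the rows.
results_df <- do.call(rbind, lapply(results, as.data.frame,
                                    stringsAsFactors = FALSE))
print(results_df)
```

The same do.call(rbind, lapply(...)) pattern should work on real output as long as every inner list has the same components. Since the components depend on the method and on certainty, mixing results from different settings may need extra handling.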

Usage

gender(name, years = c(1932, 2012), method = "ssa", certainty = TRUE)

Arguments

name
A first name as a character vector. Names are case insensitive.
years
The birth year of the person whose name is being classified. This argument can be either a single year or a range of years in the form c(1880, 1900). If no value is specified, the default c(1932, 2012) shown under Usage applies, so the ssa method uses the period 1932 to 2012.
method
This value determines the data set used to predict the gender of the name. The "ssa" method looks up names in the U.S. Social Security Administration baby name data. (This method is based on an implementation by Cameron Blevin
certainty
A logical value (TRUE or FALSE) that determines whether to return the proportions of male and female uses of each name in addition to the predicted gender.
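The years and certainty arguments above can be illustrated with two small helpers. Both are hypothetical and not part of the gender package. normalize_years() turns a single year or a two-year range into the inclusive year_min/year_max bounds reported under Value. shape_result() drops the proportions when certainty = FALSE. Treating a single year as a range that starts and ends on that year is an assumption of this sketch.

```r
# Hypothetical helper (not part of the gender package): turn the `years`
# argument into the inclusive bounds reported as year_min and year_max.
normalize_years <- function(years) {
  stopifnot(is.numeric(years), length(years) %in% c(1, 2))
  # A single year becomes a range that starts and ends on that year;
  # min()/max() also tolerate a range given in reverse order.
  list(year_min = min(years), year_max = max(years))
}

# Hypothetical helper: remove the proportion components when certainty = FALSE,
# keeping only the predicted gender and the other fields.
shape_result <- function(result, certainty = TRUE) {
  if (!certainty) {
    result[c("proportion_male", "proportion_female")] <- NULL
  }
  result
}

normalize_years(1985)          # year_min = 1985, year_max = 1985
normalize_years(c(1900, 1985)) # year_min = 1900, year_max = 1985
```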

Value

Returns a list containing the results of predicting the gender. Passing multiple names to the function results in a list of lists. The exact components of the returned list depend on the method used; they include the following:
  • name: The name for which the gender has been predicted.
  • proportion_male: The proportion of male uses of the name over the given range of years.
  • proportion_female: The proportion of female uses of the name over the given range of years.
  • gender: The predicted gender, based on the proportions of male and female uses of the name. Possible values are "male" or "female" when the corresponding proportion is above 0.5, "either" when both proportions are exactly 0.5, and NA for combinations of names and years for which the given method cannot predict a gender.
  • year_min: The lower bound (inclusive) of the year range used for the prediction.
  • year_max: The upper bound (inclusive) of the year range used for the prediction.
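The classification rule for the gender component can be sketched in a few lines. This is a minimal illustration under stated assumptions. It assumes a method sums the male and female counts for a name over the inclusive range year_min to year_max and then applies the 0.5 thresholds above. The counts table and the predict_from_counts() function are hypothetical placeholders, not the package's data or internals.

```r
# Hypothetical yearly counts for one name (placeholder values, not real data).
counts <- data.frame(
  year   = c(1900, 1950, 1985, 2000),
  male   = c(10,   30,   20,   95),
  female = c(10,   50,   60,    5)
)

# Hypothetical sketch: sum counts over the inclusive year range, then classify.
predict_from_counts <- function(counts, name, year_min, year_max) {
  in_range <- counts$year >= year_min & counts$year <= year_max
  male   <- sum(counts$male[in_range])
  female <- sum(counts$female[in_range])
  total  <- male + female
  if (total == 0) {
    # No data for this name in the range, so no gender can be predicted.
    return(list(name = name, proportion_male = NA_real_,
                proportion_female = NA_real_, gender = NA_character_,
                year_min = year_min, year_max = year_max))
  }
  p_male <- male / total
  # Above 0.5 -> that gender; exactly 0.5 -> "either".
  gender <- if (p_male > 0.5) "male" else if (p_male < 0.5) "female" else "either"
  list(name = name, proportion_male = p_male, proportion_female = 1 - p_male,
       gender = gender, year_min = year_min, year_max = year_max)
}

predict_from_counts(counts, "name_a", 1900, 1985)$gender  # 60 vs 120 -> "female"
predict_from_counts(counts, "name_a", 1900, 1900)$gender  # 10 vs 10  -> "either"
```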

Details

Encodes gender based on names and years of birth, using U.S. Census or Social Security data sets. Requires a separate download of the data sets (the genderdata package). You should be prompted to download it automatically, and you can also install it manually by running install_genderdata_package().
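Before calling a method that relies on the genderdata package, a script can check whether that package is available. The sketch below uses base R's requireNamespace(). It calls install_genderdata_package(), the installer named above, only in an interactive session. The surrounding workflow is an assumption about sensible usage, not code taken from the package.

```r
# Check whether the separately distributed genderdata package is available.
has_genderdata <- requireNamespace("genderdata", quietly = TRUE)

if (!has_genderdata) {
  message("genderdata is not installed; some gender() methods require it.")
  # Offer the download only when a user is present, since it fetches data.
  if (interactive() && requireNamespace("gender", quietly = TRUE)) {
    gender::install_genderdata_package()
  }
}
```

Skipping the installer in non-interactive sessions keeps batch scripts from stalling on a prompt.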

Examples

# Demo method
gender("madison", method = "demo", years = 1985)
gender("madison", method = "demo", years = c(1900, 1985))
# SSA method
gender("madison", method = "ssa", years = c(1900, 1985))
# IPUMS method
gender("madison", method = "ipums", years = 1860)