gender (version 0.5.4)

gender_df: Use gender prediction with data frames

Description

In a common use case for gender prediction, you have a data frame with a column of first names and a column of birth years (or two columns giving the minimum and maximum potential birth year). This function wraps the gender function to apply it efficiently to such a data frame. The result is a data frame with one gender prediction for each unique combination of first name and birth year, which can then be merged back into your original data frame.

Usage

gender_df(
  data,
  name_col = "name",
  year_col = "year",
  method = c("ssa", "ipums", "napp", "demo")
)

Arguments

data

A data frame containing first names and either a birth year or a range of potential birth years.

name_col

A string specifying the name of the column containing the first names.

year_col

Either a single string specifying the name of the column containing the birth year associated with each first name, or a character vector with two elements: the names of the columns containing the minimum and maximum years of the range of potential birth years.

method

One of the historical methods provided by this package: "ssa", "ipums", "napp", or "demo". See gender for details.

Value

A data frame with columns from the output of the gender function, and one row for each unique combination of first names and birth years.
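
Because the result has one row per unique name and year combination, it is usually joined back onto the original data. The following is a minimal sketch of that step, not part of the package documentation. It assumes the result identifies each prediction by the columns name and year_min, as the output of gender does; check names(results) and adjust the join keys if your version differs.

```r
library(dplyr)
library(gender)

# Small example input, mirroring the data used in the Examples section.
demo_df <- tibble(
  names = c("Hillary", "Hillary", "Hillary", "Madison", "Madison"),
  birth_year = c(1930, 2000, 1930, 1930, 2000)
)

# One prediction per unique combination of name and birth year.
results <- gender_df(demo_df, name_col = "names", year_col = "birth_year",
                     method = "demo")

# Assumption: `name` and `year_min` are the key columns in `results`.
# The left join keeps every row of demo_df, including the duplicate.
merged <- left_join(demo_df, results,
                    by = c("names" = "name", "birth_year" = "year_min"))
```

A left join is used so that rows of the original data are kept even when no prediction is available for a name.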

See Also

gender

Examples

# NOT RUN {
library(dplyr)
# tibble() never converts strings to factors, so no stringsAsFactors
# argument is needed (it would be added as an extra column).
demo_df <- tibble(names = c("Hillary", "Hillary", "Hillary",
                            "Madison", "Madison"),
                  birth_year = c(1930, 2000, 1930, 1930, 2000),
                  min_year = birth_year - 1,
                  max_year = birth_year + 1)

# Using the birth year for the predictions.
# Notice that the duplicate row for Hillary in 1930 yields a single prediction.
gender_df(demo_df, method = "demo",
          name_col = "names", year_col = "birth_year")

# Using a range of years
gender_df(demo_df, method = "demo",
          name_col = "names", year_col = c("min_year", "max_year"))
# }
