
wru (version 3.1.0)

modfuns: Internal model fitting functions

Description

These functions are intended for internal use only. Users should call the predict_race() interface rather than calling any of these functions directly.

Usage

.predict_race_old(
  voter.file,
  census.surname = TRUE,
  surname.only = FALSE,
  surname.year = 2020,
  name.dictionaries = NULL,
  census.geo,
  census.key = Sys.getenv("CENSUS_API_KEY"),
  census.data = NULL,
  age = FALSE,
  sex = FALSE,
  year = "2020",
  party,
  retry = 3,
  impute.missing = TRUE,
  use.counties = FALSE
)

predict_race_new(
  voter.file,
  names.to.use,
  year = "2020",
  age = FALSE,
  sex = FALSE,
  census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
  census.key = Sys.getenv("CENSUS_API_KEY"),
  name.dictionaries,
  surname.only = FALSE,
  census.data = NULL,
  retry = 0,
  impute.missing = TRUE,
  skip_bad_geos = FALSE,
  census.surname = FALSE,
  use.counties = FALSE
)

predict_race_me(
  voter.file,
  names.to.use,
  year = "2020",
  age = FALSE,
  sex = FALSE,
  census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
  census.key = Sys.getenv("CENSUS_API_KEY"),
  name.dictionaries,
  surname.only = FALSE,
  census.data = NULL,
  retry = 0,
  impute.missing = TRUE,
  census.surname = FALSE,
  use.counties = FALSE,
  race.init,
  ctrl
)

predict_race_embedding(
  voter.file,
  names.to.use,
  year = "2020",
  age = FALSE,
  sex = FALSE,
  census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
  census.key = Sys.getenv("CENSUS_API_KEY"),
  name.dictionaries,
  surname.only = FALSE,
  census.data = NULL,
  retry = 0,
  impute.missing = TRUE,
  skip_bad_geos = FALSE,
  census.surname = FALSE,
  use.counties = FALSE,
  ebisg.model = "intfloat/multilingual-e5-large"
)

Value

The output is an object of class data.frame. It contains the original user-input voter.file plus additional columns with predicted probabilities for each of the five major racial categories: pred.whi for White, pred.bla for Black, pred.his for Hispanic/Latino, pred.asi for Asian/Pacific Islander, and pred.oth for Other/Mixed.
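To make the shape of the result concrete, the base-R sketch below builds a toy data.frame with the same layout: the original columns followed by the five pred.* columns. The voters and probabilities are made up, and the sketch does not call wru. It assumes, as is usual for posterior probabilities over exhaustive categories, that each row sums to one, and it shows one common way to pick the most likely category per voter.

```r
# Toy stand-in for a voter.file; the two voters are made up.
voter.file <- data.frame(
  surname = c("SMITH", "NGUYEN"),
  stringsAsFactors = FALSE
)

# Made-up posterior probabilities, one row per voter, in the column order
# pred.whi, pred.bla, pred.his, pred.asi, pred.oth.
probs <- rbind(
  c(0.70, 0.20, 0.03, 0.01, 0.06),
  c(0.02, 0.00, 0.01, 0.95, 0.02)
)
colnames(probs) <- c("pred.whi", "pred.bla", "pred.his", "pred.asi", "pred.oth")

# The output keeps every original column and appends the five pred.* columns.
out <- cbind(voter.file, probs)

# Each row is a probability distribution over the five categories.
stopifnot(all(abs(rowSums(out[, colnames(probs)]) - 1) < 1e-8))

# Most likely category per voter: the column with the largest probability.
best <- max.col(probs, ties.method = "first")
out$pred.race <- sub("^pred\\.", "", colnames(probs)[best])
```

Taking the arg-max discards the uncertainty in the probabilities, so for aggregate estimates it is often preferable to sum the pred.* columns directly.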

Arguments

voter.file

See documentation in predict_race.

census.surname

See documentation in predict_race.

surname.only

See documentation in predict_race.

surname.year

See documentation in predict_race.

name.dictionaries

See documentation in predict_race.

census.geo

See documentation in predict_race.

census.key

A character string specifying the user's Census API key. Required if census.geo is specified, because a valid Census API key is needed to download Census geographic data.

The default, Sys.getenv("CENSUS_API_KEY"), reads the key from an environment variable named CENSUS_API_KEY.
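Because the default value is Sys.getenv("CENSUS_API_KEY"), one way to supply the key is to set that environment variable before calling predict_race(). A minimal base-R sketch follows; the key string is a placeholder, not a real key. To make the setting persist across sessions, the same CENSUS_API_KEY=... line can be placed in ~/.Renviron, which R reads at startup.

```r
# "YOUR-CENSUS-API-KEY" is a placeholder; substitute your own key.
Sys.setenv(CENSUS_API_KEY = "YOUR-CENSUS-API-KEY")

# This is the expression used as the default value of census.key.
key <- Sys.getenv("CENSUS_API_KEY")

# An unset variable comes back as "", so check before calling anything
# that needs to download Census data.
if (!nzchar(key)) stop("CENSUS_API_KEY is not set")
```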

census.data

See documentation in predict_race.

age

See documentation in predict_race.

sex

See documentation in predict_race.

year

See documentation in predict_race.

party

See documentation in predict_race.

retry

See documentation in predict_race.

impute.missing

See documentation in predict_race.

use.counties

A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data?

names.to.use

See documentation in predict_race.

skip_bad_geos

See documentation in predict_race.

race.init

See documentation in predict_race.

ctrl

See the control argument in the documentation for predict_race().

ebisg.model

Character string (HuggingFace model ID) or named list specifying which embedding model to use when model = "eBISG". The only built-in option is "intfloat/multilingual-e5-large" (the default; 1024-dimensional embeddings), which is keyed by its full HuggingFace ID for consistency with the Python side. To use a different sentence-transformer, pass a named list with the elements transformer (HuggingFace model ID), dim (embedding dimension), and surname_mlp and firstname_mlp (paths to custom .pt checkpoints).
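The two accepted shapes of ebisg.model can be illustrated in base R. In the sketch below, custom_model uses a placeholder model ID, dimension, and checkpoint paths, and is_valid_ebisg_spec() is a hypothetical helper written for this example; it is not part of wru and only checks the structure described above, not whether the files or model actually exist.

```r
# Placeholder custom specification: the model ID, dimension, and .pt paths
# are illustrative only.
custom_model <- list(
  transformer   = "your-org/your-sentence-transformer",
  dim           = 768L,
  surname_mlp   = "models/surname_mlp.pt",
  firstname_mlp = "models/firstname_mlp.pt"
)

# Hypothetical helper (not part of wru): checks that a value has one of the
# two documented shapes for ebisg.model.
is_valid_ebisg_spec <- function(spec) {
  if (is.character(spec) && length(spec) == 1L) {
    return(TRUE)  # a single HuggingFace model ID
  }
  required <- c("transformer", "dim", "surname_mlp", "firstname_mlp")
  is.list(spec) && all(required %in% names(spec)) &&
    is.numeric(spec$dim) && length(spec$dim) == 1L && spec$dim > 0
}

is_valid_ebisg_spec("intfloat/multilingual-e5-large")  # TRUE
is_valid_ebisg_spec(custom_model)                       # TRUE
is_valid_ebisg_spec(list(transformer = "x"))            # FALSE
```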

.predict_race_old

Original WRU race prediction function, implementing classical BISG with a census-based surname dictionary.
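For reference, the classical BISG update can be written as follows for voter i with surname s living in geography g, where r and r' range over the five racial categories. This is the standard form obtained from Bayes' rule under the assumption that surname and geography are conditionally independent given race: Pr(R | S) comes from the surname dictionary and Pr(G | R) from census counts. It is a sketch of the model, not a transcription of wru's code.

```latex
\Pr(R_i = r \mid S_i = s, G_i = g)
  = \frac{\Pr(G_i = g \mid R_i = r)\,\Pr(R_i = r \mid S_i = s)}
         {\sum_{r'} \Pr(G_i = g \mid R_i = r')\,\Pr(R_i = r' \mid S_i = s)}
```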

predict_race_new

New race prediction function, implementing classical BISG with an augmented surname dictionary; it can also use first and middle name information.
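The effect of adding name information can be sketched in base R with made-up numbers. The sketch assumes, as a simplification, that each extra factor (here Pr(first name | race)) is conditionally independent of the others given race, so it simply multiplies into the unnormalised posterior. The variable names and bisg_posterior() are hypothetical and not wru's internals.

```r
# Five wru categories, in the order of the pred.* output columns.
races <- c("whi", "bla", "his", "asi", "oth")

# Made-up inputs for a single voter.
# Pr(race | surname), e.g. from a surname dictionary:
p_race_given_surname <- setNames(c(0.60, 0.20, 0.10, 0.05, 0.05), races)
# Pr(geography | race): the voter's block's share of each group's population:
p_geo_given_race <- c(0.0010, 0.0040, 0.0020, 0.0010, 0.0020)
# Pr(first name | race), an optional extra factor:
p_first_given_race <- c(0.0005, 0.0020, 0.0010, 0.0005, 0.0010)

# Multiply the prior by one or more likelihood vectors (all in race order),
# then normalise so the posterior sums to 1. Names come from the prior.
bisg_posterior <- function(prior, ...) {
  unnorm <- prior * Reduce(`*`, list(...))
  unnorm / sum(unnorm)
}

post_sg  <- bisg_posterior(p_race_given_surname, p_geo_given_race)
post_sgf <- bisg_posterior(p_race_given_surname, p_geo_given_race,
                           p_first_given_race)
```

With these numbers the geography factor moves the most likely category from whi (surname alone) to bla at about 0.46, and the first-name factor strengthens bla to about 0.72.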

predict_race_me

New race prediction function, implementing fBISG (a fully Bayesian model with measurement-error correction) with an augmented surname dictionary, as well as first and middle name information.
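One reason a measurement-error correction helps is that in classical BISG a zero in the census table forces a zero posterior, no matter how strong the name evidence is. The base-R sketch below uses made-up counts to show this problem only; it does not implement fBISG's correction.

```r
# Made-up inputs: the surname strongly suggests "asi", but the census table
# for this block records zero Asian residents.
p_race_given_surname <- c(whi = 0.02, bla = 0.00, his = 0.01, asi = 0.95, oth = 0.02)
block_counts <- c(whi = 120, bla = 40, his = 30, asi = 0, oth = 10)
group_totals <- c(whi = 60000, bla = 20000, his = 15000, asi = 5000, oth = 4000)

# Pr(block | race): the block's share of each group's population.
p_geo_given_race <- block_counts / group_totals

# Classical BISG: multiply and normalise.
unnorm <- p_race_given_surname * p_geo_given_race
post <- unnorm / sum(unnorm)

post["asi"]  # exactly 0, whatever the surname evidence
```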

predict_race_embedding

eBISG race prediction function, which uses pre-trained text embeddings (E5-Large by default) to predict race probabilities for names not found in Census surname lists, rather than falling back to generic population-level priors.
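The fallback behaviour can be pictured as a dictionary lookup with a model-based fallback. In this base-R sketch, surname_dict, embedding_predict(), and surname_prior() are hypothetical names with made-up probabilities; embedding_predict() returns a fixed vector as a stand-in for the real embedding model, which (going by the ebisg.model argument) pairs a sentence-transformer with MLP checkpoints.

```r
# Toy surname dictionary: Pr(race | surname) for two made-up entries.
surname_dict <- list(
  SMITH  = c(whi = 0.70, bla = 0.23, his = 0.02, asi = 0.01, oth = 0.04),
  NGUYEN = c(whi = 0.02, bla = 0.00, his = 0.01, asi = 0.95, oth = 0.02)
)

# Hypothetical stand-in for the embedding model. A real implementation would
# embed the name and pass the embedding through an MLP; this returns a fixed
# distribution so the sketch runs without Python.
embedding_predict <- function(name) {
  c(whi = 0.40, bla = 0.10, his = 0.30, asi = 0.15, oth = 0.05)
}

# Use the dictionary when the surname is listed, otherwise ask the model,
# instead of falling back to a generic population-level prior.
surname_prior <- function(name, dict, fallback) {
  key <- toupper(name)
  if (!is.null(dict[[key]])) dict[[key]] else fallback(key)
}

surname_prior("smith", surname_dict, embedding_predict)  # dictionary entry
surname_prior("Zzyzx", surname_dict, embedding_predict)  # model fallback
```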

Details

These functions fit different versions of WRU. .predict_race_old fits the original WRU model, also known as BISG with a census-based surname dictionary. predict_race_new fits a version of BISG that uses a new, augmented surname dictionary and can also incorporate first and middle name information. predict_race_me fits a fully Bayesian Improved Surname Geocoding model (fBISG), which applies a measurement-error correction for erroneous zeros in census tables and also accommodates the augmented surname dictionary and the first and middle name dictionaries when making predictions. Finally, predict_race_embedding fits eBISG, which uses pre-trained text embeddings to predict race probabilities for names not found in the Census surname lists.