merge_names
merge_names merges names in a user-input dataset with corresponding
race/ethnicity probabilities derived from the U.S. Census Surname List,
the Spanish Surname List, and voter files from states in the Southern U.S.
merge_names(
voter.file,
namesToUse,
census.surname,
table.surnames = NULL,
table.first = NULL,
table.middle = NULL,
clean.names = TRUE,
impute.missing = FALSE,
model = "BISG"
)
Output will be an object of class data.frame. It will
consist of the original user-input data with additional columns that
specify the part of the name matched with Census data (surname.match),
and the probabilities Pr(Race | Surname) for each racial group
(p_whi for White, p_bla for Black, p_his for Hispanic/Latino,
p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
voter.file: An object of class data.frame. Must contain a row for each individual being predicted,
as well as a field named last containing each individual's surname.
If first name is also being used for prediction, the file must also contain a field
named first. If middle name is also being used for prediction, the file
must also contain a field named middle.
namesToUse: A character vector identifying which names to use for the prediction.
The default value is "last", indicating that only the last name will be used.
Other options are "last, first", indicating that both last and first names will be
used, and "last, first, middle", indicating that last, first, and middle names will all
be used.
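The column requirements implied by the name options can be checked before any merge is attempted. The following is a minimal base-R sketch, assuming the option strings and field names described above; required_name_columns and check_voter_file are hypothetical helpers written for illustration and are not part of the package.

```r
# Hypothetical helper: turn a name-option string such as "last, first"
# into the column names the input data frame is expected to contain.
required_name_columns <- function(namesToUse = "last") {
  trimws(strsplit(namesToUse, ",", fixed = TRUE)[[1]])
}

# Hypothetical helper: stop early if any required column is missing.
check_voter_file <- function(voter.file, namesToUse = "last") {
  stopifnot(is.data.frame(voter.file))
  missing_cols <- setdiff(required_name_columns(namesToUse), names(voter.file))
  if (length(missing_cols) > 0) {
    stop("voter.file is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Illustrative data only.
toy <- data.frame(
  last  = c("Smith", "Garcia"),
  first = c("Ann", "Luis"),
  stringsAsFactors = FALSE
)

check_voter_file(toy, "last, first")  # passes silently
```

Requesting "last, first, middle" with this toy data would raise an error, because no middle column is present.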
census.surname: A TRUE/FALSE object. If TRUE,
function will call merge_surnames to merge in Pr(Race | Surname)
from the U.S. Census Surname List (2000, 2010, or 2020) and the Spanish Surname List.
If FALSE, user must provide a name.dictionary (see below).
Default is TRUE.
table.surnames: An object of class data.frame provided by the
user as an alternative surname dictionary. It will consist of a list of
U.S. surnames, along with the associated probabilities P(name | ethnicity)
for ethnicities: white, Black, Hispanic, Asian, and other
(last_name for U.S. surnames, p_whi_last for White,
p_bla_last for Black, p_his_last for Hispanic,
p_asi_last for Asian, p_oth_last for other). Default is NULL.
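As an illustration of the layout described for an alternative surname dictionary, here is a small base-R sketch. The column names follow the list above; the probability values are invented, the uppercase surnames are an assumption about formatting, and validate_surname_table is a hypothetical helper rather than a package function.

```r
# Illustrative values only; not taken from any real surname list.
# Uppercase surnames are an assumption, not a documented requirement.
my_surnames <- data.frame(
  last_name  = c("SMITH", "GARCIA", "NGUYEN"),
  p_whi_last = c(0.0100, 0.0010, 0.0001),
  p_bla_last = c(0.0120, 0.0005, 0.0001),
  p_his_last = c(0.0010, 0.0200, 0.0001),
  p_asi_last = c(0.0010, 0.0010, 0.0300),
  p_oth_last = c(0.0090, 0.0020, 0.0010),
  stringsAsFactors = FALSE
)

expected_cols <- c("last_name", "p_whi_last", "p_bla_last",
                   "p_his_last", "p_asi_last", "p_oth_last")

# Hypothetical check: all expected columns are present and every
# probability column is numeric with values between 0 and 1.
validate_surname_table <- function(tab) {
  if (!all(expected_cols %in% names(tab))) return(FALSE)
  prob_cols <- setdiff(expected_cols, "last_name")
  all(vapply(tab[prob_cols],
             function(p) is.numeric(p) && all(p >= 0 & p <= 1),
             logical(1)))
}

validate_surname_table(my_surnames)
```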
table.first: See table.surnames.
table.middle: See table.surnames.
clean.names: A TRUE/FALSE object. If TRUE,
any surnames in voter.file that cannot initially be matched
to the database will be cleaned, according to U.S. Census specifications,
in order to increase the chance of finding a match. Default is TRUE.
impute.missing: See predict_race.
model: See predict_race.
This function allows users to match names in their dataset with database entries estimating P(name | ethnicity) for each of the five major racial groups for each name. The database probabilities are derived from the U.S. Census Surname List, the Spanish Surname List, and voter files from states in the Southern U.S.
By default, the function matches names as follows:
1. Search raw surnames in the database;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., "Jr") and search again (last names only);
5. Split double-barreled names into two parts and search first part of name;
6. Split double-barreled names into two parts and search second part of name.
Each step only applies to names not matched in a previous step.
Steps 2 through 6 are not applied if clean.names is FALSE.
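The cascade above can be sketched in base R. This is a simplified illustration, not the package's implementation: the lookup vector known stands in for the surname database, each cleaning rule is applied to the raw name, the suffix list is an assumption, and match_surnames and its clean argument are hypothetical names.

```r
# Hypothetical stand-in for the surname database.
known <- c("SMITH", "OBRIEN", "DELACRUZ", "GARCIA", "LOPEZ")

# One function per step; each takes the raw surnames and returns candidates.
clean_steps <- list(
  function(x) x,                                       # 1. raw
  function(x) gsub("[[:punct:]]", "", x),              # 2. no punctuation
  function(x) gsub("[[:punct:][:space:]]", "", x),     # 3. no spaces either
  function(x) {                                        # 4. drop a trailing suffix
    x <- gsub("[[:punct:]]", "", x)
    x <- sub("[[:space:]]+(JR|SR|II|III|IV)$", "", x)  # assumed suffix list
    gsub("[[:space:]]", "", x)
  },
  function(x) vapply(strsplit(x, "[- ]"),              # 5. first part
                     function(p) p[1], character(1)),
  function(x) vapply(strsplit(x, "[- ]"),              # 6. second part (NA if none)
                     function(p) p[2], character(1))
)

match_surnames <- function(surnames, known, clean = TRUE) {
  matched <- rep(NA_character_, length(surnames))
  steps <- if (clean) clean_steps else clean_steps[1]
  for (step in steps) {
    todo <- is.na(matched)          # only names not matched in an earlier step
    if (!any(todo)) break
    candidate <- step(surnames[todo])
    hit <- candidate %in% known
    matched[todo][hit] <- candidate[hit]
  }
  matched
}

surnames <- c("SMITH", "O'BRIEN", "DE LA CRUZ", "SMITH JR.",
              "GARCIA-LOPEZ", "XYZ-LOPEZ", "QWERTY")
result <- match_surnames(surnames, known)
result_raw_only <- match_surnames(surnames, known, clean = FALSE)
```

In this toy run each of the first six names is resolved by a different step, "QWERTY" stays unmatched (NA), and with clean = FALSE only the exact match "SMITH" is found.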
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
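The rule in the note can be illustrated with a short base-R sketch. The table layout, the surname column name, and all values are assumptions made for the example; only the 1/0 assignment for names found solely on the Spanish list comes from the text above.

```r
# Illustrative census-style table; values are invented.
census_tab <- data.frame(
  surname = c("SMITH", "GARCIA"),
  p_whi   = c(0.70, 0.05),
  p_bla   = c(0.23, 0.01),
  p_his   = c(0.02, 0.92),
  p_asi   = c(0.01, 0.01),
  p_oth   = c(0.04, 0.01),
  stringsAsFactors = FALSE
)

# Illustrative Spanish surname list.
spanish_list <- c("GARCIA", "ZAPATERO")

# Names that appear only on the Spanish list get probability 1 for
# Hispanic/Latino and 0 for every other group.
only_spanish <- setdiff(spanish_list, census_tab$surname)
extra <- data.frame(
  surname = only_spanish,
  p_whi = 0, p_bla = 0, p_his = 1, p_asi = 0, p_oth = 0,
  stringsAsFactors = FALSE
)

combined <- rbind(census_tab, extra)
```

Names present in both sources, such as "GARCIA" here, keep the probabilities from the first table in this sketch.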
data(voters)
if (FALSE) try(merge_names(voters, namesToUse = "surname", census.surname = TRUE))