Simulated datasets containing the name, birthdate, and additional attributes of 500 records that represent 350 unique individuals.
rl_reg1
rl_reg2
rl_reg5
identity.rl_reg1
identity.rl_reg2
identity.rl_reg5
linkage.rl
rl_reg1 and rl_reg5 are data frames with 500 rows and 9 columns. Each row represents one record, with the following columns:
First name
Last name
Birth month (numeric)
Birth day
Birth year
Sex ("M" or "F")
Education level ("Less than a high school diploma", "High school graduates, no college", "Some college or associate degree", "Bachelor's degree only", or "Advanced degree")
Yearly income (in 1000s)
Systolic blood pressure
identity.rl_reg1 and identity.rl_reg5 are integer vectors indicating the true record ids of the two datasets. Two records represent the same individual if and only if their corresponding identity values are equal.
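The equality rule above can be sketched on a small example. The vector below is a hypothetical toy stand-in, not values taken from identity.rl_reg1; the same calls should apply to the real vectors, where the description implies 350 unique individuals and 150 duplicated records.

```r
# Toy identity vector (hypothetical values, not from the package data).
identity <- c(1L, 2L, 1L, 3L, 2L)

# Records i and j represent the same individual iff identity[i] == identity[j].
same_individual <- outer(identity, identity, "==")

# Number of unique individuals and number of duplicated records.
n_unique <- length(unique(identity))
n_duplicates <- sum(duplicated(identity))
```

Here records 1 and 3 share an identity value, so `same_individual[1, 3]` is `TRUE`.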
linkage.rl contains the result of running 100,000 iterations of a record linkage model using the package blink.
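As a rough sketch of how such output might be summarised, the example below uses a small toy matrix in place of linkage.rl. It assumes, which this page does not state explicitly, that each row is one iteration, each column is one record, and each entry is the label of the latent individual that record is linked to in that iteration.

```r
# Toy stand-in for linkage.rl: 4 iterations (rows) by 5 records (columns).
# Assumption: entries are labels of the latent individual each record is linked to.
lambda <- matrix(c(1, 2, 1, 3, 2,
                   1, 2, 1, 3, 4,
                   1, 2, 1, 3, 2,
                   1, 1, 1, 3, 2),
                 nrow = 4, byrow = TRUE)

# Number of distinct individuals implied by each iteration.
n_entities <- apply(lambda, 1, function(x) length(unique(x)))

# Share of iterations in which two records are linked to the same individual.
p_link_13 <- mean(lambda[, 1] == lambda[, 3])
p_link_25 <- mean(lambda[, 2] == lambda[, 5])
```

Under the stated assumption, `p_link_13` and `p_link_25` estimate the posterior probability that the two records refer to the same individual.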
An object of class data.frame with 500 rows and 9 columns.
An object of class data.frame with 500 rows and 9 columns.
An object of class integer of length 500.
An object of class integer of length 500.
An object of class integer of length 500.
An object of class matrix (inherits from array) with 100000 rows and 500 columns.
There is a known relationship between three of the variables in the dataset: blood pressure (bp), income, and sex. $$bp = 160 + 10 \, I(sex = \text{"M"}) - income + 0.5 \, income \times I(sex = \text{"M"}) + \epsilon$$ where \(I(\cdot)\) is the indicator function, \(\epsilon \sim Normal(0, \sigma^2)\), and \(\sigma = 1, 2, 5\).
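A minimal sketch of this relationship simulates data from the formula and recovers the coefficients with a linear model. The sample size, seed, income range, and the choice of sigma = 2 are illustrative assumptions, not properties of the packaged datasets.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
n <- 1000                                      # assumed sample size
sex <- sample(c("M", "F"), n, replace = TRUE)
income <- runif(n, min = 10, max = 100)        # assumed range, in 1000s
sigma <- 2                                     # one of the stated values 1, 2, 5

# Indicator I(sex = "M") as a 0/1 variable.
male <- as.numeric(sex == "M")

# bp = 160 + 10 I(sex = "M") - income + 0.5 income * I(sex = "M") + epsilon
bp <- 160 + 10 * male - income + 0.5 * income * male +
  rnorm(n, mean = 0, sd = sigma)

# Fit the matching model; "F" is the reference level of sex.
fit <- lm(bp ~ sex * income)
coef(fit)
```

The fitted coefficients `(Intercept)`, `sexM`, `income`, and `sexM:income` should land close to 160, 10, -1, and 0.5 respectively.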
The 150 duplicated records have randomly generated errors.