rl_reg1: 500 records suitable for record linkage with additional regression variables

Description

Simulated datasets, each containing the names, birth dates, and additional attributes of 500 records that correspond to 350 unique individuals.

Usage

rl_reg1
rl_reg2
rl_reg5
identity.rl_reg1
identity.rl_reg2
identity.rl_reg5
linkage.rl

Format

rl_reg1, rl_reg2, and rl_reg5 are data frames with 500 rows and 9 columns. Each row represents one record, with the following columns:

fname: First name
lname: Last name
bm: Birth month (numeric)
bd: Birth day
by: Birth year
sex: Sex ("M" or "F")
education: Education level (one of "Less than a high school diploma", "High school graduates, no college", "Some college or associate degree", "Bachelor's degree only", or "Advanced degree")
income: Yearly income (in 1000s)
bp: Systolic blood pressure
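A minimal sketch of the record layout described above, using two hypothetical rows with made-up values (not drawn from rl_reg1); the second row mimics a duplicate of the first with a first-name error. Whether the packaged education column is stored as character or factor is not stated here, so converting it to an ordered factor in the documented level order is an assumption, as is combining bm, bd, and by into a single Date.

```r
# Hypothetical toy rows mimicking the documented 9-column layout.
# Values are invented for illustration and are NOT taken from rl_reg1.
edu_levels <- c("Less than a high school diploma",
                "High school graduates, no college",
                "Some college or associate degree",
                "Bachelor's degree only",
                "Advanced degree")

toy <- data.frame(
  fname = c("ANNA", "ANA"),        # second row: duplicate with a name error
  lname = c("SMITH", "SMITH"),
  bm = c(3, 3),
  bd = c(14, 14),
  by = c(1970, 1970),
  sex = c("F", "F"),
  education = c("Bachelor's degree only", "Bachelor's degree only"),
  income = c(52, 52),              # in 1000s
  bp = c(108, 108),
  stringsAsFactors = FALSE
)

# ASSUMPTION: treat education as an ordered factor in the documented order
toy$education <- factor(toy$education, levels = edu_levels, ordered = TRUE)

# ASSUMPTION: combine the three birth-date columns into one Date column
toy$birthdate <- as.Date(sprintf("%04d-%02d-%02d", toy$by, toy$bm, toy$bd),
                         format = "%Y-%m-%d")

str(toy)
```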

identity.rl_reg1, identity.rl_reg2, and identity.rl_reg5 are integer vectors giving the true individual id of each record in the corresponding dataset. Two records represent the same individual if and only if their identity values are equal.
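The "same individual if and only if the identity values are equal" rule can be illustrated with a small hypothetical identity vector (six invented records, not package data). The helper same_individual is hypothetical; the sketch counts unique individuals and lists every true matching pair.

```r
# Hypothetical identity vector for 6 records (not taken from the package data)
identity <- c(1L, 2L, 2L, 3L, 4L, 4L)

# Number of unique individuals
n_unique <- length(unique(identity))

# Hypothetical helper: records i and j match iff their identity values are equal
same_individual <- function(i, j, id) id[i] == id[j]

# Enumerate all record pairs (i < j) and keep the true matches
pairs <- t(combn(length(identity), 2))
match_pairs <- pairs[identity[pairs[, 1]] == identity[pairs[, 2]], , drop = FALSE]

n_unique
match_pairs
```

With the packaged data one would pass, for example, identity.rl_reg1 instead of the toy vector; for 500 records, combn produces 124,750 pairs, which is still small enough to enumerate directly.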

linkage.rl contains the output of 100,000 iterations of a record linkage model fitted with the blink package; each row of the matrix corresponds to one iteration and each column to one of the 500 records.

rl_reg1, rl_reg2, rl_reg5: an object of class data.frame with 500 rows and 9 columns.

identity.rl_reg1, identity.rl_reg2, identity.rl_reg5: an object of class integer of length 500.

linkage.rl: an object of class matrix (inherits from array) with 100000 rows and 500 columns.
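One common summary of MCMC linkage output such as linkage.rl is the posterior probability that two records refer to the same individual. The sketch below assumes, without confirmation from this page, that each entry is the latent individual label assigned to a record at a given iteration. It uses a tiny hypothetical matrix draws (4 iterations by 3 records) and a hypothetical helper link_prob.

```r
# Hypothetical stand-in for linkage.rl: 4 iterations (rows) x 3 records (columns).
# ASSUMPTION: each entry is the latent individual label of a record at that
# iteration; this interpretation is not stated in this help page.
draws <- matrix(c(1, 1, 2,
                  1, 1, 3,
                  1, 2, 3,
                  5, 5, 5), nrow = 4, byrow = TRUE)

# Posterior probability that records i and j are the same individual:
# the share of iterations in which they carry the same label.
link_prob <- function(draws, i, j) mean(draws[, i] == draws[, j])

# Number of distinct individuals implied by each iteration
n_individuals <- apply(draws, 1, function(z) length(unique(z)))

link_prob(draws, 1, 2)
n_individuals
```

Labels are only compared within a row because, under the stated assumption, the label values themselves are arbitrary and may change from one iteration to the next. In practice one might also discard early iterations as burn-in before averaging.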

Details

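There is a known relationship among three of the variables in each dataset: blood pressure (bp), income, and sex. $$\mathrm{bp} = 160 + 10\,I(\mathrm{sex} = \mathrm{M}) - \mathrm{income} + 0.5\,\mathrm{income} \cdot I(\mathrm{sex} = \mathrm{M}) + \epsilon,$$ where $I(\cdot)$ is the indicator function, $\epsilon \sim \mathrm{Normal}(0, \sigma^2)$, and $\sigma = 1, 2, 5$ for rl_reg1, rl_reg2, and rl_reg5, respectively.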
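The regression model above can be checked by simulation. Only the equation comes from this page; the sample size of 500, the roughly even sex split, and the uniform income distribution are assumptions made for illustration. Fitting an interaction model should approximately recover the coefficients 160, 10, -1, and 0.5.

```r
set.seed(1)

n <- 500
sigma <- 2                               # one of the documented values 1, 2, 5
sex <- sample(c("M", "F"), n, replace = TRUE)   # ASSUMPTION: sex distribution
income <- runif(n, 20, 100)              # ASSUMPTION: income distribution (in 1000s)
male <- as.numeric(sex == "M")           # indicator I(sex = "M")

# Data-generating model from the Details section
bp <- 160 + 10 * male - income + 0.5 * income * male + rnorm(n, 0, sigma)

# male * income expands to male + income + male:income
fit <- lm(bp ~ male * income)
print(round(coef(fit), 2))
```

The male:income coefficient corresponds to the 0.5 income term that applies only to males, so the slope on income is -1 for females and -0.5 for males.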

The 150 duplicate records (500 records minus 350 unique individuals) contain randomly generated errors.