representr (version 0.1.1)

rl_reg1: 500 records suitable for record linkage with additional regression variables

Description

Simulated datasets containing the name, birth date, and additional attributes of 500 records representing 350 unique individuals.

Usage

rl_reg1

rl_reg2

rl_reg5

identity.rl_reg1

identity.rl_reg2

identity.rl_reg5

linkage.rl

Format

rl_reg1 and rl_reg5 are data frames with 500 rows and 9 columns. Each row represents one record, with the following columns:

fname

First name

lname

Last name

bm

Birth month (numeric)

bd

Birth day

by

Birth year

sex

Sex ("M" or "F")

education

Education level ("Less than a high school diploma", "High school graduates, no college", "Some college or associate degree", "Bachelor's degree only", or "Advanced degree")

income

Yearly income (in 1000s)

bp

Systolic blood pressure

identity.rl_reg1 and identity.rl_reg5 are integer vectors giving the true identity of each record in the corresponding dataset. Two records represent the same individual if and only if their corresponding identity values are equal.
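The equality rule for identity values can be illustrated with a minimal sketch in base R. The vector `identity_toy` and the helper `same_individual()` are hypothetical names with made-up values, used here so the example runs without the package installed; they are not part of representr.

```r
# Toy identity vector (hypothetical values, not taken from the package data):
# records 1 and 4 share an id, as do records 2, 5, and 6.
identity_toy <- c(3L, 7L, 1L, 3L, 7L, 7L)

# Number of unique individuals represented by the records
n_unique <- length(unique(identity_toy))

# Flag records that are a second or later copy of an individual
is_duplicate <- duplicated(identity_toy)

# Group record indices by the individual they represent
records_by_id <- split(seq_along(identity_toy), identity_toy)

# Two records represent the same individual iff their identity values match
same_individual <- function(i, j, identity) identity[i] == identity[j]

same_individual(1, 4, identity_toy)  # records 1 and 4 share id 3
same_individual(1, 2, identity_toy)  # ids 3 and 7 differ
```

Applied to the shipped data, the same calls on identity.rl_reg1 would be expected to report 350 unique individuals and 150 duplicated records, given the counts stated in the description.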

linkage.rl contains the result of running 100,000 iterations of a record linkage model using the package blink.

An object of class data.frame with 500 rows and 9 columns.

An object of class data.frame with 500 rows and 9 columns.

An object of class integer of length 500.

An object of class integer of length 500.

An object of class integer of length 500.

An object of class matrix (inherits from array) with 100000 rows and 500 columns.
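As a rough sketch of how a matrix shaped like linkage.rl might be summarised, the example below assumes that each row is one iteration, each column is one record, and each entry is a label such that two records are linked in an iteration when their labels match. That interpretation of the entries is an assumption, not something this page states. `linkage_toy` and `link_prob()` are hypothetical names with made-up values.

```r
# Toy linkage matrix (hypothetical): 4 iterations (rows) x 5 records (columns)
linkage_toy <- rbind(
  c(1L, 2L, 1L, 3L, 4L),
  c(1L, 2L, 1L, 3L, 3L),
  c(1L, 2L, 1L, 3L, 4L),
  c(1L, 1L, 1L, 3L, 4L)
)

# Number of distinct entities implied by each iteration
n_entities <- apply(linkage_toy, 1, function(labels) length(unique(labels)))

# Share of iterations in which records i and j carry the same label,
# i.e. an estimate of the probability that they are linked
link_prob <- function(i, j, linkage) mean(linkage[, i] == linkage[, j])

link_prob(1, 3, linkage_toy)  # linked in every iteration
link_prob(4, 5, linkage_toy)  # linked in one of four iterations
```

With real sampler output it is common to drop an initial burn-in portion of the rows before computing summaries like these; how many rows to drop is a modelling choice.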

Details

There is a known relationship between three of the variables in the dataset: blood pressure (bp), income, and sex. $$\mathrm{bp} = 160 + 10 \, I(\mathrm{sex} = \text{"M"}) - \mathrm{income} + 0.5 \, \mathrm{income} \times I(\mathrm{sex} = \text{"M"}) + \epsilon$$ where \(I(\cdot)\) is the indicator function, \(\epsilon \sim \mathrm{Normal}(0, \sigma^2)\), and \(\sigma = 1, 2, 5\).
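The relationship above can be checked with a small simulation. This is only a sketch: the sample size, the uniform income range, and the equal sex split are assumptions chosen for illustration and are not how the package data were generated. Fitting a linear model with a sex by income interaction should roughly recover the coefficients 160, 10, -1, and 0.5.

```r
set.seed(1)
n <- 1000
sigma <- 1  # the documentation lists sigma = 1, 2, 5

# Assumed covariate distributions, for illustration only
sex <- sample(c("M", "F"), n, replace = TRUE)
income <- runif(n, min = 10, max = 100)  # yearly income in 1000s

# Generate blood pressure from the documented relationship
male <- as.numeric(sex == "M")
bp <- 160 + 10 * male - income + 0.5 * income * male + rnorm(n, sd = sigma)

# Fit the matching regression; "F" is the reference level
sim <- data.frame(sex = factor(sex, levels = c("F", "M")), income, bp)
fit <- lm(bp ~ sex * income, data = sim)

# Coefficients are ordered: (Intercept), sexM, income, sexM:income
coef(fit)
```

The same formula, `bp ~ sex * income`, could be fitted to rl_reg1 or rl_reg5 directly, although duplicated records with errors would be expected to affect the estimates.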

The 150 duplicated records have randomly generated errors.