multilink (version 0.1.1)

dup_data_small: Small Duplicate Dataset

Description

A dataset containing 96 simulated records from 3 files, with duplicate records within each file, subset from dup_data.

Usage

dup_data_small

Format

A list with three elements:

records

A data.frame containing the records from all three files, with 7 fields, in the format expected as input by create_comparison_data.

file_sizes

The size of each file.

IDs

The true partition of the records, represented as an integer vector of arbitrary labels of length sum(file_sizes).

References

Serge Aleshin-Guendel & Mauricio Sadinle (2022). Multifile Partitioning for Record Linkage and Duplicate Detection. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.2013242

Examples

data(dup_data_small)

# Count the distinct entities represented in the records
length(unique(dup_data_small$IDs))
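
The three list elements described under Format can be cross-checked against one another. The following is a minimal sketch, assuming the multilink package is installed and that the rows of records are stacked in file order (file 1 first, then file 2, then file 3). The helper variable file_index is introduced here for illustration only and is not part of the dataset.

```r
library(multilink)
data(dup_data_small)

# Number of records in each of the three files
dup_data_small$file_sizes

# IDs carries one label per record, so its length should equal the total
# number of records across files
length(dup_data_small$IDs) == sum(dup_data_small$file_sizes)

# Assumption: records are stacked file by file, so a file index for each
# row can be rebuilt from file_sizes (file_index is a hypothetical helper)
file_index <- rep(seq_along(dup_data_small$file_sizes),
                  times = dup_data_small$file_sizes)

# For each file, check whether any entity label appears more than once,
# i.e. whether that file contains duplicate records
tapply(dup_data_small$IDs, file_index, function(ids) anyDuplicated(ids) > 0)

# Distribution of cluster sizes: how many entities have 1, 2, ... records
table(table(dup_data_small$IDs))
```

Records that share a label in IDs refer to the same entity, so the last table summarizes how many records each entity contributes across the three files.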
