multilink (version 0.1.1)

no_dup_data: No Duplicate Dataset

Description

A dataset containing 730 simulated records spread across 3 files, with no duplicate records within any file.

Usage

no_dup_data

Format

A list with three elements:

records

A data.frame containing the records from all three files, with 7 fields, in the format expected as input by create_comparison_data.

file_sizes

The size of each file.

IDs

The true partition of the records, represented as an integer vector of arbitrary labels of length sum(file_sizes).
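
The three elements described above can be checked against each other directly. The following is a minimal sketch, assuming the multilink package is installed; it only uses the element names and counts stated on this page.

```r
library(multilink)
data(no_dup_data)

# The object is a list with the three documented elements
names(no_dup_data)

# One row per record, one column per field
dim(no_dup_data$records)

# One size per file; the sizes add up to the total number of records
no_dup_data$file_sizes
sum(no_dup_data$file_sizes) == nrow(no_dup_data$records)

# IDs holds one label per record
length(no_dup_data$IDs) == sum(no_dup_data$file_sizes)
```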

References

Serge Aleshin-Guendel & Mauricio Sadinle (2022). Multifile Partitioning for Record Linkage and Duplicate Detection. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.2013242 [arXiv]

Examples

data(no_dup_data)

# There are 500 entities represented in the records
length(unique(no_dup_data$IDs))
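
The "no duplicates within a file" property can also be inspected from the true partition. This is a sketch rather than part of the package's own examples: it assumes the records are stacked file by file, so that the first file_sizes[1] rows belong to file 1, the next file_sizes[2] rows to file 2, and so on. The helper names file_id and has_dups are hypothetical.

```r
library(multilink)
data(no_dup_data)

# Assumption: records are stacked file by file, so a file label for each
# record can be rebuilt from file_sizes (file_id is a hypothetical helper)
file_id <- rep(seq_along(no_dup_data$file_sizes),
               times = no_dup_data$file_sizes)

# For each file, check whether any entity label occurs more than once
has_dups <- tapply(no_dup_data$IDs, file_id,
                   function(ids) anyDuplicated(ids) > 0)
has_dups

# Distribution of entity sizes: how many entities have 1, 2, or 3 records
table(table(no_dup_data$IDs))
```

Under the stacking assumption, every entry of has_dups should be FALSE for this dataset, since no file contains duplicate records.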