no_dup_data

A dataset containing <code>730</code> simulated records from <code>3</code> files, with
no duplicate records within any file.

datasets

Implementation of the methodology of Aleshin-Guendel & Sadinle (2022) <doi:10.1080/01621459.2021.2013242>. It handles the general problem of multifile record linkage and duplicate detection, where any number of files are to be linked, and any of the files may have duplicates.

Serge Aleshin-Guendel

multilink

Multifile Record Linkage and Duplicate Detection

no_dup_data dataset

Format

No Duplicate Dataset — no_dup_data

A list with three elements:<dl>
 <dt>records</dt>
<dd>A <code>data.frame</code> with the records, containing <code>7</code>
 fields, from all three files, in the format used for input to
 <code>create_comparison_data</code>.</dd>

 <dt>file_sizes</dt>
<dd>The size of each file.</dd>

 <dt>IDs</dt>
<dd>The true partition of the records, represented as an
 <code>integer</code> vector of arbitrary labels of length
 <code>sum(file_sizes)</code>.</dd>


</dl>
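The element descriptions above imply a few size relationships that can be checked directly. The following is a minimal sketch, not part of the package: `check_linkage_data` is a hypothetical helper name, and the within-file duplicate check assumes that `records` stacks the files on top of each other in the order given by `file_sizes`.

```r
# Hypothetical helper (not part of multilink): checks that a list has the
# three documented elements and that their sizes agree.
check_linkage_data <- function(x) {
  stopifnot(
    is.list(x),
    all(c("records", "file_sizes", "IDs") %in% names(x)),
    is.data.frame(x$records),
    nrow(x$records) == sum(x$file_sizes),
    length(x$IDs) == sum(x$file_sizes)
  )
  # Source file of each record. ASSUMPTION: records are stacked file by file,
  # in the order given by file_sizes.
  file_index <- rep(seq_along(x$file_sizes), times = x$file_sizes)
  # TRUE if some true-partition label occurs more than once within one file.
  has_within_file_dups <- any(duplicated(data.frame(file = file_index, id = x$IDs)))
  list(
    n_records = nrow(x$records),
    n_fields = ncol(x$records),
    n_files = length(x$file_sizes),
    n_entities = length(unique(x$IDs)),
    has_within_file_dups = has_within_file_dups
  )
}

# Only runs when the multilink package is installed.
if (requireNamespace("multilink", quietly = TRUE)) {
  data("no_dup_data", package = "multilink")
  str(check_linkage_data(no_dup_data))
}
```

If the description above holds, the summary for `no_dup_data` should report 730 records, 7 fields, and 3 files, with no label repeated within a single file.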
