BRL (version 0.1.0)

twoFiles: Two Datasets for Record Linkage

Description

Two data frames, df1 and df2, containing 300 and 150 records of artificially created individuals, respectively; 50 of these individuals are included in both datafiles. In addition, the vector df2ID contains one entry per record in df2 indicating the true matching between the datafiles, coded as follows: a number less than or equal to n1 = 300 in entry j indicates the record in df1 to which record j in df2 truly matches, and the number n1 + j indicates that record j in df2 does not match any record in df1.
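The coding of df2ID can be illustrated on a small hypothetical example. The sketch below does not use the twoFiles data: the sizes n1 = 5 and n2 = 4 and the vector toyID are made up for illustration, and only base R is assumed.

```r
## Toy illustration of the df2ID coding (hypothetical values, not the real data)
n1 <- 5                      # number of records in the first file
n2 <- 4                      # number of records in the second file

## Record 1 of file 2 matches record 3 of file 1, and record 3 matches record 1;
## records 2 and 4 have no match, so they are coded as n1 + j
toyID <- c(3, n1 + 2, 1, n1 + 4)

## Entries less than or equal to n1 are true matches
is_match <- toyID <= n1

## Table of matched pairs: row in file 2 and the row in file 1 it links to
pairs <- data.frame(df2_row = which(is_match), df1_row = toyID[is_match])
pairs

## Non-matching records of file 2 satisfy toyID[j] == n1 + j
non_match <- which(toyID == n1 + seq_len(n2))
non_match
```

Because non-matching records receive the distinct values n1 + j, every entry of the vector is unique, which is what allows it to serve as a merge key alongside the row numbers of the first file.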

Usage

data(twoFiles)

References

Mauricio Sadinle (2017). Bayesian Estimation of Bipartite Matchings for Record Linkage. Journal of the American Statistical Association 112(518), 600-612.

Examples

# NOT RUN {
library(BRL)
data(twoFiles)

n1 <- nrow(df1)

## the true matches: each matched record of df1 next to its match in df2
cbind( df1[df2ID[df2ID<=n1],], df2[df2ID<=n1,] )

## alternatively, use the row numbers of df1 and df2ID as a common key
df1$ID <- 1:n1
df2$ID <- df2ID
merge(df1, df2, by="ID")

## all the records in a merged file (non-matching records are padded with NA)
merge(df1, df2, by="ID", all=TRUE)

# }
