twoFiles

Two Datasets for Record Linkage

Two data frames, df1 and df2, containing 300 and 150 records, respectively, of artificially created individuals; 50 of these individuals appear in both data files. In addition, the vector df2ID has one entry per record in df2 and encodes the true matching between the two files as follows: a value less than or equal to n1 = 300 in entry j gives the row of df1 that record j in df2 truly matches, and a value equal to n1 + j in entry j indicates that record j in df2 does not match any record in df1.

Usage
data(twoFiles)
References

Mauricio Sadinle (2017). Bayesian Estimation of Bipartite Matchings for Record Linkage. Journal of the American Statistical Association 112(518), 600-612.

Aliases
  • twoFiles
  • df1
  • df2
  • df2ID
Examples
# NOT RUN {
data(twoFiles)

n1 <- nrow(df1)

## the true matches
cbind( df1[df2ID[df2ID<=n1],], df2[df2ID<=n1,] )

## alternatively
df1$ID <- 1:n1
df2$ID <- df2ID
merge(df1, df2, by="ID")

## all the records in a merged file
merge(df1, df2, by="ID", all=TRUE)
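
## a toy sketch of the df2ID coding described above; the toy_* names and
## values are hypothetical and are not taken from the data. Assume a first
## file with n1 = 5 records and a second file with 4 records: entries <= n1
## point to a row of the first file, and entry j equals n1 + j when record j
## of the second file has no match
toy_n1 <- 5
toy_ID <- c(3, 7, 1, 9)

## which records of the second file have a match in the first file
toy_is_match <- toy_ID <= toy_n1

## row of the first file matched by each record (NA for non-matches)
toy_row <- ifelse(toy_is_match, toy_ID, NA)

## check that the non-matching entries follow the n1 + j rule
all(toy_ID[!toy_is_match] == toy_n1 + which(!toy_is_match))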

# }
Documentation reproduced from package BRL, version 0.1.0, License: GPL-3
