bstrl

Bayesian STreaming Record Linkage

This package performs streaming record linkage. We assume that files containing records about entities arrive sequentially in time. Each file is duplicate-free, but an entity may be represented in more than one file. We want to determine, probabilistically, which records across files refer to the same entity, and to update these estimated links each time a new file arrives.

Branches

  • 'master' contains the latest version of streaming record linkage, using either Prior-Proposal-Recursive Bayes (PPRB) or Sequential Markov Chain Monte Carlo (SMCMC)
  • All other branches are dead ends or have been merged into 'master'

Usage

To install, run

devtools::install_github("ianmtaylor1/bstrl", build_vignettes=TRUE)

then run vignette(package = "bstrl") to list the documentation and how-tos included with this package.
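Once the package is installed, a streaming update might look roughly like the sketch below. The function and dataset names are taken from the package's function index. The argument order (existing state first, new file second) and the shape of geco_small (a list of data frames with at least five files, of which geco_small_result covers the earlier ones) are assumptions, so check ?PPRBupdate, ?extractlinks and ?geco_small before relying on it.

```r
# Sketch only: argument order and dataset shapes are assumptions, not
# confirmed signatures. The guard skips the example if bstrl is not installed.
if (requireNamespace("bstrl", quietly = TRUE)) {
  library(bstrl)
  data(geco_small)         # assumed: a list of data frames, one per file
  data(geco_small_result)  # assumed: a bstrlstate from linking earlier files

  # Fold the next file into the existing result with a PPRB update.
  # Assumes geco_small[[5]] has not yet been used in geco_small_result.
  updated <- PPRBupdate(geco_small_result, geco_small[[5]])

  # Extract the posterior samples as a list of streaminglinks objects.
  posterior <- extractlinks(updated)
  print(length(posterior))
}
```

SMCMCupdate is listed as the alternative update method and presumably takes the same first two arguments, but that is also an assumption to verify in its help page.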

Install

install.packages('bstrl')

Monthly Downloads

197

Version

1.0.2

License

MIT + file LICENSE

Maintainer

Ian Taylor

Last Published

November 10th, 2022

Functions in bstrl (1.0.2)

  • precision: Calculate the precision of estimated links relative to true links
  • multifileRL: Perform multifile record linkage via Gibbs sampling "from scratch"
  • recall: Calculate the recall of estimated links relative to true links
  • thinsamples: Thin a bstrlstate object
  • alllinks: Return a list of all linked pairs (directly or transitively)
  • PPRBupdate: Perform a PPRB update of record linkage with a new file
  • fromentities: Create a streaming link object from known record entity IDs
  • geco_small: Simulated Noisy Records (smaller set)
  • extractlinks: Extract the links from a bstrlstate object into a list of streaminglinks objects
  • bipartiteRL: Perform baseline bipartite record linkage before streaming updates
  • SMCMCupdate: Perform an SMCMC update of record linkage with a new file
  • islinked: Return TRUE/FALSE indicating whether two records are coreferent
  • geco_30over_3err: Simulated Noisy Records
  • geco_small_result: Example linkage result object for small dataset
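
To make the precision and recall entries concrete, here is a small base-R sketch of how such metrics can be computed from sets of linked record pairs. The helpers linked_pairs, link_precision and link_recall are hypothetical names used for illustration only, not the package's implementation, and the toy entity ids are made up.

```r
# Hypothetical helper (not part of bstrl): list every pair of records that
# share an entity id, so pairs linked only transitively are included too.
linked_pairs <- function(entity_ids) {
  if (length(entity_ids) < 2) return(character(0))
  idx <- utils::combn(length(entity_ids), 2)   # all record pairs, as columns
  same <- entity_ids[idx[1, ]] == entity_ids[idx[2, ]]
  paste(idx[1, same], idx[2, same], sep = "-")
}

# Precision: share of estimated links that are true links.
link_precision <- function(est, truth) {
  if (length(est) == 0) return(NA_real_)
  length(intersect(est, truth)) / length(est)
}

# Recall: share of true links that were recovered by the estimate.
link_recall <- function(est, truth) {
  if (length(truth) == 0) return(NA_real_)
  length(intersect(est, truth)) / length(truth)
}

# Toy data: six records numbered 1..6, two per file across three files.
# Each file is duplicate-free, so no entity id repeats within a file.
true_entities <- c(1, 2, 1, 3, 1, 2)  # records 1, 3, 5 match; so do 2 and 6
est_entities  <- c(1, 2, 1, 3, 4, 3)  # finds 1-3, wrongly links 4-6

truth <- linked_pairs(true_entities)  # "1-3" "1-5" "2-6" "3-5"
est   <- linked_pairs(est_entities)   # "1-3" "4-6"

print(link_precision(est, truth))     # 0.5  (1 of 2 estimated links is true)
print(link_recall(est, truth))        # 0.25 (1 of 4 true links was found)
```

Representing links as "i-j" strings keeps the set operations simple. The package's own precision and recall functions operate on streaming link objects rather than on plain vectors like these.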