Note: a newer version (0.1.2) of this package is available.

reclin (version 0.1.1)

Record Linkage Toolkit

Description

Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, an EM algorithm for estimating m- and u-probabilities, and forcing one-to-one matching. The package can also be used for pre- and post-processing for machine learning methods for record linkage.

Install

install.packages('reclin')
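
Once the package is installed, the steps named in the description (generating pairs, comparing records, estimating m- and u-probabilities, forcing one-to-one matching) can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes a second bundled example data set `linkexample2` alongside `linkexample1`, assumes both contain the columns `postcode`, `lastname`, `firstname`, `address` and `sex`, and uses argument names as they appear in the package vignette, which may differ between versions. The blocking variable and the threshold of 0 are illustrative choices.

```r
library(reclin)

# Example data; linkexample2 is assumed to ship next to linkexample1.
data("linkexample1", "linkexample2")

# 1. Generate candidate pairs, only pairing records with the same postcode.
p <- pair_blocking(linkexample1, linkexample2, "postcode")

# 2. Compare the pairs on a few variables (exact agreement by default).
p <- compare_pairs(p, by = c("lastname", "firstname", "address", "sex"))

# 3. Estimate m- and u-probabilities with the EM algorithm.
m <- problink_em(p)

# 4. Score each pair using the estimated model.
p <- score_problink(p, model = m, var = "weight")

# 5. Select pairs while enforcing one-to-one linkage (threshold is illustrative).
p <- select_greedy(p, "weight", var = "greedy", threshold = 0)

# 6. Build the linked data set from the selected pairs.
linked <- link(p)
```

Blocking on `postcode` keeps the number of candidate pairs small, at the cost of missing matches whose postcode was recorded incorrectly.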

Monthly Downloads

57

Version

0.1.1

License

GPL-3

Maintainer

Jan van der Laan

Last Published

August 9th, 2018

Functions in reclin (0.1.1)

filter_pairs_for_deduplication

Remove pairs which do not have to be compared for deduplication
town_names

Spelling variations of a set of town names
score_simsum

Score pairs by summing the similarity vectors
link

Use the selected pairs to generate a linked data set
linkexample1

Tiny example dataset for probabilistic linkage
predict.problink_em

Calculate weights and probabilities for pairs
summary.problink_em

tabulate_patterns

Create a table of comparison patterns
add_from_x

Add variables from data sets to pairs
problink_em

Calculate EM-estimates of m- and u-probabilities
identical

Comparison functions
deduplicate_equivalence

Deduplication using equivalence groups
compare_pairs

Compare all pairs of records
select_greedy

Select matching pairs enforcing one-to-one linkage
select_threshold

Select pairs for linkage using a threshold
greedy

Greedy one-to-one matching of pairs
score_problink

Score comparison patterns of pairs using the probabilistic linkage framework
pair_blocking

Generate pairs using simple blocking
match_n_to_m

Force n to m matching on a set of pairs
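
Several of the functions above refer to m- and u-probabilities and to scoring comparison patterns. As a rough illustration of that idea, the base R sketch below computes log-likelihood-ratio weights in the usual Fellegi-Sunter style. It does not call reclin and is not taken from its source: the probabilities are made-up values, `pattern_weight` is a hypothetical helper, and whether `score_problink` uses exactly this formula or log base is not confirmed here.

```r
# Hypothetical m- and u-probabilities for three comparison variables
# (illustrative values, not estimates from any real data set).
m <- c(lastname = 0.95, firstname = 0.90, sex = 0.98)  # P(agree | match)
u <- c(lastname = 0.01, firstname = 0.05, sex = 0.50)  # P(agree | non-match)

# Contribution of each variable when the values agree or disagree.
agree_weight    <- log(m / u)
disagree_weight <- log((1 - m) / (1 - u))

# Weight of one comparison pattern: TRUE = values agree, FALSE = they differ.
# (Hypothetical helper, named here only for illustration.)
pattern_weight <- function(pattern) {
  vars <- names(pattern)
  sum(ifelse(pattern, agree_weight[vars], disagree_weight[vars]))
}

all_agree <- c(lastname = TRUE, firstname = TRUE,  sex = TRUE)
name_only <- c(lastname = TRUE, firstname = FALSE, sex = FALSE)

pattern_weight(all_agree)  # large positive weight: likely a match
pattern_weight(name_only)  # lower weight: agreement on last name alone
```

Agreement on a variable with a small u-probability (here `lastname`) adds much more weight than agreement on one that often agrees by chance (here `sex`), which is why pairs are ranked by the summed weight rather than by a simple count of agreeing fields.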