reclin (version 0.1.2)

Record Linkage Toolkit

Description

Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, estimating m- and u-probabilities with the EM algorithm, and forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage.
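
The steps named in the description can be strung together roughly as follows. This is a minimal sketch rather than the package's canonical workflow: it assumes that the example data (linkexample1 and its companion linkexample2, which is not listed on this page) contain the columns postcode, lastname, firstname, address and sex, and the comparator threshold 0.9 and selection threshold 0 are illustrative choices only.

```r
# Sketch of a probabilistic linkage run with reclin (assumptions noted inline).
library(reclin)

# Example data shipped with the package; linkexample2 is assumed to be
# loadable alongside linkexample1.
data("linkexample1", "linkexample2")

# 1. Generate candidate pairs, only pairing records with equal postcode
#    (the column name is an assumption about the example data).
p <- pair_blocking(linkexample1, linkexample2, "postcode", large = FALSE)

# 2. Compare the records in each pair on a set of variables, using
#    Jaro-Winkler similarity with an illustrative threshold of 0.9.
p <- compare_pairs(p, by = c("lastname", "firstname", "address", "sex"),
                   default_comparator = jaro_winkler(0.9))

# 3. Estimate m- and u-probabilities with the EM algorithm and turn them
#    into a weight for every pair.
m <- problink_em(p)
p <- score_problink(p, model = m, var = "weight")

# 4. Enforce one-to-one matching: greedily keep the highest-weight pairs
#    above the (illustrative) threshold of 0.
p <- select_greedy(p, "weight", var = "greedy", threshold = 0)

# 5. Build the linked data set from the selected pairs.
linked <- link(p)
```

Printing m shows the estimated m- and u-probabilities per variable, which is a quick check on whether the chosen comparison variables discriminate between matches and non-matches.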

Install

install.packages('reclin')

Monthly Downloads

55

Version

0.1.2

License

GPL-3

Maintainer

Jan van der Laan

Last Published

November 23rd, 2021

Functions in reclin (0.1.2)

match_n_to_m

Force n to m matching on a set of pairs
filter_pairs_for_deduplication

Remove pairs which do not have to be compared for deduplication
link

Use the selected pairs to generate a linked data set
identical

Comparison functions
compare_pairs

Compare all pairs of records
add_from_x

Add variables from data sets to pairs
greedy

Greedy one-to-one matching of pairs
pair_blocking

Generate pairs using simple blocking
linkexample1

Tiny example dataset for probabilistic linkage
deduplicate_equivalence

Deduplication using equivalence groups
select_greedy

Select matching pairs enforcing one-to-one linkage
problink_em

Calculate EM-estimates of m- and u-probabilities
predict.problink_em

Calculate weights and probabilities for pairs
tabulate_patterns

Create a table of comparison patterns
summary.problink_em

score_problink

Score comparison patterns of pairs using the probabilistic linkage framework
score_simsum

Score pairs by summing the similarity vectors
select_threshold

Select pairs for linkage using a threshold
town_names

Spelling variations of a set of town names