automatedRecLin (version 1.0.1)

comparison_vectors: Create Comparison Vectors for Record Linkage

Description

Creates comparison vectors between records in two datasets based on specified variables and comparison functions.

Usage

comparison_vectors(A, B, variables, comparators = NULL, matches = NULL)

Value

Returns a list containing:

  • Omega -- a data.table with one comparison vector for every pair of records from the two datasets, plus match information when matches is supplied,

  • variables -- a character vector of key variables used for comparison,

  • comparators -- a list of functions used to compare pairs of records,

  • match_prop -- proportion of matches in the smaller dataset.
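The documentation does not spell out how match_prop is computed. One plausible reading, shown here purely as an assumption and not as the package's confirmed formula, is the number of known matches divided by the number of records in the smaller dataset. The sizes below mirror the example on this page (5 and 4 records, 3 known matches); the variable names are hypothetical.

```r
# Assumed interpretation only: matches / size of the smaller dataset
n_A <- 5                                  # records in A (as in the example)
n_B <- 4                                  # records in B (as in the example)
matches <- data.frame(a = 1:3, b = 1:3)   # three known matching pairs

match_prop <- nrow(matches) / min(n_A, n_B)
match_prop
```

Under this reading the value cannot exceed 1 when both datasets are duplicate-free, since each record of the smaller dataset can have at most one match.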

Arguments

A

A duplicate-free data.frame or data.table.

B

A duplicate-free data.frame or data.table.

variables

A character vector of key variables used to create comparison vectors.

comparators

A named list of functions for comparing pairs of records.

matches

Optional. A data.frame or data.table indicating known matches.

Author

Adam Struzik

Details

Consider two datasets: \(A\) and \(B\). Let \(\Omega = A \times B\) denote the set of all record pairs. For each pair of records \((a,b) \in \Omega\), the function creates a comparison vector \(\pmb{\gamma}_{ab} = (\gamma_{ab}^1,\gamma_{ab}^2,\ldots,\gamma_{ab}^K)'\), with one component for each of the \(K\) specified variables, computed by the corresponding comparison function.
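The construction above can be sketched in base R. This is a minimal illustration of the idea, not the package's implementation: the toy data, the exact_disagreement comparator, and the gamma_ column names are all hypothetical, and the convention that 0 means agreement is an assumption chosen to resemble a "complement" of a similarity score.

```r
# Toy datasets (hypothetical, for illustration only)
A <- data.frame(name = c("John", "Emily"),
                surname = c("Smith", "Johnson"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("John", "Emely", "Mark"),
                surname = c("Smith", "Johnson", "Taylor"),
                stringsAsFactors = FALSE)
variables <- c("name", "surname")   # K = 2 key variables

# Omega = A x B: every row index of A paired with every row index of B
omega <- expand.grid(a = seq_len(nrow(A)), b = seq_len(nrow(B)))

# Hypothetical comparator: 0 if the two values agree exactly, 1 otherwise
exact_disagreement <- function(x, y) as.numeric(x != y)

# One gamma column per variable, so each row holds a K-dimensional vector
for (v in variables) {
  omega[[paste0("gamma_", v)]] <-
    exact_disagreement(A[[v]][omega$a], B[[v]][omega$b])
}
omega
```

Each row of omega corresponds to one pair \((a,b)\), and the gamma_ columns together form its comparison vector. Swapping exact_disagreement for a string-distance function would give graded values instead of 0/1.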

Examples

library(automatedRecLin)

# Two small datasets; the first three records of each refer to the same people
df_1 <- data.frame(
  "name" = c("John", "Emily", "Mark", "Anna", "David"),
  "surname" = c("Smith", "Johnson", "Taylor", "Williams", "Brown")
)
df_2 <- data.frame(
  "name" = c("Jon", "Emely", "Marc", "Michael"),
  "surname" = c("Smitth", "Jonson", "Tailor", "Henderson")
)

# One comparison function per key variable
comparators <- list("name" = jarowinkler_complement(),
                    "surname" = jarowinkler_complement())

# Known matches: records 1-3 of df_1 correspond to records 1-3 of df_2
matches <- data.frame("a" = 1:3, "b" = 1:3)

result <- comparison_vectors(A = df_1, B = df_2, variables = c("name", "surname"),
                             comparators = comparators, matches = matches)
result