
firmmatchr (version 0.1.3)

match_companies: Match Company Names against a Dictionary

Description

Runs a cascading matching pipeline: Exact -> Fuzzy (Zoomer) -> FTS5 -> Rarity. Queries matched in an earlier step are excluded from all later steps.
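The cascade idea can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers `normalize()` and `first_word()` are hypothetical, the name normalization is an assumption, and the second step is a crude stand-in for the fuzzy, FTS5 and rarity stages. It only shows how queries matched in one step are dropped before the next step runs.

```r
# Hypothetical helpers for illustration only (not part of firmmatchr)
normalize  <- function(x) trimws(tolower(x))
first_word <- function(x) sub(" .*$", "", normalize(x))

queries <- data.frame(
  query_id = 1:3,
  company_name = c("BMW AG", "Siemens AG", "Deutsche Bank"),
  stringsAsFactors = FALSE
)
dictionary <- data.frame(
  orbis_id = c("D001", "D002", "D003"),
  company_name = c("bmw ag", "Siemens Aktiengesellschaft", "Commerzbank AG"),
  stringsAsFactors = FALSE
)

# Step 1: exact match on normalized names
idx  <- match(normalize(queries$company_name), normalize(dictionary$company_name))
hit  <- !is.na(idx)
exact <- data.frame(
  query_id   = queries$query_id[hit],
  dict_id    = dictionary$orbis_id[idx[hit]],
  match_type = rep("exact", sum(hit)),
  stringsAsFactors = FALSE
)

# Drop queries that already matched before running the next step
remaining <- queries[!(queries$query_id %in% exact$query_id), ]

# Step 2: stand-in for a later, looser step (match on the first word only)
idx2 <- match(first_word(remaining$company_name), first_word(dictionary$company_name))
hit2 <- !is.na(idx2)
later <- data.frame(
  query_id   = remaining$query_id[hit2],
  dict_id    = dictionary$orbis_id[idx2[hit2]],
  match_type = rep("first_word", sum(hit2)),
  stringsAsFactors = FALSE
)

# Combine the per-step results into one table
results <- rbind(exact, later)
print(results)
```

Each query appears at most once in the combined table, tagged with the step that matched it. Queries that no step matches (here "Deutsche Bank") are simply absent.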

Usage

match_companies(
  queries,
  dictionary,
  query_col = "company_name",
  dict_col = "company_name",
  unique_id_col = "query_id",
  dict_id_col = "orbis_id",
  threshold_jw = 0.8,
  threshold_zoomer = 0.4,
  threshold_rarity = 1,
  n_cores = 1
)

Value

A data.table containing query_id, dict_id, and match_type.

Arguments

queries

Data frame. Must contain columns specified in query_col and unique_id_col.

dictionary

Data frame. Must contain columns specified in dict_col and dict_id_col.

query_col

String. Column name for company names in queries.

dict_col

String. Column name for company names in dictionary.

unique_id_col

String. ID column in queries.

dict_id_col

String. ID column in dictionary.

threshold_jw

Numeric (0-1). Minimum Jaro-Winkler similarity. Default 0.8.

threshold_zoomer

Numeric (0-1). Jaccard threshold for blocking. Default 0.4.

threshold_rarity

Numeric. Minimum score for rarity matching. Default 1.0.

n_cores

Integer. Number of cores (reserved for future parallel implementation).
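To get a feel for the scale of threshold_zoomer, the following base R sketch computes a Jaccard similarity on character bigrams. The functions `char_ngrams()` and `jaccard()` are hypothetical helpers, and the use of lower-cased bigrams is an assumption. The package delegates blocking to zoomerjoin, whose tokenization and approximation details may differ.

```r
# Hypothetical helper: set of distinct character n-grams of a string
char_ngrams <- function(x, n = 2) {
  x <- tolower(x)
  if (nchar(x) < n) return(x)
  unique(substring(x, 1:(nchar(x) - n + 1), n:nchar(x)))
}

# Jaccard similarity: |A intersect B| / |A union B|
jaccard <- function(a, b, n = 2) {
  A <- char_ngrams(a, n)
  B <- char_ngrams(b, n)
  length(intersect(A, B)) / length(union(A, B))
}

# "bmw" has bigrams {bm, mw}; "bmw ag" has {bm, mw, "w ", " a", ag}
# 2 shared bigrams out of 5 distinct ones
print(jaccard("BMW", "BMW AG"))   # 0.4
print(jaccard("Siemens AG", "Siemens Aktiengesellschaft"))
```

Under these assumptions, a short name and its legal-form variant sit right at the default of 0.4, so lowering the threshold admits more candidate pairs into the later scoring step and raising it admits fewer.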

Examples

# Create sample query data
queries <- data.frame(
  query_id = 1:3,
  company_name = c("BMW", "Siemens AG", "Deutsche Bank")
)

# Create sample dictionary
dictionary <- data.frame(
  orbis_id = c("D001", "D002", "D003"),
  company_name = c("BMW AG", "Siemens Aktiengesellschaft", "Commerzbank AG")
)

# Match companies (uses multi-threaded Rust internals via zoomerjoin)
results <- match_companies(
  queries = queries,
  dictionary = dictionary,
  query_col = "company_name",
  dict_col = "company_name",
  unique_id_col = "query_id",
  dict_id_col = "orbis_id"
)

print(results)
