firmmatchr (version 0.1.2)

validate_matches_llm: Validate Matches using LLM (Azure OpenAI)

Description

Sends doubtful matches (those not classified as "Perfect" or "Unmatched") to an LLM hosted on Azure OpenAI for verification. Results are saved in chunk files as the run progresses, so an interrupted run can be resumed.
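
As a rough illustration of which rows count as doubtful, the selection can be sketched in base R. This assumes the match labels live in a column called match_type (as in the example data on this page) and uses a made-up data frame; the function's internal logic is not documented here and may differ.

```r
# Hypothetical input; the match_type column name is an assumption.
matches <- data.frame(
  employer_name = c("BMW", "Siemens", "Acme"),
  registry_name = c("BMW AG", "SAP SE", NA),
  match_type    = c("Perfect", "Fuzzy", "Unmatched")
)

# Keep only rows that are neither "Perfect" nor "Unmatched".
doubtful <- matches[!matches$match_type %in% c("Perfect", "Unmatched"), ]
print(doubtful)
```

Only the "Fuzzy" row survives the filter, so only that pair would need an LLM check under these assumptions.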

Usage

validate_matches_llm(
  data,
  query_name_col,
  dict_name_col,
  output_dir = tempdir(),
  filename_stem = "match_validation",
  batch_size = 20,
  api_key = Sys.getenv("AZURE_API_KEY"),
  endpoint = Sys.getenv("AZURE_ENDPOINT"),
  deployment = Sys.getenv("AZURE_DEPLOYMENT")
)

Value

A data frame with added LLM_decision and LLM_reason columns.

Arguments

data

Data frame. Must contain the columns specified by query_name_col and dict_name_col.

query_name_col

String. Name of the column containing the user's query name (for example, the employer name).

dict_name_col

String. Name of the column containing the matched dictionary name (for example, the registry name).

output_dir

String. Directory to save temporary chunks and final results. Defaults to tempdir().

filename_stem

String. Base name for output files. Defaults to "match_validation".

batch_size

Integer. Number of rows to process before saving a chunk. Defaults to 20.

api_key

String. Azure API Key. Defaults to Sys.getenv("AZURE_API_KEY").

endpoint

String. Azure Endpoint. Defaults to Sys.getenv("AZURE_ENDPOINT").

deployment

String. Deployment name. Defaults to Sys.getenv("AZURE_DEPLOYMENT").
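
Because the three credential arguments default to environment variables, it can help to confirm that they are set before calling the function. The following is a minimal sketch using only base R; it checks for presence only and does not validate the values against Azure.

```r
# Names of the environment variables used as argument defaults.
required <- c("AZURE_API_KEY", "AZURE_ENDPOINT", "AZURE_DEPLOYMENT")

# Sys.getenv() returns "" for unset variables, so nzchar() flags the missing ones.
missing_vars <- required[!nzchar(Sys.getenv(required))]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("All Azure credentials are set.")
}
```

Missing values can be supplied for the current session with Sys.setenv(), or persistently in an .Renviron file, rather than being hard-coded in scripts.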

Examples

if (FALSE) {
# Sample matched data
matched_data <- data.frame(
  employer_name = c("BMW", "Siemens"),
  registry_name = c("BMW AG", "SAP SE"),
  dict_id = c("D001", "D002"),
  match_type = c("Fuzzy", "Fuzzy")
)

# Validate using LLM (requires Azure credentials)
validated <- validate_matches_llm(
  data = matched_data,
  query_name_col = "employer_name",
  dict_name_col = "registry_name"
)

print(validated)
}
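
The chunk-based resume behaviour mentioned in the Description can be illustrated with a small base R sketch. This is not the package's implementation: process_batch() is a hypothetical stand-in for the LLM call, run_in_chunks() is an illustrative helper, and the chunk file naming scheme is assumed. It only shows the general pattern of saving each batch and skipping batches whose chunk file already exists.

```r
# Stand-in for the LLM call (hypothetical); returns fixed placeholder values.
process_batch <- function(batch) {
  batch$LLM_decision <- "UNKNOWN"
  batch$LLM_reason <- "placeholder"
  batch
}

# Illustrative helper, not part of firmmatchr.
run_in_chunks <- function(data, output_dir, filename_stem, batch_size) {
  # Assign each row to a batch of at most batch_size rows.
  batch_id <- ceiling(seq_len(nrow(data)) / batch_size)
  results <- list()
  for (i in unique(batch_id)) {
    # Assumed naming scheme for chunk files.
    chunk_file <- file.path(
      output_dir, sprintf("%s_chunk_%03d.rds", filename_stem, i)
    )
    if (file.exists(chunk_file)) {
      # Chunk finished in an earlier run: reuse it instead of recomputing.
      results[[i]] <- readRDS(chunk_file)
    } else {
      results[[i]] <- process_batch(data[batch_id == i, , drop = FALSE])
      saveRDS(results[[i]], chunk_file)
    }
  }
  do.call(rbind, results)
}

# Demo on five rows with a batch size of two (three chunks).
demo_dir <- file.path(tempdir(), "chunk_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(id = 1:5)
out <- run_in_chunks(demo, demo_dir, "match_validation", batch_size = 2)
```

Calling run_in_chunks() a second time with the same directory reads the saved chunks instead of calling process_batch() again, which is the property that makes resuming after an interruption cheap.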
