firmmatchr (version 0.1.3)

validate_matches_llm: Validate Matches using LLM (Azure OpenAI)

Description

Sends doubtful matches (those classified as neither "Perfect" nor "Unmatched") to an LLM for verification. Progress is saved in chunk files, so an interrupted run can be resumed.
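As a rough illustration of the selection and batching described above, the following base R sketch picks out the doubtful rows and partitions them into batches. It is not the package's implementation: the `match_type` column name, its labels, and the small `batch_size` are assumptions made for the example.

```r
# Hypothetical input; the match_type column and its labels are assumptions
matches <- data.frame(
  employer_name = c("BMW", "Siemens", "Bosch", "Unknown Ltd", "Allianz"),
  registry_name = c("BMW AG", "SAP SE", "Bosch GmbH", NA, "Allianz SE"),
  match_type = c("Fuzzy", "Fuzzy", "Perfect", "Unmatched", "Fuzzy"),
  stringsAsFactors = FALSE
)

# Rows that are neither "Perfect" nor "Unmatched" count as doubtful
settled <- c("Perfect", "Unmatched")
doubtful <- matches[!(matches$match_type %in% settled), ]

# Partition the doubtful rows into batches of at most batch_size rows
batch_size <- 2
chunk_id <- ceiling(seq_len(nrow(doubtful)) / batch_size)
chunks <- split(doubtful, chunk_id)

length(chunks)  # number of batches that would be sent
```

Saving each batch as soon as it has been processed is what would let a later run skip the batches that are already on disk.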

Usage

validate_matches_llm(
  data,
  query_name_col,
  dict_name_col,
  output_dir = tempdir(),
  filename_stem = "match_validation",
  batch_size = 20,
  api_key = NULL,
  endpoint = NULL,
  deployment = NULL,
  engine = c("azure", "openai", "local")
)

Value

A data frame with added LLM_decision and LLM_reason columns.

Arguments

data

Data frame. Must contain the columns specified by query_name_col and dict_name_col.

query_name_col

String. Column containing the user's query name (Employer).

dict_name_col

String. Column containing the dictionary match name (Registry).

output_dir

String. Directory to save temporary chunks and final results. Defaults to tempdir().

filename_stem

String. Base name for output files.

batch_size

Integer. Number of rows to process before saving a chunk.

api_key

String. API Key. Defaults to Sys.getenv("AZURE_API_KEY") or Sys.getenv("OPENAI_API_KEY").

endpoint

String. API Endpoint. Defaults to Sys.getenv("AZURE_ENDPOINT") or Sys.getenv("OPENAI_ENDPOINT").

deployment

String. Deployment or model name. Defaults to Sys.getenv("AZURE_DEPLOYMENT") or Sys.getenv("OPENAI_MODEL").

engine

String. One of "azure", "openai", or "local". Defaults to "azure". Use "local" (or "openai") for local LLMs such as Ollama.
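Because api_key, endpoint, and deployment fall back to environment variables, one way to supply credentials is to set those variables before the call. The sketch below assumes the "azure" engine and the AZURE_* variable names listed above; the values are placeholders, and how the function chooses between the AZURE_* and OPENAI_* variables is not documented here.

```r
# Placeholder values; replace them with your own credentials
Sys.setenv(
  AZURE_API_KEY = "<your-api-key>",
  AZURE_ENDPOINT = "<your-endpoint-url>",
  AZURE_DEPLOYMENT = "<your-deployment-name>"
)

# Check that all three variables are set, without printing the key itself
required <- c("AZURE_API_KEY", "AZURE_ENDPOINT", "AZURE_DEPLOYMENT")
missing_vars <- required[!nzchar(Sys.getenv(required))]
if (length(missing_vars) > 0) {
  stop("Missing environment variables: ", paste(missing_vars, collapse = ", "))
}
```

For regular use, keeping these values in a user-level .Renviron file avoids hard-coding secrets in scripts.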

Examples

if (FALSE) {
# Sample matched data
matched_data <- data.frame(
  employer_name = c("BMW", "Siemens"),
  registry_name = c("BMW AG", "SAP SE"),
  dict_id = c("D001", "D002"),
  match_type = c("Fuzzy", "Fuzzy")
)

# Validate using LLM (requires Azure credentials)
validated <- validate_matches_llm(
  data = matched_data,
  query_name_col = "employer_name",
  dict_name_col = "registry_name"
)

print(validated)
}
