matching_threatenedperu: Match Species Names to Threatened Plant List of Peru

Description

This function matches given species names against the internal database of threatened plant species in Peru. It uses a hierarchical matching strategy that includes direct matching, genus-level matching, fuzzy matching, and suffix matching to maximize successful matches while maintaining accuracy.

Usage

matching_threatenedperu(
  splist,
  source = c("original", "updated"),
  quiet = TRUE
)

Value

A tibble with detailed matching results including:

sorter: Integer. Original position in input vector
Orig.Name: Character. Original input name (standardized)
Matched.Name: Character. Matched name from database or "---"
Threat.Status: Character. IUCN threat category or "Not threatened"
Rank: Integer. Input taxonomic rank (1-4)
Matched.Rank: Integer. Matched taxonomic rank
Comp.Rank: Logical. Whether ranks match exactly
Match.Level: Character. Description of match quality
matched: Logical. Whether a match was found

Arguments

splist

A character vector containing the species names to be matched. It may include duplicate names; results are expanded so that every input position gets its own row.

source

Character string specifying which database version to use. Options are:

"original" (default): Uses the original threatened species database with support for Rank 4 (quaternomial names)
"updated": Uses the updated database with current nomenclature, supporting up to Rank 3 (trinomial names)

quiet

Logical, default TRUE. If FALSE, prints informative messages.

Details

**Duplicate Handling:** When the input contains duplicate names, the function automatically:

Detects duplicates and creates a tracking column (sorters)
Processes only unique names (efficient matching)
Expands results to restore all original positions
Preserves original input order via sorter column

The duplicate handling uses a `sorters` column that concatenates all original sorter values for duplicate names (e.g., "1 - 3" for a name appearing at positions 1 and 3), enabling accurate result expansion.
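The collapse-and-expand idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the `matched` column is filled by a placeholder lookup standing in for the real matching step, and the intermediate object names (`input`, `unique_names`, `expanded`) are hypothetical.

```r
# Toy input with one duplicated name (positions 1 and 3)
splist <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")

input <- data.frame(
  sorter = seq_along(splist),
  Orig.Name = splist,
  stringsAsFactors = FALSE
)

# Collapse duplicates: one row per unique name, with every original
# position concatenated into a "sorters" string such as "1 - 3"
sorters <- tapply(input$sorter, input$Orig.Name, paste, collapse = " - ")
unique_names <- data.frame(
  Orig.Name = names(sorters),
  sorters = as.vector(sorters),
  stringsAsFactors = FALSE
)

# Placeholder for the matching step (assumption): it runs once per unique name
unique_names$matched <- unique_names$Orig.Name %in% "Cattleya maxima"

# Expand: split "sorters" back into positions and repeat each row accordingly
positions <- strsplit(unique_names$sorters, " - ", fixed = TRUE)
expanded <- unique_names[rep(seq_len(nrow(unique_names)), lengths(positions)), ]
expanded$sorter <- as.integer(unlist(positions))

# Restore the original input order
expanded <- expanded[order(expanded$sorter), c("sorter", "Orig.Name", "matched")]
rownames(expanded) <- NULL
expanded
```

The matching step sees two names instead of three, yet the expanded result has one row per input position, in input order.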

**Matching Strategy:**

1. Direct exact matching
2. Genus-level matching (exact and fuzzy)
3. Species-level matching within genus
4. Infraspecies-level matching (up to 2 levels for original database)
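To make the hierarchy concrete, the first three steps of the strategy above can be approximated with base R's `utils::adist()` (edit distance). This is a rough sketch, not the package's algorithm: the `reference` vector is a hypothetical stand-in for the internal database, `match_name()` and its `max_dist` threshold are invented for illustration, and the infraspecies step is omitted.

```r
# Hypothetical reference list; the real internal database is not reproduced here
reference <- c("Cattleya maxima", "Polylepis incana", "Polylepis racemosa")
ref_genus <- sub(" .*$", "", reference)

match_name <- function(name, max_dist = 1) {
  no_match <- function(level) list(matched = "---", level = level)

  # Step 1: direct exact match
  if (name %in% reference) {
    return(list(matched = name, level = "exact"))
  }

  parts <- strsplit(name, " ", fixed = TRUE)[[1]]

  # Step 2: genus match, exact first, then fuzzy within max_dist edits
  genera <- unique(ref_genus)
  genus <- parts[1]
  if (!genus %in% genera) {
    d <- utils::adist(genus, genera)[1, ]
    if (min(d) > max_dist) return(no_match("no genus match"))
    genus <- genera[which.min(d)]
  }

  # Step 3: species match, restricted to candidates in the matched genus
  if (length(parts) < 2) return(no_match("genus only"))
  candidates <- reference[ref_genus == genus]
  epithets <- sub("^[^ ]+ ", "", candidates)
  d <- utils::adist(parts[2], epithets)[1, ]
  if (min(d) > max_dist) return(no_match("no species match"))

  list(matched = candidates[which.min(d)], level = "fuzzy")
}

match_name("Cattleya maxima")    # exact hit
match_name("Catleya maxima")     # misspelled genus, recovered by fuzzy step
match_name("Polylepis incanna")  # misspelled epithet, matched within genus
match_name("Quercus robur")      # genus not in reference
```

Restricting the species comparison to the matched genus keeps a misspelled epithet from matching a similarly spelled epithet in an unrelated genus.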

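**Rank Validation:** The algorithm applies strict rank validation to prevent false positives. The input rank (Rank) and the matched rank (Matched.Rank) are reported separately, and Comp.Rank records whether they agree exactly.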
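One way to picture a rank comparison is to count the components of each name. The rule below is an assumption made for illustration (rank equals the number of name components once infraspecific markers such as "var." are dropped); the package may determine ranks differently. `name_rank()` and the varietal name in the example are hypothetical.

```r
# Assumed rule: rank = number of name components, ignoring rank markers
name_rank <- function(name) {
  parts <- strsplit(trimws(name), "[[:space:]]+")[[1]]
  markers <- c("subsp.", "var.", "subvar.", "f.")
  length(parts[!tolower(parts) %in% markers])
}

name_rank("Polylepis")                  # genus only
name_rank("Polylepis incana")           # binomial
name_rank("Cattleya maxima var. alba")  # trinomial (made-up example name)

# A trinomial input matched only to a binomial record would not pass
# an exact rank comparison
input_name <- "Cattleya maxima var. alba"
candidate  <- "Cattleya maxima"
comp_rank  <- name_rank(input_name) == name_rank(candidate)
comp_rank
```

Under this rule, an infraspecific input that only matches at species level is flagged rather than silently reported as a full match.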

Examples

if (FALSE) {
# Basic usage
species_list <- c("Cattleya maxima", "Polylepis incana")
results <- matching_threatenedperu(species_list, source = "original")

# With duplicates
species_dup <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")
results_dup <- matching_threatenedperu(species_dup)
nrow(results_dup) == 3  # TRUE - preserves duplicates

# Access metadata
attr(results, "match_rate")

# Check for ambiguous matches
get_ambiguous_matches(results, type = "infraspecies")
}