easybio (version 1.2.2)

suggest_best_match: Suggest Best Matches for a String from a Vector of Choices

Description

This function suggests the closest matches for a user's input string from a given vector of choices. It works in layers:

  1. Normalizes the input and the choices (case folding when ignore.case = TRUE, trimming whitespace).

  2. Checks for an exact match first, since that is both the cheapest and the most reliable result.

  3. If there is no exact match, combines fuzzy string matching (Levenshtein distance via adist) to catch typos with partial/substring matching (grep) to handle incomplete input.

  4. Ranks the candidate matches and returns the best suggestion(s).
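
The layered approach above can be sketched in base R. This is an illustrative re-implementation under stated assumptions, not the easybio source: the name sketch_best_match is hypothetical, substring hits are ranked by plain Levenshtein distance, and the real function may score and order candidates differently.

```r
# Hypothetical sketch of the four steps; NOT the easybio implementation.
sketch_best_match <- function(x, choices, n = 1, threshold = 2) {
  # Step 1: normalize (lower-case, trim surrounding whitespace).
  x_norm <- tolower(trimws(x))
  choices_norm <- tolower(trimws(choices))

  # Step 2: an exact match short-circuits everything else.
  exact <- which(choices_norm == x_norm)
  if (length(exact) > 0) {
    return(choices[exact[1]])
  }

  # Step 3a: Levenshtein distance from the input to every choice.
  dists <- as.vector(utils::adist(x_norm, choices_norm))
  fuzzy <- dists <= threshold

  # Step 3b: choices that contain the input as a literal substring.
  partial <- grepl(x_norm, choices_norm, fixed = TRUE)

  # Step 4: keep candidates from either route, closest first (assumed ranking).
  keep <- which(fuzzy | partial)
  if (length(keep) == 0) {
    return(NA_character_)
  }
  keep <- keep[order(dists[keep])]
  choices[utils::head(keep, n)]
}

cell_types <- c("B cell", "T cell", "Macrophage", "Monocyte")
sketch_best_match(" t cell ", cell_types)   # exact match after normalization
sketch_best_match("Macrophaeg", cell_types) # typo caught by adist
sketch_best_match("Mono", cell_types)       # incomplete input caught by grepl
```

Checking the exact match before computing any distances keeps the common case cheap, and using fixed = TRUE in grepl avoids treating user input as a regular expression.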

Usage

suggest_best_match(
  x,
  choices,
  n = 1,
  threshold = 2,
  ignore.case = TRUE,
  return_distance = FALSE
)

Value

By default (return_distance = FALSE), returns a character vector of the best n suggestions. If no suitable match is found, returns NA. If return_distance = TRUE, returns a data.frame with columns suggestion and distance, or NULL if no match is found.
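
Because the "no match" value differs between the two modes (NA versus NULL), calling code may want to handle both. A minimal sketch, assuming only the return shapes described above; describe_suggestion is a hypothetical helper, not part of easybio:

```r
# Hypothetical helper: turn a suggest_best_match() result into a message.
describe_suggestion <- function(result) {
  # return_distance = TRUE with no match is documented to yield NULL.
  if (is.null(result)) {
    return("No suggestion found.")
  }
  # return_distance = TRUE with matches yields a data.frame.
  if (is.data.frame(result)) {
    return(paste0("Did you mean: ",
                  paste(result$suggestion, collapse = ", "), "?"))
  }
  # return_distance = FALSE with no match is documented to yield NA.
  if (all(is.na(result))) {
    return("No suggestion found.")
  }
  paste0("Did you mean: ", paste(result, collapse = ", "), "?")
}
```

The data.frame check comes before the NA check so that is.na() is only ever applied to a plain vector.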

Arguments

x

A single character string; the user input to find matches for.

choices

A character vector of available, valid options.

n

An integer specifying the maximum number of suggestions to return. Defaults to 1.

threshold

An integer; the maximum Levenshtein distance at which a choice still counts as a "close" match. Lower values are stricter. Defaults to 2.
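
To get a feel for what a threshold of 2 admits, the raw Levenshtein distances can be inspected directly with utils::adist. This sketch only illustrates the distance measure; the value reported by suggest_best_match may be a combined score rather than the raw distance.

```r
# Levenshtein distances from two misspelled inputs to their intended choices.
d_swap <- as.vector(utils::adist("macrophaeg", "macrophage"))  # two swapped letters
d_drop <- as.vector(utils::adist("monocyt", "monocyte"))       # one missing letter

# Which inputs pass at the default threshold and at a stricter one?
c(swap = d_swap, drop = d_drop) <= 2
c(swap = d_swap, drop = d_drop) <= 1
```

Swapping two adjacent letters costs two edits under plain Levenshtein distance, so a threshold of 1 would reject that kind of typo while still accepting a single dropped letter.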

ignore.case

A logical value. If TRUE, matching is case-insensitive. Defaults to TRUE.

return_distance

A logical value. If TRUE, the output is a data.frame containing the suggestions and their calculated distance/score. Defaults to FALSE.

Examples

# --- Setup ---
cell_types <- c(
  "B cell", "T cell", "Macrophage", "Monocyte", "Neutrophil",
  "Natural Killer T-cell", "Dendritic cell"
)

# --- Usage ---
# 1. Exact match (after normalization)
suggest_best_match("t cell", cell_types)
#> [1] "T cell"

# 2. Typo correction (fuzzy match)
suggest_best_match("Macrophaeg", cell_types)
#> [1] "Macrophage"

# 3. Partial input (substring match)
suggest_best_match("Mono", cell_types)
#> [1] "Monocyte"

# 4. Requesting multiple suggestions
suggest_best_match("t", cell_types, n = 3)
#> [1] "T cell" "Neutrophil" "Natural Killer T-cell"

# 5. No good match found
suggest_best_match("Erythrocyte", cell_types)
#> [1] NA

# 6. Returning suggestions with their distance score
suggest_best_match("t ce", cell_types, n = 3, return_distance = TRUE)
#>              suggestion distance
#> 1                T cell        1
#> 2        Dendritic cell        2
#> 3 Natural Killer T-cell        2