
cleanepi (version 1.1.1)

correct_misspelled_values: Correct misspelled values by using approximate string matching techniques to compare them against the expected values.

Description

Correct misspelled values by using approximate string matching techniques to compare them against the expected values.

Usage

correct_misspelled_values(
  data,
  target_columns,
  wordlist,
  max_distance = 1,
  confirm = rlang::is_interactive(),
  ...
)

Value

The input data with the misspelled values corrected according to the user-specified wordlist.

Arguments

data

The input <data.frame> or <linelist>.

target_columns

A <vector> of the target column names. When the input data is a <linelist> object, this parameter can be set to linelist_tags to apply the fuzzy matching exclusively to the tagged columns.

wordlist

A <character> vector of expected, correctly spelled words against which the values in the target columns are matched.

max_distance

An <integer> for the maximum distance allowed between a value and a word from the wordlist for the value to be treated as a spelling mistake. The distance is the generalized Levenshtein edit distance (see adist()). Default is 1.

confirm

A <logical> that determines whether to show the user a menu of spelling corrections. If TRUE and R is being used interactively, the user has the option to review the proposed spelling corrections. Setting it to FALSE is useful for turning off the menu() when rlang::is_interactive() returns TRUE but the user should not be prompted, e.g. when running devtools::run_examples().

...

Extra arguments to pass to adist().
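
To give a feel for what max_distance and the extra adist() arguments control, the sketch below calls utils::adist() directly from base R. It does not call cleanepi. The sample values are made up for illustration, and forwarding ignore.case through ... is an assumption based on the argument description above rather than documented behaviour.

```r
# Base R only: utils::adist() computes the generalized Levenshtein distance.
values   <- c("confermed", "susspected", "did")
wordlist <- c("confirmed", "probable", "suspected", "died", "recovered")

# Rows correspond to the values, columns to the wordlist entries.
distances <- utils::adist(values, wordlist)
dimnames(distances) <- list(values, wordlist)
distances

# With max_distance = 1, only pairs at distance 1 would count as misspellings.
distances["confermed", "confirmed"]   # 1: one substitution (e -> i)
distances["susspected", "suspected"]  # 1: one deletion (extra s)

# ignore.case is an adist() argument; passing it via `...` is assumed here.
case_sensitive   <- utils::adist("Died", "died")
case_insensitive <- utils::adist("Died", "died", ignore.case = TRUE)
```

A larger max_distance tolerates more edits per value but also raises the risk of matching unrelated words, which is one reason to keep the default of 1 for short category labels.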

Details

When used interactively (see interactive()), the user is presented with a menu to check that the words detected using approximate string matching are not false positives, and can decide whether to proceed with the spelling corrections. In non-interactive sessions, all misspelled values are replaced by their closest values within the provided vector of expected values.

If multiple words supplied in the wordlist match a word in the data equally well and confirm is TRUE, the user is presented with a menu to choose the replacement word. If the function is not used interactively, multiple equal matches throw a warning.
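
The tie situation described above can be reproduced with a minimal sketch in base R. The helper closest_words() is hypothetical and is not part of cleanepi; it only illustrates how several wordlist entries can sit at the same smallest distance from one value. Treating an exact match (distance 0) as needing no correction is an assumption of this sketch.

```r
# Hypothetical helper, NOT part of cleanepi: return the wordlist entries that
# are tied for the smallest edit distance to `value`, as long as that distance
# does not exceed `max_distance`.
closest_words <- function(value, wordlist, max_distance = 1) {
  d <- as.vector(utils::adist(value, wordlist))
  best <- min(d)
  # Assumption: an exact match needs no correction; too-distant words are ignored.
  if (best == 0 || best > max_distance) {
    return(character(0))
  }
  wordlist[d == best]
}

# "did" is one edit away from both "died" and "dig": an ambiguous match.
ambiguous <- closest_words("did", c("died", "dig"))

# "recoverd" has a single closest word within distance 1.
unique_match <- closest_words("recoverd", c("died", "recovered"))

# Values already in the wordlist yield no candidates.
no_change <- closest_words("died", c("died", "recovered"))
```

When more than one candidate comes back, as for "did" here, a choice between them cannot be made automatically, which is the case where the function either shows a menu or warns.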

Examples

library(cleanepi)

# example data with misspelled case types and outcomes
df <- data.frame(
  case_type = c("confirmed", "confermed", "probable", "susspected"),
  outcome = c("died", "recoverd", "did", "recovered")
)
df
correct_misspelled_values(
  data = df,
  target_columns = c("case_type", "outcome"),
  wordlist = c("confirmed", "probable", "suspected", "died", "recovered"),
  confirm = FALSE
)
