
refinr (version 0.3.1)

key_collision_merge: Value merging based on Key Collision

Description

This function takes a character vector and edits and merges values that are approximately equivalent yet not identical. It clusters values using the key collision method described at https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth.
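To give a feel for what key collision clustering does, here is a minimal base R sketch. It is a simplified illustration and not refinr's actual implementation: the helpers `fingerprint` and `merge_by_key` are hypothetical names, and the rule of merging each cluster to its most frequent member is an assumption made for this example.

```r
# Hypothetical helper: build a simplified fingerprint key for each string.
# Lowercase, strip punctuation, then sort and de-duplicate the tokens.
fingerprint <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[[:punct:]]", "", x)
  tokens <- strsplit(x, "[[:space:]]+")
  vapply(tokens, function(tok) paste(sort(unique(tok)), collapse = " "),
         character(1))
}

# Hypothetical helper: replace each value with the most frequent value that
# shares its key (assumed rule; ties go to the value that appears first).
merge_by_key <- function(vect) {
  keys <- fingerprint(vect)
  ukeys <- unique(keys)
  reps <- vapply(ukeys, function(k) {
    v <- vect[keys == k]
    counts <- table(factor(v, levels = unique(v)))
    names(counts)[which.max(counts)]
  }, character(1))
  unname(reps[match(keys, ukeys)])
}

x <- c("Acme Pizza", "ACME PIZZA", "pizza, acme", "Acme Pizza")
fingerprint(x)   # every element maps to the key "acme pizza"
merge_by_key(x)  # every element becomes "Acme Pizza"
```

Values whose keys collide are treated as one cluster, which is why differences in case, punctuation, and word order disappear after merging.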

Usage

key_collision_merge(vect, ignore_strings = NULL, bus_suffix = TRUE,
  dict = NULL)

Arguments

vect

Character vector, items to be potentially clustered and merged.

ignore_strings

Character vector of strings to ignore when merging values within vect. Default value is NULL.

bus_suffix

Logical, indicating whether the merging of records should be insensitive to common business suffixes. Default value is TRUE.

dict

Character vector, meant to act as a dictionary during the merging process. If any items within vect have a match in dict, then those items will always be edited to be identical to their match in dict. Default value is NULL.
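The way ignored strings and a dictionary can shape the result may be easier to see in a small base R sketch. This is an illustration under stated assumptions, not refinr's internals: `make_key` and `apply_dict` are hypothetical helpers, and the `suffixes` vector is an assumed, incomplete stand-in for a list of business suffixes.

```r
# Hypothetical helper: simplified key that drops ignored tokens before sorting.
make_key <- function(x, ignore = character(0)) {
  x <- gsub("[[:punct:]]", "", tolower(trimws(x)))
  vapply(strsplit(x, "[[:space:]]+"), function(tok) {
    tok <- setdiff(tok, ignore)  # drop ignored tokens (also de-duplicates)
    paste(sort(tok), collapse = " ")
  }, character(1))
}

# Hypothetical helper: when an item's key matches a dictionary entry's key,
# rewrite the item to the dictionary spelling; otherwise leave it unchanged.
apply_dict <- function(vect, dict, ignore = character(0)) {
  hit <- match(make_key(vect, ignore), make_key(dict, ignore))
  ifelse(is.na(hit), vect, dict[hit])
}

suffixes <- c("inc", "llc", "company")  # assumed, incomplete suffix list
x <- c("Acme Pizza, Inc.", "ACME PIZZA COMPANY", "pizza, acme llc")
apply_dict(x, dict = c("Nicks Pizza", "acme PIZZA inc"), ignore = suffixes)
# all three elements become "acme PIZZA inc"
```

In this sketch the dictionary entry wins regardless of how often each spelling occurs in the input, mirroring the rule that matched items are always edited to be identical to their match in dict.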

Value

Character vector with similar values merged.

Examples

# NOT RUN {
x <- c("Acme Pizza, Inc.", "ACME PIZZA COMPANY", "pizza, acme llc",
       "Acme Pizza, Inc.")
key_collision_merge(vect = x)

# Use parameter "dict" to influence how clustered values are edited.
key_collision_merge(vect = x, dict = c("Nicks Pizza", "acme PIZZA inc"))

# Use parameter 'ignore_strings' to ignore specific strings during merging
# of values.
x <- c("Bakersfield Highschool", "BAKERSFIELD high",
       "high school, bakersfield")
key_collision_merge(x, ignore_strings = c("high", "school", "highschool"))

# }
