key_collision_merge

vect

Character vector, items to be potentially clustered and merged.

ignore_strings

Character vector, these strings will be ignored during
the merging of values within vect. Default value is NULL.

bus_suffix

Logical, indicating whether the merging of records should
be insensitive to common business suffixes or not. Default value is TRUE.

dict

Character vector, meant to act as a dictionary during the
merging process. If any items within vect have a match in dict,
then those items will always be edited to be identical to their match in
dict. Default value is NULL.

This function takes a character vector, then edits and merges values
that are approximately equivalent yet not identical. It clusters values
using the key collision method, described here:
https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth
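
To make the key collision idea concrete, the following is a minimal base R sketch, not the refinr implementation. The helper names fingerprint and merge_by_key are hypothetical, and the sketch assumes a simplified key (lowercase, punctuation removed, tokens sorted and de-duplicated). It omits ASCII normalization, business suffix handling, ignore_strings, dict, and handling of NA or empty strings.

```r
# Hypothetical helper: build a simplified fingerprint key for each string.
# Lowercase, strip punctuation, then sort and de-duplicate the tokens.
fingerprint <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[[:punct:]]", "", x)
  tokens <- strsplit(x, "[[:space:]]+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]  # drop empty tokens left by leading whitespace
    paste(sort(unique(tok)), collapse = " ")
  }, character(1))
}

# Hypothetical helper: values whose keys collide form a cluster, and every
# member is replaced by the most frequent original value in that cluster.
merge_by_key <- function(vect) {
  keys <- fingerprint(vect)
  groups <- split(vect, keys)
  representative <- vapply(
    groups,
    function(v) names(which.max(table(v))),
    character(1)
  )
  unname(representative[keys])
}

x <- c("Acme Pizza, Inc.", "ACME PIZZA INC", "inc pizza acme",
       "Acme Pizza, Inc.", "Bob's Burgers")
merged <- merge_by_key(x)
print(merged)
```

In this sketch the first four values share the key "acme inc pizza" and so collapse to the most common spelling, while "Bob's Burgers" has a key of its own and is left unchanged.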

These functions take a character vector as input, identify and
cluster similar values, and then merge clusters together so their values
become identical. The functions are an implementation of the key collision
and n-gram fingerprint algorithms from the open-source tool OpenRefine
<http://openrefine.org/>. More information on key collision and n-gram fingerprint
can be found here <https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth>.

Chris Muir

refinr

Cluster and Merge Similar Values Within a Character Vector

key_collision_merge function


key_collision_merge: Value merging based on Key Collision

Description

Usage

Arguments

Value

Examples
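
A short usage sketch, assuming the refinr package is installed. The argument names and defaults follow the descriptions on this page; the input strings are illustrative only, and no output is shown because the exact merged values depend on the installed version.

```r
library(refinr)

x <- c("Acme Pizza, Inc.", "ACME PIZZA COMPANY", "pizza, acme llc",
       "Acme Pizza, Inc.")

# Default call: bus_suffix = TRUE, so common business suffixes are ignored
# when deciding which values belong to the same cluster.
merged <- key_collision_merge(vect = x)

# Use dict to influence how clustered values are edited: items in vect that
# match an entry in dict are edited to be identical to that entry.
merged_dict <- key_collision_merge(vect = x,
                                   dict = c("Nicks Pizza", "acme PIZZA inc"))

# Use ignore_strings to leave specific strings out of the comparison.
y <- c("Bakersfield Highschool", "BAKERSFIELD high",
       "high school, bakersfield")
merged_ignore <- key_collision_merge(
  y,
  ignore_strings = c("high", "school", "highschool")
)

print(merged)
print(merged_dict)
print(merged_ignore)
```

Each call returns a character vector of the same length as its input, so the result can be assigned back to the original column of a data frame.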