These functions take a character vector as input, identify and
cluster similar values, and then merge clusters together so their values
become identical. The functions implement the key collision and n-gram
fingerprint algorithms from the open source tool OpenRefine.
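
To illustrate the idea behind key collision clustering, here is a minimal sketch in Python. It is not the code these functions use: the names `fingerprint` and `merge_clusters` are hypothetical, and both the normalization steps and the choice to merge each cluster into its most frequent member are assumptions made for the example.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Hypothetical helper: reduce a string to a normalized key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Trim, lowercase, and remove ASCII punctuation.
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so word order and repeats do not matter.
    return " ".join(sorted(set(text.split())))


def merge_clusters(values):
    """Hypothetical helper: make all values in a cluster identical."""
    # Values whose keys collide land in the same cluster.
    clusters = defaultdict(list)
    for value in values:
        clusters[fingerprint(value)].append(value)

    # Assumption: each cluster is merged into its most frequent member
    # (ties go to the member seen first).
    replacement = {}
    for members in clusters.values():
        target = Counter(members).most_common(1)[0][0]
        for member in members:
            replacement[member] = target

    return [replacement[value] for value in values]


names = [
    "Acme Pizza, Inc.",
    "ACME PIZZA INC",
    "Inc. Acme Pizza",
    "Bob's Burgers",
    "Acme Pizza, Inc.",
]
print(merge_clusters(names))
```

In this sketch the four "Acme" spellings share the key `acme inc pizza`, so they are all replaced by the most frequent spelling, while "Bob's Burgers" forms its own cluster and is left unchanged.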