cluster_strings

s_vec

Whether to squish whitespace in s_vec and de-duplicate it.

clean

One of "osa", "lv", "dl" (as in `stringdist`).

method

Maximum edit distance (typically Damerau-Levenshtein) between related strings.

max_dist

One of "cc" (connected components) or "eb" (edge betweenness).

algo

Returns an edit-distance-based clustering of an input vector of strings.
Each cluster contains a set of strings with small mutual edit distance
(e.g., Levenshtein, optimal string alignment, Damerau-Levenshtein), as computed by
stringdist::stringdist(). The set of all mutual edit distances is then used by
graph algorithms (from package 'igraph') to single out subsets of high connectivity.
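
The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name `sketch_cluster_strings` is hypothetical, only the connected-components variant is shown, and the thresholding rule (link two strings when their distance is at most `max_dist`) is an assumption drawn from the argument descriptions.

```r
library(stringdist)
library(igraph)

# Hypothetical helper for illustration only; not the package's implementation.
sketch_cluster_strings <- function(s_vec, method = "osa", max_dist = 2) {
  # De-duplicate so that each string becomes exactly one graph vertex.
  s_vec <- unique(s_vec)

  # Full symmetric matrix of pairwise edit distances.
  d <- stringdist::stringdistmatrix(s_vec, s_vec, method = method)

  # Assumed rule: link two strings when their distance is within max_dist.
  adj <- (d <= max_dist) * 1
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected", diag = FALSE)

  # Connected components stand in for algo = "cc".
  membership <- igraph::components(g)$membership

  # One character vector per cluster.
  unname(split(s_vec, membership))
}

clusters <- sketch_cluster_strings(
  c("apple", "appel", "aple", "banana", "bananna", "cherry"),
  method = "osa",
  max_dist = 2
)
print(clusters)
```

For an edge-betweenness variant, `igraph::cluster_edge_betweenness(g)` could plausibly replace the `components()` call, though whether the package does exactly this is not stated here.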

cluster_strings: Cluster Strings by Edit-Distance

Description

Usage

Arguments

Value

Examples