Cluster Strings by Edit-Distance
cluster_strings(s_vec, clean = T, method = "osa", max_dist = 3, algo = "cc")
s_vec: a vector of character strings
clean: whether to space-squish and de-duplicate s_vec
method: one of "osa", "lv", "dl" (as in `stringdist`)
max_dist: maximum distance (typically Damerau-Levenshtein) between related strings
algo: one of "cc" (connected components) or "eb" (edge betweenness)
a data frame containing cluster membership for each input string
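To make the roles of `max_dist` and `algo = "cc"` concrete, here is a minimal sketch of connected-components clustering on an edit-distance graph. It is not the package's implementation: `cc_cluster_sketch` is a hypothetical helper, it uses base R's `utils::adist` (plain Levenshtein, comparable to `method = "lv"`) instead of `stringdist`, and it finds components with a simple breadth-first search rather than a graph library.

```r
# Hypothetical helper for illustration only; not part of the documented package.
cc_cluster_sketch <- function(s_vec, max_dist = 3) {
  # Squish whitespace and de-duplicate (roughly what `clean` is described as doing)
  s_vec <- unique(trimws(gsub("\\s+", " ", s_vec)))
  n <- length(s_vec)

  # Pairwise Levenshtein distances (base R), then link strings within max_dist
  d <- utils::adist(s_vec)
  adj <- d <= max_dist

  # Label connected components with a breadth-first search
  cluster <- rep(NA_integer_, n)
  next_id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(cluster[start])) next
    next_id <- next_id + 1L
    cluster[start] <- next_id
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Unlabelled neighbours of the current node join the same cluster
      nbrs <- which(adj[node, ] & is.na(cluster))
      cluster[nbrs] <- next_id
      queue <- c(queue, nbrs)
    }
  }
  data.frame(string = s_vec, cluster = cluster, stringsAsFactors = FALSE)
}

cc_cluster_sketch(c("alcool", "alcohol", "alcoholic", "brandy", "brandie"),
                  max_dist = 3)
```

Because membership is transitive under connected components, two strings can share a cluster even when their direct distance exceeds `max_dist`, as long as a chain of closer strings links them. Lowering `max_dist` is the simplest way to break such chains.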
# NOT RUN {
s_vec <- c("alcool", "alcohol", "alcoholic", "brandy", "brandie", "cachaça")
s_clust <- cluster_strings(s_vec, method = "lv", max_dist = 3, algo = "cc")
s_clust$df_clusters
# }