statar (version 0.3.0)

compute_distance: Find minimum distance of each word to other groups

Description

Find minimum distance of each word to other groups
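To make the idea concrete, here is a minimal sketch of "minimum distance to other groups". It is not the package's implementation: it compares whole names rather than words or word combinations, and the variable names `d` and `min_dist` are illustrative. It assumes the stringdist package is installed.

```r
library(stringdist)

id   <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated",
          "apple incorporated", "apple corp")

# Pairwise Jaro-Winkler distances between all names (a 4 x 4 matrix).
d <- stringdistmatrix(name, name, method = "jw", p = 0.1)

# Mask pairs that belong to the same group, so only other groups count.
d[outer(id, id, "==")] <- NA

# For each name, the smallest distance to any name in a different group.
min_dist <- apply(d, 1, min, na.rm = TRUE)
data.frame(id, name, min_dist)
```

A small `min_dist` suggests that a name closely resembles a name filed under a different id, which may be worth checking by hand.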

Usage

compute_distance(id, name, n = 1, method = "jw", p = 0.1, ...)

Arguments

id
a vector of identifiers
name
a vector of characters
n
number of words for combinations. Defaults to 1.
method
Distance method passed to stringdist. See the stringdist documentation. Defaults to "jw".
p
Prefix factor passed to stringdist. See the stringdist documentation. Defaults to 0.1.
...
Other arguments to pass to stringdist. See the stringdist documentation.
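As a rough illustration of what method and p control, the sketch below calls stringdist directly on two of the example names. In stringdist, method = "jw" with p = 0 gives the plain Jaro distance, while a positive p gives the Jaro-Winkler distance, which shrinks the distance between strings that share a prefix. The variable names are illustrative, and how compute_distance forwards these arguments internally is assumed from the argument descriptions above.

```r
library(stringdist)

a <- "coca cola company"
b <- "coca cola incorporated"

# Plain Jaro distance: no bonus for a shared prefix.
d_jaro <- stringdist(a, b, method = "jw", p = 0)

# Jaro-Winkler distance: the shared prefix "coca" reduces the distance.
d_jw <- stringdist(a, b, method = "jw", p = 0.1)

c(jaro = d_jaro, jw = d_jw)
```

With the default p = 0.1, names that begin the same way are treated as closer than under the plain Jaro distance.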

Value

  • tab_accross returns a data.frame of four columns. The first is id; the second is each unique combination of words in each element of v with length lower than n (sorted alphabetically); the third is the count of these combinations within id; the fourth is the count of these combinations across id. When the count across groups is 1 and the count within the group is high, the element can be considered an identifier of the group.

Examples

library(stringdist)
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated", "apple incorporated", "apple corp")
compute_distance(id, name, n = 0)
compute_distance(id, name, n = 1)
compute_distance(id, name, n = 2)