count_combinations: Find best string combinations that identify an id
Description
Find best string combinations that identify an id
Usage
count_combinations(name, id, n = 1)
Arguments
name
a character vector
id
a vector of identifiers
n
number of words in each combination. Defaults to 1.
Value
count_combinations returns a data.frame of four columns. The first is id. The second is the unique combinations of words found in each element of name, with at most n words per combination (sorted alphabetically). The third is the count of each combination within id. The fourth is the count of each combination across id. Intuitively, when the count across groups is 1 and the count within the group is high, the combination can be considered an identifier of the group.
Examples
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated", "apple incorporated", "apple corp")
count_combinations(name, id = id)
count_combinations(name, id = id, n = 2)
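
To make the within and across counts concrete, here is a minimal base R sketch of the single-word case (n = 1) on the same data. It is not the package's implementation. It assumes that the within count is the number of names in a group containing the word, and that the across count is the number of distinct ids in which the word appears, which is one reading of the description above. The column names word, count_within and count_across are chosen for illustration only.

```r
# Illustrative sketch only; not how count_combinations is implemented.
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated",
          "apple incorporated", "apple corp")

# Split each name into its unique words (the n = 1 case).
words <- lapply(strsplit(name, " ", fixed = TRUE), unique)

# One row per (id, word) occurrence.
long <- data.frame(
  id = rep(id, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Within count (assumed meaning): names in the group that contain the word.
within_counts <- aggregate(
  count_within ~ id + word,
  data = transform(long, count_within = 1L),
  FUN = sum
)

# Across count (assumed meaning): distinct ids in which the word appears.
across_counts <- aggregate(
  count_across ~ word,
  data = transform(unique(long[c("id", "word")]), count_across = 1L),
  FUN = sum
)

# Combine both counts and order the rows by id, then word.
result <- merge(within_counts, across_counts, by = "word")
result <- result[order(result$id, result$word),
                 c("id", "word", "count_within", "count_across")]
print(result)
```

Under these assumptions, "coca" and "cola" appear in both names of id 1 and in no other id, so they are good identifiers of that group, whereas "incorporated" appears under both ids and is not.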