statar (version 0.2)

count_combinations: Find best string combinations that identify an id

Description

Find best string combinations that identify an id

Usage

count_combinations(name, id, n = 1)

Arguments

name
a character vector
id
a vector of identifiers
n
number of words in each combination. Defaults to 1.

Value

  • count_combinations returns a data.frame with four columns. The first is id; the second is each unique combination of at most n words found in an element of name (words sorted alphabetically); the third is the count of that combination within the id; the fourth is the count of that combination across ids. Intuitively, when the count across groups is 1 and the count within the group is high, the combination can be considered an identifier of the group.

Examples

id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated", "apple incorporated", "apple corp")
count_combinations(name, id = id)
count_combinations(name, id = id, n = 2)
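
The within/across counts described under Value can be sketched in base R for the single-word case (n = 1). This is an illustration only, not the statar implementation. It assumes that "count within id" means the number of names in an id that contain the word and that "count across id" means the number of distinct ids in which the word appears. The object names words, long and counts, and the column names count_within and count_across, are hypothetical.

```r
# Illustrative sketch only: NOT the statar implementation.
# Assumed definitions (see lead-in): count_within = number of names in an id
# containing the word; count_across = number of distinct ids with the word.
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated",
          "apple incorporated", "apple corp")

# Split each name into its unique words (the n = 1 case).
words <- lapply(strsplit(name, " ", fixed = TRUE), unique)

# One row per (id, word) occurrence.
long <- data.frame(
  id = rep(id, lengths(words)),
  word = unlist(words),
  count_within = 1L,
  stringsAsFactors = FALSE
)

# Count within: sum the occurrences of each word inside each id.
counts <- aggregate(count_within ~ id + word, data = long, FUN = sum)

# Count across: counts has one row per (id, word), so tabulating the
# words gives the number of distinct ids each word appears in.
n_ids <- table(counts$word)
counts$count_across <- as.integer(n_ids[as.character(counts$word)])

counts <- counts[order(counts$id, counts$word), ]
print(counts)
```

Under these assumed definitions, "coca" and "cola" occur in both names of id 1 and in no other id, so they are good identifiers of that group, whereas "incorporated" appears in both ids and is not.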
