statar (version 0.3.0)

count_combinations: Find best string combinations that identify an id

Description

Find best string combinations that identify an id

Usage

count_combinations(id, name, n = 1)

Arguments

id
a vector of identifiers
name
a character vector
n
number of words for combinations. Defaults to 1.

Value

  • count_combinations returns a data.frame with four columns. The first is id; the second is each unique combination of words found in an element of name with length lower than n (words sorted alphabetically); the third is the count of that combination within the id; the fourth is the count of that combination across ids. Intuitively, when the count across groups is 1 and the count within the group is high, the combination can be considered an identifier of the group.
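
The within/across logic described above can be sketched in base R for the single-word case (n = 1). This is only an illustration under stated assumptions, not statar's implementation: the column names word, count_within and count_across are hypothetical, and "count across" is assumed here to mean the number of distinct ids in which a word appears.

```r
# Hypothetical illustration for single words (n = 1); not statar's implementation.
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated",
          "apple incorporated", "apple corp")

# Split each name into its unique words, keeping the id of the name it came from.
words <- lapply(strsplit(name, " ", fixed = TRUE), unique)
tokens <- data.frame(
  id = rep(id, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Count within id: number of names in the id that contain the word.
counts <- aggregate(count_within ~ id + word,
                    data = transform(tokens, count_within = 1L), FUN = sum)

# Count across ids (assumed meaning): number of distinct ids containing the word.
# counts has one row per (id, word) pair, so the group length is the id count.
n_ids <- tapply(counts$id, counts$word, length)
counts$count_across <- as.integer(n_ids[as.character(counts$word)])

# Candidate identifiers: words seen in exactly one id, most frequent first.
candidates <- counts[counts$count_across == 1L, ]
candidates <- candidates[order(-candidates$count_within), ]
print(candidates)
```

In this sketch, a word such as "incorporated" appears under both ids and is therefore dropped from candidates, while words that recur within a single id rank at the top.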

Examples

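library(statar)
library(stringdist)
# Two ids, each observed under two name variants
id <- c(1, 1, 2, 2)
name <- c("coca cola company", "coca cola incorporated", "apple incorporated", "apple corp")
# Default n = 1
count_combinations(id, name)
# Word combinations with n = 2
count_combinations(id, name, n = 2)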