search_dict: Exact n-gram matcher (vector of terms)
Description
Finds a long list of multi-word expressions (MWEs) or single terms without
regex overhead or partial-match risk. The corpus is tokenized, n-grams are
built from the tokens, and the n-grams are joined exactly against terms.
Because matching is done on whole tokens, word boundaries are respected by
design (a term like "art" cannot match inside "party"). To attach
categories (e.g. term = "R Project", category = "Software"), left_join your
metadata onto the result using ngram or term as key.
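
The tokenize, build n-grams, exact join pipeline described above can be sketched in base R. This is an illustration of the idea only, not the function's actual implementation: the helpers tokenize, make_ngrams and match_terms are hypothetical names, and lowercasing both sides before the join is an assumption made here for the sketch.

```r
# Illustrative sketch only; NOT the package's implementation.
# tokenize(), make_ngrams() and match_terms() are hypothetical helpers.

# Split text into lowercase word tokens (lowercasing is an assumption here)
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^[:alnum:]]+"))
  tokens[nzchar(tokens)]
}

# Build every n-gram of size n from a token vector
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Exact join of corpus n-grams against the term list
match_terms <- function(text, terms) {
  tokens <- tokenize(text)
  term_keys <- tolower(terms)
  # Only build n-gram sizes that actually occur in the term list
  sizes <- unique(lengths(strsplit(term_keys, " ", fixed = TRUE)))
  ngrams <- data.frame(
    ngram = unlist(lapply(sizes, function(n) make_ngrams(tokens, n))),
    stringsAsFactors = FALSE
  )
  dict <- data.frame(ngram = term_keys, term = terms, stringsAsFactors = FALSE)
  merge(ngrams, dict, by = "ngram")  # inner join on the exact n-gram string
}

hits <- match_terms(
  "Gen Z and Millennials use social media.",
  c("Gen Z", "Millennials", "social media", "med")
)
```

Because the join compares whole n-grams, the term "med" finds nothing here even though it is a substring of "media"; a naive substring or unanchored regex search would have reported it.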
Examples
corpus <- data.frame(
  doc_id = "1",
  text = "Gen Z and Millennials use social media."
)
search_dict(
  corpus,
  by = "doc_id",
  terms = c("Gen Z", "Millennials", "social media")
)
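
Attaching categories to the matches might look like the following sketch. The hits data frame is a hypothetical stand-in for the matcher's output (the real column set is not shown here and may differ), the category values are invented for illustration, and base merge() with all.x = TRUE is used in place of dplyr::left_join so the snippet runs without extra packages.

```r
# Hypothetical stand-in for the matcher's output; real columns may differ.
hits <- data.frame(
  doc_id = c("1", "1", "1"),
  term = c("Gen Z", "Millennials", "social media"),
  stringsAsFactors = FALSE
)

# User-supplied metadata, one row per term (categories invented for illustration)
meta <- data.frame(
  term = c("Gen Z", "Millennials"),
  category = c("Generation", "Generation"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every hit, like dplyr::left_join(hits, meta, by = "term")
tagged <- merge(hits, meta, by = "term", all.x = TRUE)
```

Terms with no metadata row, such as "social media" here, stay in the result with category set to NA, which makes gaps in the metadata easy to spot.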