Runs the complete multiword detection and processing workflow in a single optimized call. Uses C++ functions and data.table for speed.
process_multiwords_fast(x2, stats, term = c("lemma", "token"))
Returns a data frame with columns: doc_id, term_id, multiword, upos_multiword, ngram
x2: Data frame with token information
stats: Data frame with multiword statistics (keyword, ngram columns)
term: Type of term to process: "lemma" or "token"
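The default `term = c("lemma", "token")` in the usage line looks like the common base R `match.arg()` idiom, in which the first choice is the default and any other value is rejected. The sketch below shows that idiom only. `pick_term()` is a hypothetical helper written for illustration, and it is an assumption that `process_multiwords_fast()` resolves `term` this way.

```r
# Hypothetical helper, not part of the package: shows how a choice vector
# used as a default is typically resolved with base R's match.arg().
pick_term <- function(term = c("lemma", "token")) {
  # With no argument supplied, match.arg() returns the first choice;
  # a supplied value must match one of the choices, otherwise it errors.
  match.arg(term)
}

default_term <- pick_term()          # first choice is used
token_term   <- pick_term("token")   # explicit choice

# An unsupported value raises an error instead of being silently accepted
bad_term <- tryCatch(pick_term("word"), error = function(e) "error")
```

If the function does follow this idiom, calling it without `term` would process lemmas.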
This function replaces the original switch block with an optimized version that uses:
- C++ functions for text recoding
- Vectorized operations instead of multiple mutate calls
- Pre-computed lookups to avoid repeated joins
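The idea behind the pre-computed lookups and vectorized operations listed above can be sketched in a few lines. This is a minimal illustration in base R so that it runs without extra packages; it is not the package's implementation, which according to the description relies on C++ and data.table. The `stats` contents and the `candidates` vector are made-up example data; only the `keyword` and `ngram` column names come from the argument description.

```r
# Made-up multiword statistics with the documented keyword and ngram columns
stats <- data.frame(
  keyword = c("machine learning", "natural language processing"),
  ngram = c(2L, 3L),
  stringsAsFactors = FALSE
)

# Made-up candidate multiwords, standing in for terms found in the tokens
candidates <- c(
  "machine learning",
  "deep learning",
  "natural language processing",
  "machine learning"
)

# Build the lookup once: a named integer vector keyed by keyword
ngram_lookup <- setNames(stats$ngram, stats$keyword)

# A single vectorized lookup stands in for a join at each processing step;
# candidates that are absent from stats come back as NA
candidate_ngram <- unname(ngram_lookup[candidates])

# Flag which candidates are known multiwords
is_multiword <- !is.na(candidate_ngram)
```

Because the lookup is built once and indexed by name, each later step costs one vector operation rather than a fresh join against `stats`.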
if (FALSE) {
  result <- process_multiwords_fast(dfTag, multiword_stats, term = "lemma")
}