tall (version 0.5.2)

process_multiwords_fast: Optimized multiword processing workflow

Description

Runs the complete multiword detection and processing workflow in a single optimized call. It relies on C++ functions and data.table for performance.

Usage

process_multiwords_fast(x2, stats, term = c("lemma", "token"))

Value

A data frame with the columns doc_id, term_id, multiword, upos_multiword, and ngram.

Arguments

x2

Data frame with token information

stats

Data frame with multiword statistics (keyword, ngram columns)

term

Type of term to process: "lemma" or "token"
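
The usage signature lists both choices as the default value, `term = c("lemma", "token")`. In R this pattern is usually resolved with `match.arg()`, which selects the first choice when the argument is omitted. The sketch below shows that idiom with a hypothetical helper, `pick_term()`; it assumes, without confirmation from this page, that process_multiwords_fast() handles `term` the same way.

```r
# Hypothetical helper, not part of tall: shows how a choice-vector default
# is commonly resolved in R.
pick_term <- function(term = c("lemma", "token")) {
  # match.arg() returns the first choice when 'term' is not supplied
  # and raises an error for values outside the allowed set.
  match.arg(term)
}

default_term  <- pick_term()          # argument omitted
explicit_term <- pick_term("token")   # explicit choice
```

If the function follows this idiom, omitting `term` would be equivalent to passing `term = "lemma"`.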

Details

This function replaces the original switch block with an optimized version that uses:

  • C++ functions for text recoding

  • Vectorized operations instead of multiple mutate calls

  • Pre-computed lookups to avoid repeated joins
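
The last two points can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the `lemma` column, and the bigram-only matching are assumptions made for illustration, and the C++ recoding step is not shown. Only the `keyword` and `ngram` columns of `stats` and the output column names come from this page.

```r
# Toy token table (hypothetical data; 'lemma' is an assumed column name).
tokens <- data.frame(
  doc_id  = c("d1", "d1", "d1", "d2", "d2"),
  term_id = 1:5,
  lemma   = c("machine", "learning", "model", "neural", "network"),
  stringsAsFactors = FALSE
)

# Multiword statistics with the documented 'keyword' and 'ngram' columns.
stats <- data.frame(
  keyword = c("machine learning", "neural network"),
  ngram   = c(2L, 2L),
  stringsAsFactors = FALSE
)

# Vectorized step: pair every lemma with the next lemma of the same document.
next_lemma <- c(tokens$lemma[-1], NA)
next_doc   <- c(tokens$doc_id[-1], NA)
same_doc   <- !is.na(next_doc) & tokens$doc_id == next_doc
candidate  <- ifelse(same_doc, paste(tokens$lemma, next_lemma), NA_character_)

# Pre-computed lookup: a named vector built once, queried for all rows at once
# instead of joining against 'stats' repeatedly.
ngram_lookup <- setNames(stats$ngram, stats$keyword)

tokens$ngram     <- unname(ngram_lookup[candidate])
tokens$multiword <- ifelse(is.na(tokens$ngram), NA_character_, candidate)
```

Rows whose candidate bigram is not a known keyword keep `NA` in both new columns, so a single pass over the token table is enough to flag every multiword start.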

Examples

if (FALSE) {
result <- process_multiwords_fast(dfTag, multiword_stats, term = "lemma")
}