txt_recode_ngram_fast

Combines consecutive tokens into multiword expressions, using a C++ implementation for speed.
The function scans the token sequence in order, identifying and merging n-gram patterns.

An R 'shiny' app designed for diverse text analysis tasks, offering a wide range of methodologies tailored to Natural Language Processing (NLP) needs.
It is a versatile, general-purpose tool for analyzing textual data.
'tall' features a comprehensive workflow, including data cleaning, preprocessing, statistical analysis, and visualization, all integrated for effective text analysis.

Massimo Aria

tall

Text Analysis for All

Maria Spano

Luca D'Aniello

Corrado Cuccurullo

Michelangelo Misuraca

txt_recode_ngram_fast function

<dl><dt>x</dt>
<dd>Character vector of tokens (e.g., lemmas or word forms)</dd>
<dt>compound</dt>
<dd>Character vector of multiword expressions to match</dd>
<dt>ngram</dt>
<dd>Integer vector indicating the length of each compound</dd>
<dt>sep</dt>
<dd>String separator to use when joining tokens (default: " ")</dd></dl>
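
The sequential scan-and-merge behind these arguments can be illustrated with a small standalone C++ sketch. It is not the package's source code: the name `recode_ngram_sketch`, the `Token` struct, and the longest-match-first rule are hypothetical choices made for illustration. It also assumes, without confirmation from this page, that a matched n-gram is written at its first position and that the absorbed positions are marked as missing, as an R function might do with `NA`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one element of an R character vector:
// `missing == true` plays the role of NA.
struct Token {
    std::string text;
    bool missing;
};

// Sketch only (assumed behaviour, not the package's implementation):
// scan x left to right; at each position try the longest n-gram length first.
// If the window joined with `sep` equals a compound of that length, store the
// compound at the first position and mark the absorbed positions as missing.
std::vector<Token> recode_ngram_sketch(const std::vector<std::string>& x,
                                       const std::vector<std::string>& compound,
                                       const std::vector<int>& ngram,
                                       const std::string& sep = " ") {
    // Each compound must come with its length in tokens.
    assert(compound.size() == ngram.size());

    // Group compounds by length, longest length first.
    std::map<int, std::unordered_set<std::string>, std::greater<int>> by_len;
    for (std::size_t j = 0; j < compound.size(); ++j) {
        if (ngram[j] >= 2) by_len[ngram[j]].insert(compound[j]);
    }

    // Start from an unchanged copy of the input tokens.
    std::vector<Token> out;
    out.reserve(x.size());
    for (const std::string& tok : x) out.push_back(Token{tok, false});

    std::size_t i = 0;
    while (i < x.size()) {
        bool merged = false;
        for (const auto& entry : by_len) {
            const std::size_t n = static_cast<std::size_t>(entry.first);
            if (i + n > x.size()) continue;  // window would run past the end

            // Join the n tokens starting at position i.
            std::string candidate = x[i];
            for (std::size_t k = 1; k < n; ++k) candidate += sep + x[i + k];

            if (entry.second.count(candidate) > 0) {
                out[i].text = candidate;
                for (std::size_t k = 1; k < n; ++k) {
                    out[i + k].text.clear();
                    out[i + k].missing = true;
                }
                i += n;  // continue after the merged window
                merged = true;
                break;
            }
        }
        if (!merged) ++i;
    }
    return out;
}
```

Under these assumptions, the tokens `new`, `york`, `city`, `is`, `big` with compounds `new york` (length 2) and `new york city` (length 3) give `new york city` at the first position, two missing entries, and then `is` and `big` unchanged. Trying the longest length first is a design choice of the sketch; the package may resolve overlapping compounds differently.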

Arguments

Fast n-gram recoding for multiword detection — txt_recode_ngram_fast


