text_sub

start

Integer vector giving the starting positions of the subsequences, or a two-column integer matrix giving the starting and ending positions.

end

Integer vector giving the ending positions of the subsequences; ignored if `start` is a two-column matrix.

filter

Filter specifying the transformation from text to token sequence.

Extract token subsequences from a set of texts.

Text corpus data analysis, with full support for Unicode.  Functions for reading data from newline-delimited JSON files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies (including n-grams).

text_sub: Text Subsequences

Description

Usage

Arguments

Value

Details

See Also

Examples
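
The following is a minimal base-R sketch of the two ways the arguments describe for giving positions: separate start and end vectors, or a single two-column matrix. It does not call the package. `sub_tokens` is a hypothetical helper written for illustration, and splitting on whitespace is an assumed stand-in for the token filter, so the package's actual tokenization and return type may differ.

```r
# Hypothetical helper, NOT the package's text_sub(): whitespace splitting
# stands in for the token filter (an assumption for illustration).
sub_tokens <- function(x, start, end = NULL) {
  if (is.matrix(start)) {
    # A two-column matrix carries both bounds, so `end` is ignored.
    end <- start[, 2]
    start <- start[, 1]
  }
  tokens <- strsplit(x, "[[:space:]]+")
  # Recycle the bounds across texts, then slice each token sequence.
  Map(function(tok, s, e) tok[seq.int(s, e)], tokens, start, end)
}

texts <- c("the quick brown fox", "jumps over the lazy dog")

# Tokens 2 through 3 of every text.
by_vectors <- sub_tokens(texts, start = 2, end = 3)

# Per-text bounds given as rows of (start, end).
by_matrix <- sub_tokens(texts, start = cbind(c(1, 3), c(2, 5)))
```

With separate vectors, a single start and end value is recycled across all texts; with a matrix, each row supplies the bounds for the corresponding text.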