subset.tCorpus

Provides tools for text analysis in R, built around a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., with part-of-speech tags or dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, export to a document-term matrix (DTM) for compatibility with many text analysis packages, and the ability to reconstruct the original text from tokens to facilitate interpretation.

Kasper Welbers

corpustools

Managing, Querying and Analyzing Tokenized Text

subset.tCorpus function

<dl><dt>x</dt>
<dd>a tCorpus object</dd>
<dt>subset</dt>
<dd>logical expression indicating rows to keep in the tokens data.</dd>
<dt>subset_meta</dt>
<dd>logical expression indicating rows to keep in the document meta data.</dd>
<dt>window</dt>
<dd>If not NULL, an integer specifying the window used to return the subset. For instance, if the subset contains token 10 in a document and window is 5, the subset will contain tokens 5 to 15. This does not apply to subset_meta.</dd>
<dt>...</dt>
<dd>not used</dd></dl>
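
The window arithmetic described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: `expand_window` is a hypothetical helper, and the sketch assumes that the window is applied within each document separately.

```r
# Hypothetical helper, not part of corpustools: keep every token that lies
# within `window` positions of a matched token in the same document.
expand_window <- function(tokens, keep, window) {
  hit <- tokens[keep, c("doc_id", "token_id")]
  in_window <- vapply(seq_len(nrow(tokens)), function(i) {
    same_doc <- hit$doc_id == tokens$doc_id[i]
    any(abs(hit$token_id[same_doc] - tokens$token_id[i]) <= window)
  }, logical(1))
  tokens[in_window, ]
}

# One made-up document with 20 token positions
tokens <- data.frame(doc_id = "d1", token_id = 1:20)

# The subset matches token 10; a window of 5 expands it to tokens 5 to 15
res <- expand_window(tokens, tokens$token_id == 10, window = 5)
range(res$token_id)  # 5 15
```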

Arguments

S3 subset for tCorpus class — subset.tCorpus



Examples
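
A minimal usage sketch, assuming the corpustools package is installed. The two documents and the `source` column are made up for illustration, and the sketch assumes the default tokens columns `token` and `token_id`.

```r
library(corpustools)

# Made-up documents; columns other than doc_id and text become document meta
d <- data.frame(
  doc_id = c("d1", "d2"),
  text   = c("the quick brown fox jumps over the lazy dog",
             "a second document about something else"),
  source = c("A", "B")
)
tc <- create_tcorpus(d, text_columns = "text", doc_column = "doc_id")

# Keep only tokens whose position in the document is at most 5
first_five <- subset(tc, subset = token_id <= 5)

# Keep only documents whose meta data has source "B"
only_b <- subset(tc, subset_meta = source == "B")

# Keep "fox" plus up to two tokens on either side of it
fox_context <- subset(tc, subset = token == "fox", window = 2)
fox_context$tokens
```

Because `subset()` is the S3 method, each call returns a new tCorpus and leaves `tc` unchanged.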