get_token_stream

get_token_stream,numeric-method

get_token_stream,matrix-method

get_token_stream,character-method

get_token_stream,partition-method

get_token_stream,regions-method

.Object: An object of class <code>matrix</code> or <code>partition</code>.

corpus

p_attribute

encoding

collapse

beautify: Logical, whether to adjust whitespace before and after punctuation.

cpos: Logical, whether to return corpus positions (cpos) as names of the tokens.

cutoff: Maximum number of tokens to be reconstructed.

decode: Logical, whether to decode token ids to character strings.

left

right

Turn regions of a corpus defined by corpus positions into the original text.
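
The idea can be illustrated with a small, self-contained base R sketch. It is not the package's implementation: `token_stream_sketch` and the toy `tokens` vector are hypothetical stand-ins, and the handling of `collapse`, `beautify`, `cpos` and `cutoff` only mirrors what the argument descriptions say. The real methods read tokens from a CWB-indexed corpus and may behave differently in detail.

```r
# Toy stand-in for an indexed corpus: one token per corpus position.
# Corpus positions are zero-based, hence the "+ 1" when indexing in R.
tokens <- c("The", "house", "is", "old", ".",
            "It", "was", "built", "in", "1900", ".")

# Hypothetical helper (not part of any package): reconstruct text for
# regions given as a two-column matrix of left and right corpus positions.
token_stream_sketch <- function(regions, tokens, collapse = NULL,
                                beautify = TRUE, cpos = FALSE,
                                cutoff = NULL) {
  # Expand every region (one row each) into its sequence of positions.
  positions <- unlist(lapply(seq_len(nrow(regions)), function(i) {
    regions[i, 1]:regions[i, 2]
  }))
  # Keep at most 'cutoff' tokens if a limit is given.
  if (!is.null(cutoff)) positions <- head(positions, cutoff)
  out <- tokens[positions + 1]
  # Optionally use the corpus positions as names of the tokens.
  if (cpos) names(out) <- positions
  if (!is.null(collapse)) {
    out <- paste(out, collapse = collapse)
    # Remove the whitespace that pasting put in front of punctuation.
    if (beautify) out <- gsub("\\s+([.,;:!?])", "\\1", out)
  }
  out
}

# Two regions: corpus positions 0 to 4 and 5 to 10.
regions <- matrix(c(0L, 4L, 5L, 10L), ncol = 2, byrow = TRUE)

sentences <- token_stream_sketch(regions, tokens, collapse = " ")
first_three <- token_stream_sketch(regions, tokens, cpos = TRUE, cutoff = 3)
```

Here `sentences` is a single string with the whitespace before the full stops removed, and `first_three` is a character vector of three tokens named by their corpus positions.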

Library for corpus analysis using the Corpus Workbench as an
efficient back end for indexing and querying large corpora. The package offers
functionality to flexibly create partitions and to carry out basic statistical
operations (count, co-occurrences etc.). The original full text of documents
can be reconstructed and inspected at any time. Beyond that, the package is
intended to serve as an interface to packages implementing advanced statistical
procedures. Respective data structures (document term matrices, term
co-occurrence matrices etc.) can be created based on the indexed corpora.

get_token_stream: Get Token Stream Based on Corpus Positions.

Description

Usage

Arguments

Examples
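
A minimal usage sketch follows. It assumes that the polmineR package is installed and that it ships the GERMAPARLMINI sample corpus with a `date` s-attribute, which may depend on the installed version. The corpus name, the date value and the argument values are illustrative choices, not taken from this page.

```r
# Run only if polmineR is available; otherwise the example is skipped.
if (requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  use("polmineR")  # activate the sample corpora shipped with the package

  # numeric method: decode the tokens at corpus positions 0 to 9
  first_tokens <- get_token_stream(
    0:9, corpus = "GERMAPARLMINI", p_attribute = "word"
  )

  # partition method: reconstruct the text of a partition as one string
  # (the date used here is an assumption about the sample corpus)
  p <- partition("GERMAPARLMINI", date = "2009-11-10")
  txt <- get_token_stream(p, p_attribute = "word", collapse = " ")
}
```

Passing `collapse = " "` requests a single string rather than a vector of tokens; without it, one token per corpus position is returned.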