polmineR (version 0.7.11)

get_token_stream: Get Token Stream Based on Corpus Positions.

Description

Turn regions of a corpus defined by corpus positions into the original text.

Usage

get_token_stream(.Object, ...)

# S4 method for numeric
get_token_stream(.Object, corpus, p_attribute, encoding = NULL,
  collapse = NULL, beautify = TRUE, cpos = FALSE, cutoff = NULL,
  decode = TRUE, ...)

# S4 method for matrix
get_token_stream(.Object, ...)

# S4 method for character
get_token_stream(.Object, left = NULL, right = NULL, ...)

# S4 method for partition
get_token_stream(.Object, p_attribute, collapse = NULL, cpos = FALSE, ...)

# S4 method for regions
get_token_stream(.Object, p_attribute = "word", ...)

Arguments

.Object

An object of class numeric (a vector of corpus positions), matrix, character (a corpus ID), partition or regions.

...

Further arguments.

corpus

The CWB corpus.

p_attribute

The p-attribute to decode.

encoding

Encoding to use.

collapse

Length-one character string used as the separator when collapsing the tokens into a single string (e.g. " ").

beautify

Logical, whether to adjust whitespace before and after punctuation.

cpos

Logical, whether to return the corpus positions (cpos) as names of the tokens.

cutoff

Maximum number of tokens to be reconstructed.

decode

Logical, whether to decode token ids to character strings.

left

Left corpus position.

right

Right corpus position.
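
To make the roles of cutoff, cpos, collapse and beautify more concrete, here is a minimal base R sketch on toy data. It does not call polmineR and is not the package's implementation: the token vector, the positions and the punctuation rule are all assumptions chosen for illustration, and the actual beautify rules may differ.

```r
# Toy data (hypothetical, not taken from a real corpus): tokens at positions 0 to 5.
tokens <- c("Das", "ist", "ein", "Test", ".", "Ende")
positions <- 0:5

# cutoff: keep at most the first `cutoff` tokens.
cutoff <- 5L
keep <- seq_len(min(cutoff, length(tokens)))
tokens_cut <- tokens[keep]

# cpos = TRUE: attach the corpus positions as names of the tokens.
named <- setNames(tokens_cut, positions[keep])

# collapse = " ": join the tokens into one length-one character string.
collapsed <- paste(tokens_cut, collapse = " ")

# beautify-like step (assumed rule): drop the space before punctuation marks.
beautified <- gsub(" +([.,;:!?])", "\\1", collapsed)

print(named)
print(collapsed)
print(beautified)
```

The sketch keeps each argument as a separate step so that the effect of one option can be inspected without the others.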

Examples

# NOT RUN {
get_token_stream(0:9, corpus = "GERMAPARLMINI", p_attribute = "word")
get_token_stream(0:9, corpus = "GERMAPARLMINI", p_attribute = "word", collapse = " ")
fulltext <- get_token_stream("GERMAPARLMINI", p_attribute = "word")
# }
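
The examples above cover the numeric and character methods with default settings. The following sketch, under stated assumptions, exercises further arguments from the Usage section: cpos and cutoff on the numeric method, left and right on the character method, and the partition method. It assumes that polmineR is installed with its GERMAPARLMINI demo corpus, and that the corpus has a date s-attribute containing the value "2009-11-10"; if that is not the case, adjust the partition() call accordingly.

```r
# Sketch only: runs only if polmineR (with its demo corpus) is available.
if (requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  use("polmineR")  # activate the demo corpora shipped with the package

  # Numeric method: name each token with its corpus position.
  tokens <- get_token_stream(
    0:9, corpus = "GERMAPARLMINI", p_attribute = "word", cpos = TRUE
  )
  print(tokens)

  # Numeric method: limit the number of reconstructed tokens.
  print(get_token_stream(
    0:99, corpus = "GERMAPARLMINI", p_attribute = "word", cutoff = 5
  ))

  # Character method: restrict the corpus to a span of corpus positions.
  print(get_token_stream(
    "GERMAPARLMINI", p_attribute = "word", left = 0, right = 9
  ))

  # Partition method: the date value is an assumption about the demo corpus.
  p <- partition("GERMAPARLMINI", date = "2009-11-10")
  print(head(get_token_stream(p, p_attribute = "word"), 20))
}
```

Guarding the calls with requireNamespace() keeps the script from failing on machines where the package is not installed.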
