text2vec (version 0.2.0)

itoken: Creates iterator over input object.

Description

Creates an iterator over the input object. This iterator is usually passed to the following functions: vocabulary, create_vocab_corpus, create_hash_corpus. See them for details.

Usage

itoken(iterable, ...)

## S3 method for class 'character'
itoken(iterable, preprocess_function, tokenizer, chunks_number = 10, progessbar = interactive(), ...)

## S3 method for class 'ifiles'
itoken(iterable, preprocess_function, tokenizer, progessbar = interactive(), ...)

## S3 method for class 'iserfiles'
itoken(iterable, progessbar = interactive(), ...)

## S3 method for class 'ilines'
itoken(iterable, preprocess_function, tokenizer, ...)

Arguments

iterable
an object from which to generate an iterator.
...
arguments passed to other methods (not used at the moment).
preprocess_function
a function that takes a chunk of objects (a character vector) and does all preprocessing (including stemming if needed). Usually preprocess_function should return a character vector - vector of preprocesse
tokenizer
a function that takes the character vector returned by preprocess_function, splits it into tokens, and returns a list of character vectors. You can also perform tokenization in preprocess_function (actually
chunks_number
integer, the number of pieces the input object should be divided into.
progessbar
logical, indicates whether to show a progress bar.
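
The contract between preprocess_function and tokenizer described above can be sketched in base R. This is only an illustration of the expected input and output shapes: the names my_preprocess and my_tokenizer are hypothetical and are not part of text2vec.

```r
# Hypothetical helpers written in base R only; not part of text2vec.

# preprocess_function contract: character vector in, character vector out.
my_preprocess <- function(x) {
  x <- tolower(x)
  # Replace everything except lowercase letters and spaces with a space.
  gsub("[^a-z ]", " ", x)
}

# tokenizer contract: character vector in, list of character vectors out
# (one element per input document).
my_tokenizer <- function(x) {
  tokens <- strsplit(x, " +")
  # Drop empty strings that appear when a document starts with a space.
  lapply(tokens, function(tok) tok[nzchar(tok)])
}

docs <- c("Hello, World!", "R is fun.")
tokens <- my_tokenizer(my_preprocess(docs))
str(tokens)
```

Functions shaped like these could be supplied as the preprocess_function and tokenizer arguments, in the same way the Examples section passes tolower and word_tokenizer.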

See Also

vocabulary, create_vocab_corpus, create_hash_corpus

Examples

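library(text2vec)
data("movie_review")
# Take the first 100 reviews as a character vector
txt <- movie_review[['review']][1:100]
# Lowercase each chunk, tokenize by word, and split the input into 7 chunks
it <- itoken(txt, tolower, word_tokenizer, chunks_number = 7)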
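
To get a feel for what chunks_number = 7 means for 100 documents, the following base R sketch splits a character vector into 7 roughly equal pieces. It is an assumption about the general idea only; the way text2vec splits its input internally may differ, and the placeholder documents are made up for the illustration.

```r
# Illustration only: text2vec's internal chunking may differ.
docs <- paste("document", 1:100)   # placeholder documents
chunks_number <- 7

# Assign each document index to one of chunks_number contiguous groups.
chunk_id <- cut(seq_along(docs), breaks = chunks_number, labels = FALSE)
chunks <- split(docs, chunk_id)

length(chunks)    # number of chunks
lengths(chunks)   # number of documents in each chunk
```

Each chunk would then be preprocessed and tokenized as a unit, which is why preprocess_function and tokenizer operate on whole character vectors rather than on single strings.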
