itoken: Creates an iterator over an input object.

Description

Creates an iterator over an input object. This iterator is usually passed to the following functions: vocabulary, create_vocab_corpus, create_hash_corpus. See their documentation for details.

Usage

itoken(iterable, ...)
## S3 method for class 'character':
itoken(iterable, preprocess_function, tokenizer,
  chunks_number = 10, progessbar = interactive(), ...)
## S3 method for class 'ifiles':
itoken(iterable, preprocess_function, tokenizer,
  progessbar = interactive(), ...)
## S3 method for class 'iserfiles':
itoken(iterable, progessbar = interactive(), ...)
## S3 method for class 'ilines':
itoken(iterable, preprocess_function, tokenizer, ...)

Arguments

iterable

an object from which to generate an iterator.

...

arguments passed to other methods (not used at the moment).

preprocess_function

function that takes a chunk of objects (a character vector) and performs all preprocessing, including stemming if needed. Usually preprocess_function should return a character vector: a vector of preprocesse

tokenizer

function that takes the character vector returned by preprocess_function, splits it into tokens, and returns a list of character vectors. You can also perform tokenization in preprocess_function (actually

chunks_number

integer, the number of chunks into which the object should be divided.

progessbar

logical, indicates whether to show a progress bar.
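To illustrate the contract described for preprocess_function and tokenizer, here is a minimal base-R sketch. The names my_preprocess and my_tokenizer are hypothetical; the only assumption is the one stated above: the preprocessing function maps a character vector to a character vector, and the tokenizer maps that result to a list of character vectors, one per document.

```r
# Hypothetical preprocessing function: takes a chunk of documents
# (a character vector) and returns a cleaned character vector.
my_preprocess <- function(docs) {
  docs <- tolower(docs)
  # replace anything that is not a lowercase letter or a space with a space
  docs <- gsub("[^a-z ]", " ", docs)
  # drop leading and trailing whitespace
  trimws(docs)
}

# Hypothetical tokenizer: splits each preprocessed document on runs of
# whitespace and returns a list of character vectors, one per document.
my_tokenizer <- function(docs) {
  strsplit(docs, "\\s+")
}

tokens <- my_tokenizer(my_preprocess(c("Hello, World!", "R is  FUN.")))
str(tokens)
```

Following the argument order shown under Usage for the character method, such functions could be passed as itoken(txt, my_preprocess, my_tokenizer); the example below uses tolower and word_tokenizer in the same positions.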

Examples

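library(text2vec)  # provides movie_review, itoken and word_tokenizer
data("movie_review")
txt <- movie_review[['review']][1:100]
# lowercase each chunk, split it into words, and divide the input into 7 chunks
it <- itoken(txt, tolower, word_tokenizer, chunks_number = 7)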
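The example above divides 100 reviews into chunks_number = 7 pieces. This page does not specify how the split is made, so the following base-R sketch simply assumes contiguous chunks of roughly equal size; it is an illustration of the idea, not necessarily what itoken does internally. split_into_chunks is a hypothetical helper.

```r
# Hypothetical helper: divide a character vector into `n` contiguous
# chunks of roughly equal size (an assumption for illustration only).
split_into_chunks <- function(x, n) {
  # assign each element a chunk id in 1..n based on its position
  ids <- cut(seq_along(x), breaks = n, labels = FALSE)
  unname(split(x, ids))
}

docs <- paste("document", 1:100)
chunks <- split_into_chunks(docs, 7)
length(chunks)   # number of chunks
lengths(chunks)  # number of documents in each chunk
```

Each chunk would then be handed to preprocess_function and tokenizer in turn, which keeps the amount of text processed at once bounded.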
