itoken(iterable, ...)
## S3 method for class 'character':
itoken(iterable, preprocess_function, tokenizer,
chunks_number = 10, progessbar = interactive(), ...)
## S3 method for class 'ifiles':
itoken(iterable, preprocess_function, tokenizer,
progessbar = interactive(), ...)
## S3 method for class 'iserfiles':
itoken(iterable, progessbar = interactive(), ...)
## S3 method for class 'ilines':
itoken(iterable, preprocess_function, tokenizer, ...)
preprocess_function
function which takes a chunk of objects (a character vector) and does all the preprocessing (including stemming if needed). Usually preprocess_function should return a character vector - vector of preprocesse

tokenizer
function which takes the character vector from preprocess_function, splits it into tokens and returns a list of character vectors. Also you can perform tokenization in preprocess_function (actually

chunks_number
integer, the number of pieces that the object should be divided into.

progessbar
logical, indicates whether to show a progress bar.

# itoken, word_tokenizer and the movie_review data set come from text2vec
library(text2vec)
data("movie_review")
txt <- movie_review[['review']][1:100]
it <- itoken(txt, tolower, word_tokenizer, chunks_number = 7)
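
The contract between preprocess_function and tokenizer can be illustrated without the package. The sketch below uses base R only; `preprocess`, `tokenize` and `docs` are hypothetical names chosen for illustration, and the chunking step is only an assumption about what dividing the input into chunks_number pieces might look like, not text2vec's actual implementation.

```r
# Hypothetical stand-in for preprocess_function:
# character vector in, character vector out.
preprocess <- function(x) tolower(x)

# Hypothetical stand-in for tokenizer:
# character vector in, list of character vectors out (one per document).
tokenize <- function(x) strsplit(x, "[^a-z0-9]+")

docs <- c("Hello, World!",
          "itoken splits TEXT into tokens",
          "A third document",
          "And a fourth one")

# The tokenizer receives the output of the preprocessing function.
tokens <- tokenize(preprocess(docs))

# Assumed illustration of chunking: divide the documents into
# chunks_number roughly equal pieces, keeping their order.
chunks_number <- 2
chunk_id <- cut(seq_along(docs), breaks = chunks_number, labels = FALSE)
chunks <- split(docs, chunk_id)

str(tokens)
str(chunks)
```

Any pair of functions with these input and output types can be passed in the same positions as tolower and word_tokenizer in the example above.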