Function for tokenizing rtext objects
# S3 method for rtext
text_tokenize(string, regex = NULL,
ignore.case = FALSE, fixed = FALSE, perl = FALSE,
useBytes = FALSE, non_token = FALSE)
string: text to be tokenized
regex: regex expressing where to cut (see gregexpr)
ignore.case: whether or not the regex should ignore case when matching (see gregexpr)
fixed: whether the regex should be matched as-is (a fixed string) rather than as a regular expression (see gregexpr)
perl: whether or not Perl-compatible regular expressions should be used (see gregexpr)
useBytes: whether matching of the regex should be done byte-by-byte rather than character-by-character (see gregexpr)
non_token: whether information on non-tokens, i.e. the patterns by which the text was split, should be returned as well
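A minimal usage sketch. The constructor call `rtext$new(text = ...)` and the example string are illustrative assumptions, not confirmed by this page; the `text_tokenize()` arguments follow the usage block above.

```r
library(rtext)

# create an rtext object from a short string
# (assumes the rtext constructor accepts a `text` argument)
txt <- rtext$new(text = "Hello world, this is a test.")

# tokenize on runs of whitespace; with non_token = TRUE the
# separators the text was split on are returned as well
tokens <- text_tokenize(txt, regex = "\\s+", non_token = TRUE)
tokens
```

Because `fixed = FALSE` by default, `"\\s+"` is interpreted as a regular expression; pass `fixed = TRUE` to split on a literal string instead.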