text2vec (version 0.4.0)

tokenizers: Simple tokenization functions that perform string splitting

Description

Simple wrappers around base regular expressions. For much faster and more feature-rich tokenizers, see the tokenizers package: https://cran.r-project.org/package=tokenizers. Also see the str_split_* functions in the stringi and stringr packages. The reason for not including these packages in text2vec's dependencies is our desire to keep the number of dependencies as small as possible.

Usage

word_tokenizer(strings, ...)

regexp_tokenizer(strings, pattern, ...)

char_tokenizer(strings, ...)

space_tokenizer(strings, ...)

Arguments

strings

character vector

...

other parameters passed on to the base strsplit function, which is used under the hood.

pattern

character pattern to split the strings by.
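Because these helpers forward `...` to strsplit, extra strsplit arguments such as fixed or perl can be passed straight through. Below is a minimal sketch of that assumed forwarding; `my_regexp_tokenizer` is a hypothetical stand-in for illustration, not the actual text2vec source.

```r
# Hypothetical re-implementation illustrating the assumed forwarding;
# the real regexp_tokenizer may differ in its details.
my_regexp_tokenizer <- function(strings, pattern, ...) {
  strsplit(strings, pattern, ...)
}

# Arguments in `...` reach strsplit, e.g. fixed = TRUE treats the
# pattern as a literal string rather than a regular expression:
my_regexp_tokenizer(c("a.b", "axb"), ".", fixed = TRUE)
```

Without fixed = TRUE, the pattern "." would be interpreted as a regular expression matching every character, and the result would contain only empty tokens.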

Value

list of character vectors. Each element of the list contains a vector of tokens.
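To illustrate the return shape, here is a base-R analogue: strsplit itself produces the same list-of-character-vectors structure that these tokenizers return.

```r
# One list element per input string, each element a character
# vector of that string's tokens.
toks <- strsplit(c("one two", "three"), " ", fixed = TRUE)
length(toks)   # 2: one element per input string
toks[[1]]      # tokens of the first string: "one" "two"
```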

Examples

doc = c("first  second", "bla, bla, blaa")
# split into words
word_tokenizer(doc)
# faster but far less general: split on a fixed single whitespace character
# (note that consecutive spaces produce empty tokens)
regexp_tokenizer(doc, " ", fixed = TRUE)