text2vec (version 0.3.0)

tokenizers: Tokenization functions, which perform string splitting

Description

Simple wrappers around functionality from the stringi and stringr packages.

Usage

word_tokenizer(string)
regexp_tokenizer(string, pattern)

Arguments

string
character vector to be tokenized
pattern
character string containing the splitting pattern. Can also be one of the stringr pattern modifiers (e.g. fixed(), regex()).

Value

list of character vectors. Each element of the list contains a vector of tokens.

Details

Uses str_split under the hood (which is built on top of stringi::stri_split). These functions are thin wrappers around str_split, which is consistent, flexible, and robust. See str_split and modifiers for details.
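As a sketch of the behavior described above (not the package's exact implementation), the two tokenizers behave roughly like direct calls to stringr::str_split, with word_tokenizer splitting on word boundaries:

```r
library(stringr)

doc <- c("first  second", "bla, bla, blaa")

# word_tokenizer(doc) behaves roughly like splitting on word boundaries,
# which also discards punctuation and runs of whitespace:
str_split(doc, boundary("word"))

# regexp_tokenizer(doc, pattern) forwards its pattern to str_split;
# a fixed single-space pattern keeps empty tokens for repeated spaces:
str_split(doc, fixed(" "))
```

Because boundary("word") drops punctuation while fixed(" ") does not, the two calls above produce different tokens for the same input.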

Examples

doc <- c("first  second", "bla, bla, blaa")
# split by words
word_tokenizer(doc)
# faster but far less general: splits on a fixed single whitespace character
regexp_tokenizer(doc, pattern = stringr::fixed(" "))