Note: a newer version (0.3.0) of this package is available.

tokenizers (version 0.1.0)

Tokenize Text

Description

Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the 'stringi' package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
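As a minimal sketch of that consistent interface (assuming the package is installed): each tokenizer takes a character vector and returns a list with one element per input document, each element a character vector of tokens.

```r
library(tokenizers)

text <- "The quick brown fox jumps. It jumps over the lazy dog."

# Split into word tokens (lowercased by default)
tokenize_words(text)

# Split into sentence tokens
tokenize_sentences(text)
```

Because every tokenizer shares this list-of-character-vectors shape, the functions can be swapped for one another in a text-processing pipeline without changing the surrounding code.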

Install

install.packages('tokenizers')

Monthly Downloads

35,749

Version

0.1.0

License

MIT + file LICENSE


Maintainer

Lincoln Mullen

Last Published

April 2nd, 2016

Functions in tokenizers (0.1.0)

basic-tokenizers: Basic tokenizers
tokenizers: Tokenizers
tokenize_word_stems: Word stem tokenizer
ngram-tokenizers: N-gram tokenizers
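A hedged sketch of the n-gram tokenizers listed above: `tokenize_ngrams()` produces shingled n-grams (here, bigrams and trigrams via the `n` and `n_min` arguments), while `tokenize_skip_ngrams()` also allows gaps of up to `k` words between the tokens of each n-gram. Argument defaults may differ across package versions.

```r
library(tokenizers)

text <- "Tokenize text into overlapping n-grams"

# Shingled n-grams: all trigrams and bigrams from the text
tokenize_ngrams(text, n = 3, n_min = 2)

# Skip n-grams: bigrams whose words may be up to k = 1 positions apart
tokenize_skip_ngrams(text, n = 2, k = 1)
```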