textreuse (version 0.1.2)

tokenizers: Split texts into tokens

Description

These functions each turn a text into tokens. The tokenize_ngrams function returns shingled n-grams, i.e., overlapping sequences of n consecutive words.
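Shingled n-grams are overlapping word windows that slide forward one word at a time. A minimal base-R sketch of the idea; the helper `shingle` is hypothetical and not part of textreuse, which also handles lowercasing and punctuation for you:

```r
# Hypothetical helper illustrating shingling: each token is n consecutive
# words, and consecutive tokens overlap by n - 1 words.
shingle <- function(words, n) {
  vapply(
    seq_len(length(words) - n + 1),
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

shingle(c("how", "many", "roads", "must"), n = 2)
# "how many" "many roads" "roads must"
```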

Usage

tokenize_words(string, lowercase = TRUE)

tokenize_sentences(string, lowercase = TRUE)

tokenize_ngrams(string, lowercase = TRUE, n = 3)

tokenize_skip_ngrams(string, lowercase = TRUE, n = 3, k = 1)

Arguments

  • string: A character vector of length one to be tokenized.

  • lowercase: Should the tokens be converted to lowercase?

  • n: For the n-gram tokenizers, the number of words in each n-gram.

  • k: For the skip n-gram tokenizer, the maximum skip distance between words.

Value

  • A character vector containing the tokens.

Details

These functions will strip all punctuation.
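Skip n-grams generalize shingles by allowing gaps between the selected words. A base-R sketch of one common definition (for each skip s from 0 to k, take n words spaced s + 1 positions apart); the helper `skip_ngrams` is hypothetical and may not match textreuse's exact output order or token set:

```r
# Hypothetical helper: for each skip s in 0..k, emit every n-word
# sequence whose selected words are s + 1 positions apart.
skip_ngrams <- function(words, n = 3, k = 1) {
  out <- character(0)
  for (s in 0:k) {
    step <- s + 1
    span <- (n - 1) * step  # distance from first to last selected word
    starts <- seq_len(max(length(words) - span, 0))
    for (i in starts) {
      idx <- seq(i, i + span, by = step)
      out <- c(out, paste(words[idx], collapse = " "))
    }
  }
  out
}

skip_ngrams(c("a", "b", "c", "d"), n = 2, k = 1)
# "a b" "b c" "c d" "a c" "b d"
```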

Examples

dylan <- "How many roads must a man walk down? The answer is blowin' in the wind."
tokenize_words(dylan)                      # individual lowercased words
tokenize_sentences(dylan)                  # one token per sentence
tokenize_ngrams(dylan, n = 2)              # shingled two-word sequences
tokenize_skip_ngrams(dylan, n = 3, k = 2)  # three-word skip n-grams, skip distance up to 2
