textreuse (version 0.1.2)

tokenizers: Split texts into tokens

Description

These functions each turn a text into tokens. The tokenize_ngrams function returns shingled n-grams, i.e., overlapping sequences of n consecutive words.
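Shingled n-grams are overlapping word windows that slide forward one word at a time. A minimal base-R sketch of the idea; the helper `shingle` is hypothetical and not part of textreuse, which also handles lowercasing and punctuation for you:

```r
# Hypothetical helper illustrating shingling: each token is n consecutive
# words, and consecutive tokens overlap by n - 1 words.
shingle <- function(words, n) {
  vapply(
    seq_len(length(words) - n + 1),
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

shingle(c("how", "many", "roads", "must"), n = 2)
# "how many" "many roads" "roads must"
```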

Usage

tokenize_words(string, lowercase = TRUE)

tokenize_sentences(string, lowercase = TRUE)

tokenize_ngrams(string, lowercase = TRUE, n = 3)

tokenize_skip_ngrams(string, lowercase = TRUE, n = 3, k = 1)

Arguments

  • string: A character vector of length one to be tokenized.

  • lowercase: Should the tokens be converted to lowercase?

  • n: For the n-gram tokenizers, the number of words in each n-gram.

  • k: For the skip n-gram tokenizer, the maximum skip distance between words.

Value

  • A character vector containing the tokens.

Details

These functions will strip all punctuation.
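Skip n-grams generalize shingles by allowing gaps between the selected words. A base-R sketch of one common definition (for each skip s from 0 to k, take n words spaced s + 1 positions apart); the helper `skip_ngrams` is hypothetical and may not match textreuse's exact output order or token set:

```r
# Hypothetical helper: for each skip s in 0..k, emit every n-word
# sequence whose selected words are s + 1 positions apart.
skip_ngrams <- function(words, n = 3, k = 1) {
  out <- character(0)
  for (s in 0:k) {
    step <- s + 1
    span <- (n - 1) * step  # distance from first to last selected word
    starts <- seq_len(max(length(words) - span, 0))
    for (i in starts) {
      idx <- seq(i, i + span, by = step)
      out <- c(out, paste(words[idx], collapse = " "))
    }
  }
  out
}

skip_ngrams(c("a", "b", "c", "d"), n = 2, k = 1)
# "a b" "b c" "c d" "a c" "b d"
```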

Examples

dylan <- "How many roads must a man walk down? The answer is blowin' in the wind."
tokenize_words(dylan)                      # individual lowercased words
tokenize_sentences(dylan)                  # one token per sentence
tokenize_ngrams(dylan, n = 2)              # shingled two-word sequences
tokenize_skip_ngrams(dylan, n = 3, k = 2)  # three-word skip n-grams, skip distance up to 2
