RWeka (version 0.4-46)

Weka_tokenizers: R/Weka Tokenizers

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)

Value

A character vector with the tokenized strings.

Arguments

x

a character vector with strings to be tokenized.

control

an object of class Weka_control, a character vector of control options, or NULL (default). The available options can be looked up interactively with the Weka Option Wizard WOW, or in the Weka documentation.
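The two ways of supplying control can be sketched as follows. This is a minimal illustration, not part of the original page: it assumes that the NGramTokenizer interface accepts Weka's min and max options, and that the character-vector form mirrors Weka's command-line flags ("-min", "-max"). The variable names are made up for the example.

```r
library(RWeka)

# Ask the Weka Option Wizard which options this tokenizer accepts.
WOW("NGramTokenizer")

x <- "the quick brown fox"

# Form 1: a Weka_control object (assumed option names: min, max).
ctrl_obj <- Weka_control(min = 2, max = 2)
bigrams_obj <- NGramTokenizer(x, control = ctrl_obj)

# Form 2: a character vector of options, written as Weka command-line flags
# (assumed to be equivalent to the Weka_control object above).
ctrl_chr <- c("-min", "2", "-max", "2")
bigrams_chr <- NGramTokenizer(x, control = ctrl_chr)

print(bigrams_obj)
print(bigrams_chr)
```

With min and max both set to 2, both calls should return only two-word tokens.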

Details

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams, where the minimal and maximal numbers of grams (words per token) are given as control options.

WordTokenizer is a simple word tokenizer that splits strings at delimiter characters.
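The following sketch applies the three tokenizers to the same input so their behaviour can be compared. The sample strings and variable names are invented for illustration, and the min and max option names are assumed to match Weka's NGramTokenizer options. Running it requires a working Java installation for RWeka.

```r
library(RWeka)

# Hypothetical sample input: a character vector with two strings.
x <- c("Hello, world! It is 2024.", "tokenizers are fun")

# Tokens are built only from contiguous runs of letters, so digits and
# punctuation never appear inside a token.
alpha <- AlphabeticTokenizer(x)

# Simple word tokenizer with its default settings (control = NULL).
words <- WordTokenizer(x)

# Unigrams and bigrams (assumed option names: min, max).
grams <- NGramTokenizer(x, Weka_control(min = 1, max = 2))

print(alpha)
print(words)
print(grams)
```

Each call returns a single character vector containing the tokens from all elements of x.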