RWeka (version 0.4-43)

Weka_tokenizers: R/Weka Tokenizers

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)

Arguments

x

a character vector with strings to be tokenized.

control

an object of class Weka_control, a character vector of control options, or NULL (default). Available options can be obtained online using the Weka Option Wizard WOW, or from the Weka documentation.
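
The control argument can be built with Weka_control, which turns R name/value pairs into Weka command-line options. A minimal sketch (assuming RWeka is installed with a working Java setup; the option names min and max are the standard NGramTokenizer options):

```r
library(RWeka)

## Build a control object setting the minimal and maximal n-gram size.
## Weka_control(min = 2, max = 2) corresponds to the Weka options "-min 2 -max 2".
ctrl <- Weka_control(min = 2, max = 2)

## Tokenize into bigrams only.
NGramTokenizer("a b c d", ctrl)
```

Equivalently, the options may be given as a character vector, e.g. c("-min", "2", "-max", "2").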

Value

A character vector with the tokenized strings.

Details

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into \(n\)-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
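
The differences between the three tokenizers can be seen on a short string with punctuation. A sketch, assuming RWeka is installed with a working Java setup (the exact token sets depend on the default Weka delimiters):

```r
library(RWeka)

x <- "The quick, brown fox!"

## Simple word tokenizer: splits on whitespace and default delimiter characters.
WordTokenizer(x)

## Alphabetic tokenizer: keeps only contiguous runs of alphabetic characters,
## so digits and punctuation never appear inside a token.
AlphabeticTokenizer(x)

## N-gram tokenizer: here unigrams and bigrams of words.
NGramTokenizer(x, Weka_control(min = 1, max = 2))
```

All three return a plain character vector, so the results can be fed directly into, e.g., table() for frequency counts.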