Weka_tokenizers

R/Weka Tokenizers

R interfaces to Weka tokenizers.

Keywords
character
Usage
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments
x

a character vector with strings to be tokenized.

control

an object of class Weka_control, a character vector of control options, or NULL (the default). The available options can be listed on-line using the Weka Option Wizard WOW, or found in the Weka documentation.
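As a sketch of how the control argument might be passed (this assumes RWeka is installed with a working Java/Weka back end; the min/max option names are those documented for Weka's NGramTokenizer class):

```r
library(RWeka)

## List the options of the underlying Weka tokenizer class:
WOW("weka/core/tokenizers/NGramTokenizer")

## Pass options via Weka_control: here, request 2- to 3-grams.
ctrl <- Weka_control(min = 2, max = 3)
NGramTokenizer("The quick brown fox", control = ctrl)
```

Equivalently, the options could be given as a character vector, e.g. control = c("-min", "2", "-max", "3").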

Details

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into \(n\)-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
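A minimal comparison of the three tokenizers might look as follows (assuming RWeka and its Java/Weka dependencies are available; the exact tokens returned depend on the Weka implementations, so no output is shown):

```r
library(RWeka)

x <- "The 2 quick brown foxes can't jump."

## Simple word tokenization:
WordTokenizer(x)

## Only contiguous alphabetic runs (digits and apostrophes break tokens):
AlphabeticTokenizer(x)

## All 1- and 2-grams over the word sequence:
NGramTokenizer(x, Weka_control(min = 1, max = 2))
```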

Value

A character vector with the tokenized strings.

Aliases
  • AlphabeticTokenizer
  • NGramTokenizer
  • WordTokenizer
Documentation reproduced from package RWeka, version 0.4-39, License: GPL-2
