Weka_tokenizers

R/Weka Tokenizers

R interfaces to Weka tokenizers.

Keywords
character
Usage
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments
x
a character vector with strings to be tokenized.
control
an object of class Weka_control, a character vector of control options, or NULL (the default). Available options can be obtained on-line using the Weka Option Wizard.
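The control argument accepts either form; as a sketch (assuming RWeka and a working Java installation), these two calls pass the same options to the underlying Weka tokenizer:

```r
library(RWeka)

## Equivalent ways to set the n-gram bounds:
## a Weka_control object ...
NGramTokenizer("a b c", Weka_control(min = 2, max = 2))
## ... or a raw character vector of Weka command-line options.
NGramTokenizer("a b c", c("-min", "2", "-max", "2"))
```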
Details

AlphabeticTokenizer is an alphabetic string tokenizer, where tokens are to be formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams, with the minimal and maximal number of grams specified via the control options.

WordTokenizer is a simple word tokenizer.
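A minimal usage sketch of the three tokenizers (assuming RWeka is installed and Java is configured; the exact tokens returned depend on the underlying Weka version):

```r
library(RWeka)

x <- "The quick brown fox."

## Simple word tokenization.
WordTokenizer(x)

## Tokens are formed only from contiguous alphabetic sequences,
## so the trailing period is dropped.
AlphabeticTokenizer(x)

## Bigrams and trigrams, with bounds set via Weka control options.
NGramTokenizer(x, Weka_control(min = 2, max = 3))
```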

Value

  • A character vector with the tokenized strings.

Aliases
  • AlphabeticTokenizer
  • NGramTokenizer
  • WordTokenizer
Documentation reproduced from package RWeka, version 0.4-18, License: GPL-2
