openNLP (version 0.0-7)

tokenize: Tokenizer

Description

Splits the input texts into tokens.

Usage

tokenize(s, language = "en", model = NULL)

Arguments

s
A character vector of texts to be tokenized.
language
A character string giving the language of s. This argument is only used if model is NULL for selecting a default model. At the moment, languages en (English), es (Spanish), <
model
A tokenizer model. If NULL (the default), a default model is selected according to language.

Value

  • A character vector holding the tokenized s.

Details

If model is NULL, a default model for tokenization is loaded from the openNLP models package for the given language.

References

OpenNLP http://opennlp.sourceforge.net/

Examples

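library(openNLP)

# English text, tokenized with the default English model
s <- "This is a sentence."
tokenize(s, language = "en")

# Spanish text, tokenized with the default Spanish model
s <- "¿Cómo se llama usted? El castellano es la lengua española oficial
del Estado."
tokenize(s, language = "es")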
