wordStem

Get the stem of words

This function extracts the stem of each word in the given vector.

Usage
wordStem(words, language = "porter")
Arguments
words

a character vector of words whose stems are to be extracted.

language

the name of a recognized language, as returned by getStemLanguages, or a two- or three-letter ISO-639 code corresponding to one of these languages (see references for the list of codes).
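The two ways of specifying language can be sketched as follows. This is a minimal illustration, assuming the SnowballC package is installed and that "english" (ISO-639 code "en") is among the languages reported by getStemLanguages(); the sample words are made up for the example.

```r
library(SnowballC)

# Language names recognized by this installation of the package.
langs <- getStemLanguages()
print(langs)

# Illustrative input; any character vector works.
words <- c("running", "runs", "easily")

# Select the stemmer by its full name ...
by_name <- wordStem(words, language = "english")

# ... or by the two-letter ISO-639 code (assumed here to be "en").
by_code <- wordStem(words, language = "en")

# Both calls should select the same stemmer.
print(identical(by_name, by_code))
```

Checking the requested name against getStemLanguages() before calling wordStem is a cheap way to catch a misspelled language early.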

Details

This function uses Dr. Martin Porter's stemming algorithm, as implemented in the C libstemmer library generated by Snowball.

Value

A character vector with the same number of elements as the input vector, in which each element is the stem of the corresponding input word. Elements of the input are converted to UTF-8 encoding before stemming is performed, and returned elements are marked as UTF-8 when they contain non-ASCII characters.
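The properties of the return value described above can be checked with a short sketch. It assumes the SnowballC package is installed and that "french" is an available language; the input words are illustrative, and the exact stems are left for the reader to inspect rather than asserted here.

```r
library(SnowballC)

# Illustrative input: an ASCII word, a missing value, and a word
# containing a non-ASCII character (written as a Unicode escape).
words <- c("winning", NA, "\u00e9tudiants")

stems <- wordStem(words, language = "french")

# The result has one element per input element.
print(length(stems) == length(words))

# Missing values are passed through as NA.
print(is.na(stems[2]))

# Inspect the declared encoding of each returned element.
print(Encoding(stems))
```

Encoding() reports the encoding mark of each string, which is one way to see which returned elements were marked as UTF-8.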

References

http://snowball.tartarus.org/

http://www.loc.gov/standards/iso639-2/php/code_list.php for a list of ISO-639 language codes.

Aliases
  • wordStem
Examples
# NOT RUN {
  # Simple example
  wordStem(c("win", "winning", "winner"))

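  # Test the supplied vocabulary for every available language
  for (lang in getStemLanguages()) {
      # Each .RData file defines `voc`: voc[[1]] holds the words,
      # voc[[2]] the expected stems
      load(system.file("words", paste0(lang, ".RData"), package = "SnowballC"))

      stopifnot(all(wordStem(voc[[1]], lang) == voc[[2]]))
  }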

  # Missing values are returned as NA
  stopifnot(is.na(wordStem(NA)))
# }
Documentation reproduced from package SnowballC, version 0.6.0, License: BSD_3_clause + file LICENSE
