stopwords

Stopwords

Return various kinds of stopwords with support for different languages.

Keywords
file
Usage
stopwords(kind = "en")
Arguments
kind
A character string identifying the desired stopword list.
Details

Available stopword lists are:

catalan
Catalan stopwords (obtained from http://latel.upf.edu/morgana/altres/pub/ca_stop.htm),

romanian
Romanian stopwords (extracted from http://snowball.tartarus.org/otherapps/romanian/romanian1.tgz),

SMART
English stopwords from the SMART information retrieval system (obtained from http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop); this list coincides with the stopword list used by the MC toolkit (http://www.cs.utexas.edu/users/dml/software/mc/),

and a set of stopword lists from the Snowball stemmer project in different languages (obtained from http://svn.tartarus.org/snowball/trunk/website/algorithms/*/stop.txt). Supported languages are danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, and swedish. Language names are case sensitive. Alternatively, their IETF language tags may be used.
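The following sketch illustrates selecting a list by language name, by IETF tag, and by one of the non-Snowball names. It assumes the tm package is installed; the variable names are illustrative only, and the expectation that "de" resolves to the german list is an assumption about how tm maps IETF tags.

```r
# Sketch only: assumes the tm package is installed.
library(tm)

by_name <- stopwords("german")  # Snowball list selected by language name
by_tag  <- stopwords("de")      # assumed to select the same list via its IETF tag

# Compare the two results; they are expected to be the same list.
identical(by_name, by_tag)

# Non-Snowball lists are selected by the names listed above.
# Names are case sensitive, so "SMART" is written in upper case.
smart <- stopwords("SMART")
length(smart)
head(smart)
```

Since names are case sensitive, a call such as stopwords("German") should not be expected to find the german list.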

Value

A character vector containing the requested stopwords. An error is raised if no stopwords are available for the requested kind.
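Because an unsupported kind raises an error, code that takes the kind from user input may want to catch it. The sketch below is one possible approach, not part of tm: safe_stopwords is a hypothetical helper, and "klingon" is assumed to be a kind for which no list exists.

```r
# Sketch only: assumes the tm package is installed.
library(tm)

# Hypothetical helper (not part of tm): return the stopword list for `kind`,
# or an empty character vector if stopwords() raises an error.
safe_stopwords <- function(kind) {
  tryCatch(
    stopwords(kind),
    error = function(e) {
      message("No stopword list for kind '", kind, "': ", conditionMessage(e))
      character(0)
    }
  )
}

words_en <- safe_stopwords("english")   # a supported kind
words_xx <- safe_stopwords("klingon")   # assumed unsupported; falls back to character(0)
```

Returning an empty vector keeps downstream code type-stable, at the cost of hiding the failure; re-raising the error with a clearer message is an equally reasonable choice.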

Aliases
  • stopwords
Examples
stopwords("en")
stopwords("SMART")
stopwords("german")
Documentation reproduced from package tm, version 0.6-2, License: GPL-3
