stopwords (version 2.3)

stopwords: Collection of stopwords in multiple languages

Description

This function returns character vectors of stopwords for different languages, identified by their ISO 639-1 language codes, and allows different stopword sources to be selected.

The default source is the Snowball stopwords collection, but other sources are also available.

Usage

stopwords(language = "en", source = "snowball", simplify = TRUE)

Arguments

language

specify language of stopwords by ISO 639-1 code

source

specify a stopwords source. To list the currently available options, use stopwords_getsources().

simplify

logical; if TRUE, return a simple character vector; if FALSE, return a list when the original word list was nested
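
As a brief sketch of how the source argument can be chosen, the following lists the available sources and then requests the default one explicitly. It assumes the stopwords package is installed; the variable names are illustrative only.

```r
library(stopwords)

# List the source names accepted by the `source` argument
available_sources <- stopwords_getsources()
print(available_sources)

# Request the default source explicitly
snowball_en <- stopwords("en", source = "snowball")
head(snowball_en)
```

Checking a name against stopwords_getsources() before passing it avoids relying on a source that the installed version may not provide.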

Value

a character vector containing the stopwords, or a list of character vectors if simplify = FALSE and the original word list was nested
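
The two return shapes can be compared with a short sketch. It assumes that the "marimo" source is available in the installed version, that it covers English, and that its word list is nested; none of this is stated on this page, so treat the second call as illustrative.

```r
library(stopwords)

# Default: simplify = TRUE gives a flat character vector
flat <- stopwords("en", source = "snowball", simplify = TRUE)
is.character(flat)

# simplify = FALSE keeps any nesting present in the original word list.
# Assumption: "marimo" is a nested source that includes English.
nested <- stopwords("en", source = "marimo", simplify = FALSE)
str(nested, max.level = 1)
```

If the chosen source is not nested, the second call is expected to return a plain character vector as well.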

Details

The language codes for each stopword list are the two-letter ISO 639-1 codes listed at https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes. For backwards compatibility, the full English language names used for stopword lists in the quanteda package may also be supplied, although these are deprecated.
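
A minimal sketch of selecting languages by code follows. The helper stopwords_getlanguages() is not described on this page; it is assumed here to be exported by the installed version of the package and to return the codes a given source supports.

```r
library(stopwords)

# Two-letter ISO 639-1 codes select the language
english <- stopwords("en")
german <- stopwords("de")

# Assumption: stopwords_getlanguages() reports the codes a source supports
snowball_langs <- stopwords_getlanguages("snowball")
"de" %in% snowball_langs
```

Checking the supported codes first is useful because not every source covers every language.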

Examples

# English stopwords from the default Snowball source
stopwords("en")
# German stopwords from the default Snowball source
stopwords("de")
