corpus (version 0.9.1)

stopwords: Stop Words

Description

Get a list of common function words (‘stop’ words).

Usage

stopwords(kind = "english")

Arguments

kind

a character vector giving the desired name or names of the stop word list(s), NA, or NULL. Allowed values are "danish", "dutch", "english", "finnish", "french", "german", "hungarian", "italian", "norwegian", "portuguese", "russian", "spanish", and "swedish"; these values retrieve the language-specific stop word lists.

Value

A character vector of unique stop words of the specified kind (or of all specified kinds combined, if kind has more than one element), or NULL if kind = NULL.

Details

stopwords returns a character vector of case-folded ‘stop’ words. These are common function words that are often discarded before other text analysis tasks.

The built-in word lists returned by this function are reasonable defaults, but they may require further tailoring to suit your particular task. The original lists were compiled by the Snowball stemming project. Following the Quanteda text analysis software, we have tailored the original lists by adding the word "will" to the English list.
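As a rough illustration of such tailoring, the sketch below adds and removes words from a base list using only base R set operations. The helper name tailor_stopwords and its arguments are hypothetical and not part of corpus; tolower is used here as a simple stand-in for the case folding applied to the built-in lists.

```r
# Hypothetical helper (not part of corpus): tailor a base stop word list.
tailor_stopwords <- function(base, add = character(), drop = character()) {
    # lower-case the inputs so they line up with the case-folded built-in lists
    add <- tolower(add)
    drop <- tolower(drop)
    # union() appends new words without duplicating existing ones;
    # setdiff() then removes the words that should be kept in the text
    setdiff(union(base, add), drop)
}

# Usage, assuming the corpus package is installed:
# my_stopwords <- tailor_stopwords(corpus::stopwords("english"),
#                                  add = c("mr.", "mrs."), drop = "not")
```

Keeping the additions and removals as explicit arguments makes it easy to record which departures from the default list a given analysis depends on.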

See Also

text_filter

Examples

# NOT RUN {
    head(stopwords("english"))
    head(stopwords("russian"))
    stopwords(NULL)

    # combine multiple lists, removing duplicates
    head(stopwords(c("spanish", "portuguese")))

    # add words to the default list:
    my_stopwords <- c(stopwords("english"), "mr.", "mrs.", "ms.")
# }
