corpus (version 0.6.0)

stopwords: Stop Words

Description

Get a list of common function words (‘stop’ words).

Usage

stopwords(kind = "english")

Arguments

kind

the name of the stop word list, or NULL. Allowed values are "danish", "dutch", "english", "finnish", "french", "german", "hungarian", "italian", "norwegian", "portuguese", and "russian"; these values retrieve the language-specific stop word lists compiled by the Snowball stemming project.

Value

A character vector of stop words of the specified kind, or NULL if kind = NULL.

Details

stopwords returns a character vector of case-folded ‘stop’ words. These are common function words that are often discarded before other text analysis tasks.

The built-in word lists returned by this function are reasonable defaults, but they may require further tailoring to suit your particular task.
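
As a rough sketch of such tailoring, the snippet below removes negation words from the English list so they survive filtering, then drops the remaining stop words from a small token vector using base R only. The names negations, custom_stopwords, tokens, and kept are illustrative and not part of the package. Because the list is case-folded, the tokens are lower-cased before matching.

```r
library(corpus)

# Start from the built-in English list.
base_list <- stopwords("english")

# Hypothetical task: keep negations, so take them out of the stop word list.
negations <- c("no", "nor", "not")
custom_stopwords <- setdiff(base_list, negations)

# Illustrative tokens; a real pipeline would produce these with a tokenizer.
tokens <- c("This", "is", "Not", "a", "good", "idea")

# The list is case-folded, so compare lower-cased tokens against it.
kept <- tokens[!(tolower(tokens) %in% custom_stopwords)]
kept
```

Filtering with base R keeps the example independent of any particular tokenizer. Which words belong in the list depends on the task: negations matter for sentiment analysis, for instance, but may be safely dropped for topic modelling.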

See Also

token_filter

Examples

    head(stopwords("english"))
    head(stopwords("russian"))
    stopwords(NULL)

    # add words to the default list:
    my_stopwords <- c(stopwords("english"), "will", "mr", "mrs", "ms")