quanteda (version 0.9.2-0)

dictionary: create a dictionary

Description

Create a quanteda dictionary, either from a list or by importing from a foreign format. Currently supported formats are the Wordstat and LIWC formats.

Usage

dictionary(x = NULL, file = NULL, format = NULL, enc = "",
  toLower = TRUE)

Arguments

x
a list of character vector dictionary entries, including regular expressions (see examples)
file
file identifier for a foreign dictionary
format
character identifier for the format of the foreign dictionary. Available options are: "wordstat" (the Wordstat format) and "LIWC" (the LIWC format)
enc
optional encoding value for reading in imported dictionaries. This uses the iconv labels for encoding. See the "Encoding" section of the help for file.
toLower
if TRUE, convert all dictionary keys and values to lowercase
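
The enc argument is described above as taking iconv encoding labels. As a rough illustration of what such a label means, the sketch below uses only base R's iconv() to convert a Latin-1 byte string to UTF-8. It does not call quanteda, and the sample string is made up for the example; how dictionary() applies enc internally is not shown on this page.

```r
# A string whose last byte (0xE9) is "e-acute" in the Latin-1 encoding.
x <- "caf\xe9"

# "latin1" and "UTF-8" are iconv labels of the kind the enc argument expects.
# iconvlist() lists the labels available on the current platform.
y <- iconv(x, from = "latin1", to = "UTF-8")

# After conversion the string holds four characters, the last being U+00E9.
code_points <- utf8ToInt(y)
```

If an imported dictionary shows garbled accented characters, passing the source file's actual encoding label through enc is the likely remedy.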

Value

  • A dictionary class object, essentially a specially classed named list of characters.
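
To make "a specially classed named list of characters" concrete, here is a minimal base R sketch of that shape. It is a stand-in for illustration only, not quanteda's implementation: the class label "toy_dictionary" is hypothetical, and the lowercasing step merely mirrors what toLower = TRUE is documented to do.

```r
# Named list: each name is a dictionary key, each character vector its entries.
entries <- list(christmas = c("Christmas", "Santa", "holiday"),
                taxregex  = "tax*")

# Lowercase keys and values, as toLower = TRUE is described as doing.
names(entries) <- tolower(names(entries))
entries <- lapply(entries, tolower)

# Attach a class label ("toy_dictionary" is an assumed name for this sketch).
toy <- structure(entries, class = c("toy_dictionary", "list"))

# The object still behaves like an ordinary named list.
keys <- names(toy)
christmas_entries <- toy$christmas
```

Because the object is still a list underneath, the usual list tools (names(), `$`, lapply()) work on it unchanged.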

References

Wordstat dictionaries page, from Provalis Research http://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/.

Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J. (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).

See Also

dfm

Examples

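# keep only the inaugural addresses delivered after 1900
mycorpus <- subset(inaugCorpus, Year > 1900)
# build a dictionary from a named list: each name is a key,
# each character vector holds that key's entries
mydict <- 
    dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                    opposition = c("Opposition", "reject", "notincorpus"),
                    taxing = "taxing",
                    taxation = "taxation",
                    taxregex = "tax*",
                    country = "united states"))
# apply the dictionary when constructing the document-feature matrix
dfm(mycorpus, dictionary = mydict)

# import the Laver-Garry dictionary from http://bit.ly/1FH2nvf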
lgdict <- dictionary(file="http://www.kenbenoit.net/courses/essex2014qta/LaverGarry.cat",
                     format="wordstat")
dfm(inaugTexts, dictionary=lgdict)

# import a LIWC formatted dictionary from http://www.moralfoundations.org
mfdict <- dictionary(file = "http://ow.ly/VMRkL", format = "LIWC")
dfm(inaugTexts, dictionary = mfdict)
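
The taxregex = "tax*" entry in the first example uses a wildcard pattern. The sketch below shows, in base R only, what such a pattern would match if it is read as a glob-style wildcard. That reading is an assumption: this page does not state how dfm interprets dictionary patterns, and the sample tokens are invented for the illustration.

```r
# Sample tokens, made up for this illustration.
tokens <- c("taxation", "taxing", "taxes", "syntax", "united")

# glob2rx() (from the utils package) translates a glob wildcard
# into an anchored regular expression.
rx <- glob2rx("tax*")

# Only tokens that begin with "tax" match; "syntax" does not.
matched <- tokens[grepl(rx, tokens)]
```

Under this reading, taxregex overlaps with the taxing and taxation keys, so the same token can be counted under more than one key.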