quanteda (version 0.7.2-1)

dictionary-class: create a dictionary

Description

Create a quanteda dictionary, either from a list or by importing from a foreign format. Currently supported formats are the Wordstat and LIWC formats.

Usage

dictionary(x = NULL, file = NULL, format = NULL, enc = "",
  tolower = TRUE, maxcats = 10)

Arguments

x
a list of character vector dictionary entries, including regular expressions (see examples)
file
file identifier for a foreign dictionary
format
character identifier for the format of the foreign dictionary. Available options are: "wordstat" and "LIWC"
enc
optional encoding value for dictionaries imported in Wordstat format
tolower
if TRUE, convert all dictionary entries to lowercase
maxcats
optional maximum number of categories to which a word can belong in a LIWC dictionary file. Defaults to 10, which is more than the actual LIWC 2007 dictionary uses and is likely to be more than enough.

Value

  • A list with a dictionary class label, to be used by other functions in quanteda.
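
As a rough illustration of what "a list with a class label" means, the base R sketch below tags a plain named list with a class attribute and lowercases its entries, loosely mirroring tolower = TRUE. The helper name make_dictionary_sketch is hypothetical and is not part of quanteda; this is an assumption-laden sketch, not the package's actual implementation.

```r
# Hypothetical helper (not a quanteda function): tags a named list of
# character vectors with a "dictionary" class label.
make_dictionary_sketch <- function(x, tolower = TRUE) {
  stopifnot(is.list(x), !is.null(names(x)))
  if (tolower) {
    # lowercase every entry; the category names are left unchanged
    x <- lapply(x, base::tolower)
  }
  # prepend the class label so other functions can dispatch on it
  class(x) <- c("dictionary", class(x))
  x
}

sketch <- make_dictionary_sketch(list(christmas = c("Christmas", "Santa"),
                                      taxregex = "tax*"))
inherits(sketch, "dictionary")
sketch$christmas
```

Functions that accept a dictionary can then check the class label (for example with inherits()) before using the entries.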

References

Wordstat dictionaries page, from Provalis Research http://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/.

Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J. (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).

See Also

dfm

Examples

# build a dictionary from a named list and apply it when creating a dfm
mycorpus <- subset(inaugCorpus, Year > 1900)
mydict <-
    dictionary(list(christmas=c("Christmas", "Santa", "holiday"),
                    opposition=c("Opposition", "reject", "notincorpus"),
                    taxing="taxing",
                    taxation="taxation",
                    taxregex="tax*",
                    country="united states"))
dfm(mycorpus, dictionary=mydict)

# import the Laver-Garry dictionary from http://bit.ly/1FH2nvf
lgdict <- dictionary(file="~/Dropbox/QUANTESS/dictionaries/Misc Provalis/LaverGarry.cat",
                     format="wordstat")
dfm(inaugTexts, dictionary=lgdict)

# import a LIWC formatted dictionary
liwcdict <- dictionary(file = "~/Dropbox/QUANTESS/dictionaries/LIWC/LIWC2001_English.dic",
                       format = "LIWC")
dfm(inaugTexts, dictionary=liwcdict)
