quanteda (version 0.9.6-9)

applyDictionary: apply a dictionary or thesaurus to an object

Description

Convert features into equivalence classes defined by values of a dictionary object.

Usage

applyDictionary(x, dictionary, ...)

## S3 method for class 'dfm'
applyDictionary(x, dictionary, exclusive = TRUE,
  valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE,
  capkeys = !exclusive, verbose = TRUE, ...)

Arguments

x
object to which the dictionary or thesaurus will be applied
dictionary
the dictionary-class object that will be applied to x
...
not used
exclusive
if TRUE, remove all features not in the dictionary; otherwise, replace features that match dictionary values with the corresponding keys and leave all other features unaffected
valuetype
how to interpret dictionary values: "glob" for "glob"-style wildcard expressions (the format used in Wordstat and LIWC formatted dictionary values); "regex" for regular expressions; or "fixed" for exact matching (en
case_insensitive
ignore the case of dictionary values if TRUE
capkeys
if TRUE, convert dictionary keys to uppercase to distinguish them from other features
verbose
print status messages if TRUE
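
The three valuetype settings can be hard to tell apart from the descriptions alone. The following is a minimal base-R sketch of the matching semantics they describe, using only grepl() and utils::glob2rx(). It is not quanteda's implementation, and the feature names and the single pattern are made up for illustration.

```r
# Illustrative base-R sketch only; this is not quanteda's implementation.
features <- c("tax", "taxation", "united_states", "sweden")  # toy feature names
pattern  <- "tax*"                                           # one dictionary value

# "glob": translate the wildcard into an anchored regular expression first
glob_hits  <- grepl(utils::glob2rx(pattern), features, ignore.case = TRUE)

# "regex": use the value as a regular expression as-is
regex_hits <- grepl(pattern, features, ignore.case = TRUE)

# "fixed": require the whole feature to equal the value, ignoring case
fixed_hits <- tolower(features) == tolower(pattern)

features[glob_hits]   # features matched under glob interpretation
features[regex_hits]  # features matched under regex interpretation
features[fixed_hits]  # features matched under fixed interpretation
```

In this sketch the glob interpretation selects "tax" and "taxation", the regex interpretation additionally selects "united_states" (because "tax*" as a regex only requires "ta"), and the fixed interpretation selects nothing, since no feature is literally "tax*".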

Value

  • an object of the same type as x, in which features that match dictionary values are replaced by the corresponding dictionary keys
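
To make the returned object more concrete, here is a rough base-R sketch of what "replacing features by dictionary keys" could look like on a plain count matrix. The helper collapse_counts() and the toy dictionary are hypothetical and use exact matching only; the real function works on dfm objects and may differ in details such as column order.

```r
# Hypothetical helper on a plain matrix; not part of quanteda.
counts <- matrix(c(1, 0, 1, 0, 1, 0,
                   0, 1, 0, 1, 0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("christmas", "sweden", "tax",
                                   "taxation", "plan", "progressive")))

toy_dict <- list(holiday = "christmas", taxes = c("tax", "taxation"))

collapse_counts <- function(counts, dict, exclusive = TRUE) {
  # For each key, sum the columns whose names appear among that key's values
  keyed <- sapply(dict, function(values) {
    rowSums(counts[, colnames(counts) %in% values, drop = FALSE])
  })
  if (exclusive) return(keyed)            # keep only the dictionary keys
  # Otherwise keep unmatched features and uppercase the keys,
  # mirroring the default capkeys = !exclusive
  unmatched <- !(colnames(counts) %in% unlist(dict))
  colnames(keyed) <- toupper(colnames(keyed))
  cbind(keyed, counts[, unmatched, drop = FALSE])
}

collapse_counts(counts, toy_dict)                     # keys only
collapse_counts(counts, toy_dict, exclusive = FALSE)  # keys plus other features
```

With exclusive = TRUE the sketch returns one column per key; with exclusive = FALSE the uppercased key columns sit alongside the features that matched no dictionary value.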

Examples

myDict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          opposition = c("Opposition", "reject", "notincorpus"),
                          taxglob = "tax*",
                          taxregex = "tax.+$",
                          country = c("United_States", "Sweden")))
myDfm <- dfm(c("My Christmas was ruined by your opposition tax plan.", 
               "Does the United_States or Sweden have more progressive taxation?"),
             ignoredFeatures = stopwords("english"), verbose = FALSE)
myDfm

# glob format
applyDictionary(myDfm, myDict, valuetype = "glob")
applyDictionary(myDfm, myDict, valuetype = "glob", case_insensitive = FALSE)

# regex vs. glob format: read as a regex, "tax*" means "ta" followed by zero
# or more "x", so it also matches "united_states"
applyDictionary(myDfm, myDict, valuetype = "glob")
applyDictionary(myDfm, myDict, valuetype = "regex", case_insensitive = TRUE)

# fixed format: no pattern matching
applyDictionary(myDfm, myDict, valuetype = "fixed")
applyDictionary(myDfm, myDict, valuetype = "fixed", case_insensitive = FALSE)
