quanteda (version 0.8.4-2)

selectFeatures: select features from an object

Description

This function selects or discards features from a variety of objects, such as tokenized texts, a dfm, or a list of collocations. The most common uses are to eliminate stop words from a text or text-based object, or to select only the features matching a list of regular expressions.

Usage

selectFeatures(x, features, ...)

## S3 method for class 'dfm'
selectFeatures(x, features = NULL, selection = c("keep", "remove"),
  valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE,
  verbose = TRUE, ...)

Arguments

x
object whose features will be selected
features
character vector of regular expressions defining the features to be selected, or a dictionary class object whose values will provide the features to be selected. If a dictionary class object, the values will be i
...
supplementary arguments passed to the underlying functions in stri_detect_regex. (This is how case_insensitive is passed, but you may wish to pass others.)
selection
whether to keep or remove the features
valuetype
how to interpret the feature vector: "fixed" for words as is; "regex" for regular expressions; or "glob" for glob-style wildcards
case_insensitive
ignore the case of dictionary values if TRUE
verbose
if TRUE, print a message about how many features were removed
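
The difference between the three valuetype settings can be illustrated with a minimal base R sketch. The helper match_features() below is hypothetical and is not part of quanteda; it only approximates the matching rules described in the arguments above, assuming that "glob" patterns are converted to anchored regular expressions and that "fixed" means whole-word comparison.

```r
# Hypothetical helper (NOT a quanteda function): approximates how a pattern
# might be matched against feature names under each valuetype.
match_features <- function(features, pattern,
                           valuetype = c("glob", "regex", "fixed"),
                           case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  if (valuetype == "fixed") {
    # whole-word comparison; lower-case both sides to ignore case
    if (case_insensitive) {
      return(tolower(features) %in% tolower(pattern))
    }
    return(features %in% pattern)
  }
  if (valuetype == "glob") {
    # utils::glob2rx() converts wildcards such as "tax*" to anchored regexes
    pattern <- utils::glob2rx(pattern)
  }
  # a feature is selected if ANY of the patterns matches it
  hits <- lapply(pattern, function(p) {
    grepl(p, features, ignore.case = case_insensitive)
  })
  Reduce(`|`, hits, rep(FALSE, length(features)))
}

# Made-up feature names, for illustration only
features <- c("My", "Christmas", "tax", "taxation", "by")

glob_hits  <- match_features(features, "tax*", "glob")
regex_hits <- match_features(features, "s$", "regex")
fixed_hits <- match_features(features, "my", "fixed")
fixed_cs   <- match_features(features, "my", "fixed", case_insensitive = FALSE)

features[glob_hits]   # "tax" "taxation"
features[regex_hits]  # "Christmas"
features[fixed_hits]  # "My"
features[fixed_cs]    # character(0)
```

Note that the regular expression is matched anywhere in the feature unless it is anchored, whereas the glob pattern must match the whole feature.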

See Also

removeFeatures, trim

Examples

myDfm <- dfm(c("My Christmas was ruined by your opposition tax plan.",
               "Does the United_States or Sweden have more progressive taxation?"),
             toLower = FALSE, verbose = FALSE)
mydict <- dictionary(list(countries = c("United_States", "Sweden", "France"),
                          wordsEndingInY = c("by", "my"),
                          notintext = "blahblah"))
selectFeatures(myDfm, mydict)
selectFeatures(myDfm, mydict, case_insensitive = FALSE)
selectFeatures(myDfm, c("s$", ".y"), "keep", valuetype = "regex")
selectFeatures(myDfm, c("s$", ".y"), "remove", valuetype = "regex")
selectFeatures(myDfm, stopwords("english"), "keep", valuetype = "fixed")
selectFeatures(myDfm, stopwords("english"), "remove", valuetype = "fixed")
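
The "keep" and "remove" calls above come in complementary pairs: whatever one selection retains, the other discards. A rough base R sketch of that relationship follows, using the same two regular expressions. The feature vector is an assumption (the words of the first example sentence, not actual dfm output), and grepl() stands in for the matching that selectFeatures performs internally.

```r
# Assumed feature names: the words of the first example sentence.
# This is an illustration, not the real feature set of myDfm.
features <- c("My", "Christmas", "was", "ruined", "by",
              "your", "opposition", "tax", "plan")

# Same patterns as in the example: ends in "s", or any character followed by "y"
patterns <- c("s$", ".y")

# Join the patterns with "|" so that a feature matching either one is selected;
# ignore.case = TRUE mirrors the default case_insensitive = TRUE
matched <- grepl(paste(patterns, collapse = "|"), features, ignore.case = TRUE)

kept    <- features[matched]    # analogue of selection = "keep"
removed <- features[!matched]   # analogue of selection = "remove"

kept     # "My" "Christmas" "was" "by"
removed  # "ruined" "your" "opposition" "tax" "plan"
```

"your" is not matched by ".y" because the pattern requires some character before the "y".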
