lingmatch (version 1.0.7)

lma_dict: English Function Word Category and Special Character Lists

Description

Returns a list of function words organized by the category names of the Linguistic Inquiry and Word Count (LIWC) 2015 dictionary (only the category names follow that dictionary; the words were selected independently), or a list of special characters and patterns.

Usage

lma_dict(..., as.regex = TRUE, as.function = FALSE)

Value

A list with a vector of terms for each category, or (when as.function is TRUE or a function) a function which accepts an initial "terms" argument (a character vector), plus any additional arguments accepted by the function entered as as.function (grepl by default).

Arguments

...

Numbers or letters corresponding to category names: ppron, ipron, article, adverb, conj, prep, auxverb, negate, quant, interrog, number, interjection, or special.

as.regex

Logical: if FALSE, the lists are returned without regular expression formatting.

as.function

Logical or a function: if specified and as.regex is TRUE, the selected dictionary will be collapsed to a regex string (terms separated by |), and a function for matching characters to that string will be returned. The regex string is passed to the matching function (grepl by default) as a 'pattern' argument, with the first argument of the returned function being passed as an 'x' argument. See examples.
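
The mechanism described for as.function can be approximated in base R. The sketch below is only an illustration under stated assumptions: make_matcher and the three example terms are hypothetical and are not taken from lingmatch's internals or its actual dictionary. It shows terms being collapsed into one regex string with | and that string being passed as 'pattern' to a matching function, with the first argument of the returned function passed as 'x'.

```r
# Hypothetical stand-in for one dictionary category; NOT lingmatch's actual terms.
example_terms <- c("^i$", "^you$", "^we$")

# Collapse the terms into a single regex string, separated by |.
pattern <- paste(example_terms, collapse = "|")

# Hypothetical helper: returns a function whose first argument is passed as 'x'
# to the matching function, with the collapsed string passed as 'pattern'.
make_matcher <- function(pattern, fun = grepl) {
  force(pattern)
  force(fun)
  function(terms, ...) fun(pattern = pattern, x = terms, ...)
}

is_pron <- make_matcher(pattern)
is_pron(c("i", "am", "you", "were"))
# TRUE FALSE TRUE FALSE
```

Additional arguments are forwarded to the matching function, so under the same assumptions make_matcher(pattern, sub)("you", replacement = "") would return an empty string.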

See Also

To score texts with these categories, use lma_termcat().

Examples

# return the full dictionary (excluding special)
lma_dict()

# return the 7 standard LSM categories
lma_dict(1:7)

# return just a few categories without regular expression
lma_dict(neg, ppron, aux, as.regex = FALSE)

# return special specifically
lma_dict(special)

# returning a function
is.ppron <- lma_dict(ppron, as.function = TRUE)
is.ppron(c("i", "am", "you", "were"))

in.lsmcat <- lma_dict(1:7, as.function = TRUE)
in.lsmcat(c("a", "frog", "for", "me"))

## use as a stopword filter
is.stopword <- lma_dict(as.function = TRUE)
dtm <- lma_dtm("Most of these words might not be all that relevant.")
dtm[, !is.stopword(colnames(dtm))]

## use to replace special characters
clean <- lma_dict(special, as.function = gsub)
clean(c(
  "\u201Ccurly quotes\u201D", "na\u00EFve", "typographer\u2019s apostrophe",
  "en\u2013dash", "em\u2014dash"
))
