List the most (or least) frequently occurring features in a dfm, either as a whole or separated by document.
topfeatures(x, n = 10, decreasing = TRUE, scheme = c("count", "docfreq"),
groups = NULL)
x: the object whose features will be returned
n: how many top features should be returned
decreasing: if TRUE, return the n most frequent features; otherwise return the n least frequent features
scheme: one of "count" for total feature frequency (within group if applicable), or "docfreq" for the document frequencies of features
groups: either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced into a factor) whose length or number of rows equals the number of documents. See groups for details.
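The difference between the two scheme values, and the effect of decreasing, can be illustrated with base R on a small made-up count matrix. This is only a sketch of the idea, not the package's implementation: the matrix m, the helper top_n, and all values in it are hypothetical.

```r
# Toy document-feature matrix (made-up data): rows are documents,
# columns are features.
m <- matrix(c(3, 0, 1,
              2, 0, 1,
              4, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("the", "liberty", "nation")))

# Analogue of scheme = "count": total occurrences of each feature
total_count <- colSums(m)

# Analogue of scheme = "docfreq": number of documents containing each feature
doc_freq <- colSums(m > 0)

# Hypothetical helper: keep the n most frequent (decreasing = TRUE)
# or the n least frequent (decreasing = FALSE) entries of a named vector
top_n <- function(v, n = 10, decreasing = TRUE) {
  head(sort(v, decreasing = decreasing), n)
}

top_n(total_count, n = 2)                   # two most frequent by total count
top_n(doc_freq, n = 2, decreasing = FALSE)  # two rarest by document frequency
```

Here "the" occurs 9 times in total but has a document frequency of 3, which is why the two schemes can rank features differently on real data.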
A named numeric vector of feature counts, where the names are the feature labels, or a list of these if groups is given.
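The shape of the grouped return value, a list holding one named numeric vector per group, can be sketched in base R. The counts, the grouping factor president, and the variable names below are made up for illustration, and the code assumes nothing about how the package computes its result.

```r
# Toy counts (made-up data): four documents, three features
m <- matrix(c(2, 1, 0,
              1, 0, 4,
              0, 4, 1,
              5, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("freedom", "people", "world")))

# Hypothetical grouping variable with one value per document
president <- factor(c("Reagan", "Reagan", "Bush", "Bush"))

# Sum the document rows within each group: one row per group
grouped <- rowsum(m, group = president)

# Build a list with one named numeric vector per group,
# keeping the two most frequent features in each
top_by_group <- lapply(
  setNames(rownames(grouped), rownames(grouped)),
  function(g) head(sort(grouped[g, ], decreasing = TRUE), 2)
)

str(top_by_group)
```

Each list element is named after its group, and each vector is named by feature label, mirroring the ungrouped result.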
# NOT RUN {
library(quanteda)

# build a dfm from inaugural addresses delivered after 1980
mydfm <- corpus_subset(data_corpus_inaugural, Year > 1980) %>%
    dfm(remove_punct = TRUE)
mydfm_nostopw <- dfm_remove(mydfm, stopwords("english"))
# most frequent features
topfeatures(mydfm)
topfeatures(mydfm_nostopw)
# least frequent features
topfeatures(mydfm_nostopw, decreasing = FALSE)
# top features of individual documents
topfeatures(mydfm_nostopw, n = 5, groups = docnames(mydfm_nostopw))
# grouping by president last name
topfeatures(mydfm_nostopw, n = 5, groups = "President")
# features by document frequencies
tail(topfeatures(mydfm, scheme = "docfreq", n = 200))
# }