keyness: Keywords

Description

Given a frequency table (with texts as rows and words as columns), this function calculates the log-likelihood or the log ratio of one set of rows against the remaining rows. The return value is a list containing a score for each word. If the method is "loglikelihood", the returned scores are unsigned G2 values: they show whether a word's frequency differs significantly between the two sets of rows, but not in which direction. To estimate the direction of the keyness, the log ratio is more informative. A good introduction to log ratio can be found here.
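Both scores can be written down for a single word. The following is a minimal sketch, assuming the two-term log-likelihood formula described on the UCREL log-likelihood wizard page and the usual definition of log ratio as the binary logarithm of the ratio of relative frequencies. The function names g2_score() and log_ratio() and the counts are hypothetical and not part of the package.

```r
# Hypothetical helpers (not part of the package).
# a, b:   frequency of the word in the target rows and in the reference rows
# n1, n2: total number of tokens in the target rows and in the reference rows
g2_score <- function(a, b, n1, n2) {
  # Expected frequencies if the word were equally frequent in both sets
  e1 <- n1 * (a + b) / (n1 + n2)
  e2 <- n2 * (a + b) / (n1 + n2)
  # A zero observed count contributes 0 (the limit of x * log(x) as x -> 0)
  term <- function(o, e) ifelse(o > 0, o * log(o / e), 0)
  2 * (term(a, e1) + term(b, e2))
}

log_ratio <- function(a, b, n1, n2) {
  # Positive: relatively more frequent in the target rows; negative: less
  log2((a / n1) / (b / n2))
}

# Hypothetical counts: 30 occurrences in 10,000 target tokens,
# 20 occurrences in 40,000 reference tokens
g2_score(30, 20, 10000, 40000)   # about 38.19
log_ratio(30, 20, 10000, 40000)  # log2(6), about 2.58
```

Swapping the target and reference sets, as in g2_score(20, 30, 40000, 10000), leaves G2 unchanged while log ratio flips its sign, which is why log ratio is the better indicator of direction.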

Usage

keyness(ft, categories = c(1, rep(2, nrow(ft) - 1)), epsilon = 1e-100,
  siglevel = 0.05, method = c("loglikelihood", "logratio"),
  minimalFrequency = 10)
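The method argument follows the common R convention of listing all allowed values in the default. Assuming (not verified here) that keyness() resolves it with match.arg(), the first value, "loglikelihood", is used when the argument is omitted. The helper pick_method() below is hypothetical and only illustrates that idiom.

```r
# Hypothetical helper illustrating the usual R idiom for a choice argument;
# we assume, but have not verified, that keyness() handles `method` this way.
pick_method <- function(method = c("loglikelihood", "logratio")) {
  # match.arg() returns the first choice when the argument is missing,
  # and otherwise checks the supplied value against the allowed choices
  match.arg(method)
}

pick_method()            # "loglikelihood"
pick_method("logratio")  # "logratio"
pick_method("logr")      # partial matching also gives "logratio"
```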

Arguments

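ft: The frequency table, with texts as rows and words as columns.
categories: A vector with one entry per row of ft, assigning each row to one of two groups (for example 1 and 2, or a two-level factor as in the example below). By default, the first row is compared against all remaining rows.
epsilon: A small numeric constant used in the calculation (default 1e-100).
siglevel: The significance level. With method = "loglikelihood", only words that are significant at this level are returned.
method: Either "loglikelihood" (the default) or "logratio".
minimalFrequency: Words that occur fewer times than this value are not considered.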
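To illustrate the categories and minimalFrequency arguments, here is a minimal sketch of how rows might be collapsed into two groups before scoring. It uses base R's rowsum(); the toy table is hypothetical, and applying minimalFrequency to each word's total count across all rows is our assumption about the package's behavior.

```r
# Hypothetical toy frequency table: 4 texts (rows) x 3 words (columns)
ft <- matrix(c(5, 0, 12,
               3, 1,  1,
               2, 2,  0,
               7, 1,  3),
             nrow = 4, byrow = TRUE,
             dimnames = list(c("A", "B", "C", "D"), c("w1", "w2", "w3")))

# Default grouping from the usage line: row 1 against all other rows
categories <- c(1, rep(2, nrow(ft) - 1))  # gives 1 2 2 2

# rowsum() adds up the rows that share a category label,
# yielding one row per group
grouped <- rowsum(ft, categories)

# Assumed semantics: drop words whose total frequency is below the threshold
minimalFrequency <- 10
grouped <- grouped[, colSums(grouped) >= minimalFrequency, drop = FALSE]
grouped  # w2 (total 4) is dropped; w1 and w3 remain
```

A factor such as the genders vector in the example below works the same way, since rowsum() accepts any grouping vector with one entry per row.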

Value

A list of keywords, sorted by their log-likelihood or log ratio value, calculated according to http://ucrel.lancs.ac.uk/llwizard.html.
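For the log-likelihood method, significance can be read off the chi-squared distribution with one degree of freedom, whose critical values are about 3.84 for p < 0.05 and 6.63 for p < 0.01. The following sketch, with hypothetical scores, shows one way to apply siglevel and produce a sorted list; it is an illustration under that assumption, not necessarily how keyness() does it internally.

```r
# Hypothetical G2 scores for four words
g2 <- c(love = 12.4, honour = 2.1, prince = 7.9, father = 4.5)

# Critical value of the chi-squared distribution with one degree of freedom
siglevel <- 0.05
critical <- qchisq(1 - siglevel, df = 1)  # about 3.84

# Keep the significant words and sort them by decreasing score
keywords <- sort(g2[g2 > critical], decreasing = TRUE)
keywords  # love, prince, father
```

With siglevel <- 0.01 the critical value rises to about 6.63, and only love and prince would remain.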

Examples


library(DramaAnalysis)
data("rksp.0")
# Word frequencies per character, as raw counts
ft <- frequencytable(rksp.0, byCharacter = TRUE, normalize = FALSE)
# One category label per row of ft (i.e., per character)
genders <- factor(c("m", "m", "m", "m", "f", "m", "m", "m", "f", "m", "m", "f", "m"))
# Calculate log ratio for all words
keywords <- keyness(ft, method = "logratio",
                    categories = genders,
                    minimalFrequency = 5)
# Remove words that are not significantly different between the same two groups
keywords <- keywords[names(keywords) %in%
                     names(keyness(ft, categories = genders, siglevel = 0.01,
                                   minimalFrequency = 5))]

