# keyness

##### Keywords

Given a frequency table (with texts as rows and words as columns),
this function calculates the log-likelihood or the log ratio of one set of rows against the remaining rows.
The return value is a list containing scores for each word. If the method
is `loglikelihood`, the returned scores are unsigned G2 values. To estimate the
*direction* of the keyness, the log ratio (`logratio`) is more informative. A nice introduction
to log ratio can be found here.
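The difference between the two scores can be illustrated for a single word. The following is a minimal sketch using the commonly used two-corpus formulas for G2 and log ratio; the counts and variable names are hypothetical, and the code is not taken from the package's implementation.

```r
# Hypothetical counts for one word (not taken from any real corpus)
a  <- 30    # frequency of the word in the target rows
b  <- 10    # frequency of the word in the remaining rows
c1 <- 1000  # total number of words in the target rows
c2 <- 2000  # total number of words in the remaining rows

# Expected frequencies if the word were equally common in both sets
e1 <- c1 * (a + b) / (c1 + c2)
e2 <- c2 * (a + b) / (c1 + c2)

# Unsigned log-likelihood (G2): large values mean a large difference,
# but the value does not tell which set uses the word more
g2 <- 2 * (a * log(a / e1) + b * log(b / e2))

# Log ratio: binary logarithm of the ratio of relative frequencies;
# positive values mean the word is overrepresented in the target rows
lr <- log2((a / c1) / (b / c2))
```

Swapping the two sets leaves `g2` unchanged but flips the sign of `lr`, which is why the log ratio is the better indicator of direction.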

##### Usage

```
keyness(
ft,
categories = c(1, rep(2, nrow(ft) - 1)),
epsilon = 1e-100,
siglevel = 0.05,
method = c("loglikelihood", "logratio"),
minimalFrequency = 10
)
```

##### Arguments

- ft
The frequency table

- categories
A factor or numeric vector that represents an assignment of categories.

- epsilon
Null values are replaced by this value in order to avoid division by zero.

- siglevel
Return only keywords that are significant at this level. Set to 1 to get all words.

- method
Either "logratio" or "loglikelihood" (default)

- minimalFrequency
Words less frequent than this value are not considered at all
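
To make the default of `categories` and the role of `epsilon` more concrete, here is a small sketch on a toy table. The data are hypothetical, and the grouping and smoothing steps are assumptions about how such a comparison can be set up in base R, not the package's internal code.

```r
# Toy frequency table (hypothetical data): 3 texts, 3 words
ft <- matrix(c(3, 0, 2,
               1, 4, 0,
               2, 1, 5),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("love", "king", "sword")))

# Default assignment: the first row is category 1, all other rows category 2
categories <- c(1, rep(2, nrow(ft) - 1))

# Sum the word counts within each category (base R)
grouped <- rowsum(ft, categories)

# Assumed smoothing step: replace zero counts by epsilon so that
# ratios and logarithms of the counts stay defined
epsilon <- 1e-100
smoothed <- grouped
smoothed[smoothed == 0] <- epsilon
```

With the default, the first text is compared against all remaining texts combined; passing a factor such as the `genders` vector in the examples groups the rows differently.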

##### Value

A list of keywords, sorted by their log-likelihood or log ratio value, calculated according to http://ucrel.lancs.ac.uk/llwizard.html.
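
As a rough guide to how a G2 value relates to a significance level, G2 is conventionally compared against a chi-squared distribution with one degree of freedom. The sketch below assumes this convention; whether `keyness()` applies exactly this test internally is not stated here, and the score used is hypothetical.

```r
# Critical G2 values for two common significance levels (1 degree of freedom)
siglevel <- c(0.05, 0.01)
critical <- qchisq(1 - siglevel, df = 1)

# p-value for a hypothetical G2 score
g2 <- 29.04
p <- pchisq(g2, df = 1, lower.tail = FALSE)

# The word would count as a keyword at the 0.05 level if p is below it
significant <- p < 0.05
```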

##### Examples

```
# NOT RUN {
data("rksp.0")
ft <- frequencytable(rksp.0, byCharacter = TRUE, normalize = FALSE)
# Calculate log ratio for all words
genders <- factor(c("m", "m", "m", "m", "f", "m", "m", "m", "f", "m", "m", "f", "m"))
keywords <- keyness(ft, method = "logratio",
categories = genders,
minimalFrequency = 5)
# Remove words that are not significantly different
keywords <- keywords[names(keywords) %in% names(keyness(ft, siglevel = 0.01))]
# }
```

*Documentation reproduced from package DramaAnalysis, version 3.0.1, License: GPL (>= 3)*