# textstat_keyness

##### Calculate keyness statistics

Calculate "keyness", a score for features that occur differentially across different categories. Here, the categories are defined by reference to a "target" document index in the dfm, with the reference group consisting of all other documents.

- Keywords
- textstat

##### Usage

```
textstat_keyness(x, target = 1L, measure = c("chi2", "exact", "lr", "pmi"),
                 sort = TRUE, correction = c("default", "yates", "williams", "none"))
```

##### Arguments

- x
a dfm containing the features to be examined for keyness

- target
the document index (numeric, character or logical) identifying the document forming the "target" for computing keyness; all other documents' feature frequencies will be combined for use as a reference

- measure
(signed) association measure to be used for computing keyness. Currently available: `"chi2"`; `"exact"` (Fisher's exact test); `"lr"` for the likelihood ratio; `"pmi"` for pointwise mutual information.

- sort
logical; if `TRUE`, sort features scored in descending order of the measure, otherwise leave in original feature order

- correction
if `"default"`, Yates correction is applied to `"chi2"`; Williams correction is applied to `"lr"`; and no correction is applied for the `"exact"` and `"pmi"` measures. Specifying a value other than the default can be used to override the defaults, for instance to apply the Williams correction to the chi2 measure. Specifying a correction for the `"exact"` and `"pmi"` measures has no effect and produces a warning.
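
To make the target/reference split and the effect of a continuity correction concrete, the following base R sketch builds the 2 x 2 contingency table for a single feature and computes a signed chi-squared score with and without the Yates correction. The counts and the helper name `signed_chi2` are hypothetical, and the code illustrates the general technique using `chisq.test()`, not quanteda's internal implementation.

```r
# Hypothetical helper (not part of quanteda): signed chi-squared for one feature.
# a, b         : feature counts in the target and in the reference group
# tot_a, tot_b : total token counts in the target and in the reference group
signed_chi2 <- function(a, b, tot_a, tot_b, yates = TRUE) {
  # Rows are groups, columns are "this feature" versus "all other features"
  tab <- matrix(c(a, b, tot_a - a, tot_b - b), nrow = 2,
                dimnames = list(c("target", "reference"),
                                c("feature", "other")))
  test <- suppressWarnings(chisq.test(tab, correct = yates))
  # Positive sign if the target count exceeds its expected value
  sign <- if (a > test$expected["target", "feature"]) 1 else -1
  c(chi2 = unname(sign * test$statistic), p = test$p.value)
}

# Made-up counts: 30 of 1000 target tokens versus 20 of 4000 reference tokens
signed_chi2(30, 20, 1000, 4000)                 # with Yates correction
signed_chi2(30, 20, 1000, 4000, yates = FALSE)  # without correction
```

The Yates correction shrinks each absolute difference between observed and expected counts before squaring, so the corrected statistic is never larger in magnitude than the uncorrected one.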

##### Value

a data.frame of computed statistics and associated p-values, in which each row is named by the feature scored and also reports the number of occurrences of that feature in the target and reference groups. For `measure = "chi2"` the statistic is the chi-squared value, signed positively if the observed value in the target exceeds its expected value; for `measure = "exact"` it is the estimate of the odds ratio; for `measure = "lr"` it is the likelihood ratio \(G^2\) statistic; for `measure = "pmi"` it is the pointwise mutual information statistic.

`textstat_keyness` returns a data.frame of features and their keyness scores and frequency counts.
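
As a rough illustration of what the other three statistics measure, the base R sketch below computes an uncorrected likelihood ratio \(G^2\), the odds ratio estimate from Fisher's exact test, and a pointwise mutual information score for one feature. The counts are hypothetical, and the formulas are common textbook definitions (in particular, PMI is taken here as the log of observed over expected count); they are assumptions for illustration and may differ in detail from what `textstat_keyness` computes.

```r
# Hypothetical 2 x 2 table of counts for one feature (made-up numbers)
tab <- matrix(c(30, 20, 970, 3980), nrow = 2,
              dimnames = list(c("target", "reference"),
                              c("feature", "other")))

# Expected counts under independence of group and feature
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Likelihood ratio statistic, G^2 = 2 * sum(O * log(O / E)), no correction;
# this simple form assumes that no cell count is zero
g2 <- 2 * sum(tab * log(tab / expected))

# Fisher's exact test: conditional estimate of the odds ratio
odds_ratio <- unname(fisher.test(tab)$estimate)

# Pointwise mutual information for the (target, feature) cell (assumed form)
pmi <- log(tab["target", "feature"] / expected["target", "feature"])

c(G2 = g2, odds_ratio = odds_ratio, pmi = pmi)
```

An odds ratio above 1 and a positive PMI both indicate that the feature is over-represented in the target relative to the reference group.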

##### References

Bondi, Marina, and Mike Scott, eds. 2010. *Keyness in
Texts*. Amsterdam, Philadelphia: John Benjamins.

Stubbs, Michael. 2010. "Three Concepts of Keywords". In *Keyness in
Texts*, Marina Bondi and Mike Scott, eds. pp. 21–42. Amsterdam, Philadelphia:
John Benjamins.

Scott, M. & Tribble, C. 2006. *Textual Patterns: keyword and corpus
analysis in language education*. Amsterdam: Benjamins, p. 55.

Dunning, Ted. 1993. "Accurate Methods for the Statistics of Surprise and
Coincidence", *Computational Linguistics*, Vol 19, No. 1, pp. 61-74.

##### Examples

```
library(quanteda)

# compare pre- v. post-war terms using grouping
period <- ifelse(docvars(data_corpus_inaugural, "Year") < 1945, "pre-war", "post-war")
mydfm <- dfm(data_corpus_inaugural, groups = period)
head(mydfm) # make sure 'post-war' is in the first row
head(result <- textstat_keyness(mydfm), 10)
tail(result, 10)

# compare pre- v. post-war terms using a logical vector as the target
mydfm2 <- dfm(data_corpus_inaugural)
textstat_keyness(mydfm2, docvars(data_corpus_inaugural, "Year") >= 1945)

# compare Trump 2017 to other post-war presidents
pwdfm <- dfm(corpus_subset(data_corpus_inaugural, period == "post-war"))
head(textstat_keyness(pwdfm, target = "2017-Trump"), 10)

# using the likelihood ratio method
head(textstat_keyness(dfm_smooth(pwdfm), measure = "lr", target = "2017-Trump"), 10)
```

*Documentation reproduced from package quanteda, version 1.2.0, License: GPL-3*