This function computes a z-score statistic for frequency counts, based on a normal approximation to the correct binomial distribution under the random sampling model.
z.score(k, n, p = 0.5, correct = TRUE)
k: frequency of a type in the corpus (or an integer vector of frequencies)
n: number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
p: null hypothesis, giving the assumed proportion of this type in the population (or a vector of proportions for different types and/or different populations)
correct: if TRUE, apply Yates' continuity correction (default)
The \(z\)-score corresponding to the specified data (or a vector of \(z\)-scores).
The \(z\) statistic is given by
$$ z := \frac{k - np}{\sqrt{n p (1-p)}} $$
When Yates' continuity correction is enabled, the absolute value of the numerator \(d := k - np\) is reduced by \(1/2\) but clamped at zero, so that the corrected numerator is \(\operatorname{sign}(d) \cdot \max(|d| - 1/2,\, 0)\).
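The formula and the continuity correction can be sketched in a few lines of base R. This is an illustrative re-implementation under the definitions given here, not the package's own source; the name `z_score_sketch` is hypothetical, and the sketch assumes that the sign of \(d\) is preserved when its absolute value is reduced.

```r
# Hypothetical sketch of the z statistic described above (not the package source).
z_score_sketch <- function(k, n, p = 0.5, correct = TRUE) {
  d <- k - n * p                          # numerator: observed minus expected frequency
  if (correct) {
    # Yates' correction: shrink |d| by 1/2, clamp at zero, keep the sign of d (assumed)
    d <- sign(d) * pmax(abs(d) - 0.5, 0)
  }
  d / sqrt(n * p * (1 - p))               # divide by the binomial standard deviation
}

z_score_sketch(60, 100, correct = FALSE)  # (60 - 50) / 5 = 2
z_score_sketch(60, 100)                   # (10 - 0.5) / 5 = 1.9
z_score_sketch(c(40, 50, 60), 100)        # vectorised over k: -1.9, 0, 1.9
```

With \(k = 60\), \(n = 100\) and \(p = 0.5\), the expected frequency is 50 and the standard deviation is \(\sqrt{25} = 5\), so the correction lowers the score from 2 to 1.9. For \(k = 50\) the numerator is already zero, and clamping keeps it there instead of pushing it to \(-1/2\).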