zipfR (version 0.6-66)

EV-EVm.spc: Binomial Interpolation (zipfR)

Description

Compute the expected vocabulary size \(E[V(N)]\) (with function EV.spc) or expected frequency spectrum \(E[V_m(N)]\) (with function EVm.spc) for a random sample of size \(N\) from a given frequency spectrum (i.e., an object of class spc). The expectations are calculated by binomial interpolation (following Baayen 2001, pp. 64-69).

Note that these functions are not user-visible. They are called implicitly through the generic methods EV and EVm when these are applied to an object of class spc.

Usage

# S3 method for spc
EV(obj, N, allow.extrapolation=FALSE, ...)

# S3 method for spc
EVm(obj, m, N, allow.extrapolation=FALSE, ...)

Arguments

obj

an object of class spc, representing a frequency spectrum

m

positive integer value determining the frequency class \(m\) for which \(E[V_m(N)]\) is to be returned (or a vector of such values)

N

sample size \(N\) for which the expected vocabulary size or frequency spectrum is calculated (or a vector of sample sizes)

allow.extrapolation

if TRUE, the requested sample size \(N\) may be larger than the sample size of the frequency spectrum obj, in which case binomial extrapolation is performed. This option should be used with great caution (see "Details" below).

...

additional arguments passed on from the generic methods; these are ignored

Value

EV returns the expected vocabulary size \(E[V(N)]\) for a random sample of \(N\) tokens from the frequency spectrum obj, and EVm returns the expected spectrum elements \(E[V_m(N)]\) for a random sample of \(N\) tokens from obj, calculated by binomial interpolation.
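
The calls below are a minimal usage sketch of the generics described above. It assumes that zipfR is installed and that the constructor spc() accepts the arguments Vm and m as shown; the toy spectrum is hypothetical and serves only to illustrate the call pattern.

```r
library(zipfR)

# Hypothetical toy spectrum: V_1 = 3, V_2 = 2, V_3 = 1, i.e. 10 tokens in total
toy <- spc(Vm = c(3, 2, 1), m = 1:3)

# Expected vocabulary size for a random sample of 5 tokens
EV(toy, 5)

# Expected spectrum elements E[V_1], E[V_2], E[V_3] at the same sample size
EVm(toy, 1:3, 5)

# Sample sizes above the observed 10 tokens must be enabled explicitly
EV(toy, 15, allow.extrapolation = TRUE)
```

Because dispatch happens on the class of the first argument, EV.spc and EVm.spc never need to be called by name.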

Details

These functions are naive implementations of binomial interpolation, using Equations (2.41) and (2.43) from Baayen (2001). No guarantees are made concerning their numerical accuracy, especially for extreme values of \(m\) and \(N\).
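
As a rough illustration of what binomial interpolation computes, the following base-R sketch treats every token of the observed sample as retained independently with probability \(N/N_0\), where \(N_0\) is the observed sample size. The helper names ev_binom and evm_binom and the toy spectrum are hypothetical. This is not the zipfR implementation, and the formulas are only assumed to match the cited equations in substance.

```r
# Hypothetical toy spectrum with columns m (frequency class) and Vm (number of types)
toy <- data.frame(m = 1:3, Vm = c(3, 2, 1))

# Expected vocabulary size at a single sample size N <= N0:
# a type with frequency k survives thinning with probability 1 - (1 - N/N0)^k
ev_binom <- function(spc, N) {
  N0 <- sum(spc$m * spc$Vm)          # observed sample size
  stopifnot(N >= 0, N <= N0)         # interpolation only
  p <- N / N0
  sum(spc$Vm * (1 - (1 - p)^spc$m))
}

# Expected number of types with frequency m (a single value) at sample size N:
# a type with frequency k ends up in class m with probability dbinom(m, k, N/N0)
evm_binom <- function(spc, m, N) {
  N0 <- sum(spc$m * spc$Vm)
  stopifnot(N >= 0, N <= N0)
  p <- N / N0
  sum(spc$Vm * dbinom(m, size = spc$m, prob = p))
}

ev_binom(toy, 5)       # 3.875
evm_binom(toy, 1, 5)   # 2.875
```

Summing evm_binom over all frequency classes reproduces ev_binom, which is a convenient consistency check for a sketch of this kind.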

According to Baayen (2001, pp. 69-73), the same equations can also be used for binomial extrapolation of a given frequency spectrum to larger sample sizes. However, they become numerically unstable in this case and will typically break down when extrapolating to more than twice the size of the observed sample (Baayen 2001, p. 75). Therefore, extrapolation has to be enabled explicitly with the option allow.extrapolation=TRUE and should be used with great caution.
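
One way to see where the factor of two may come from is to assume that the expectation contains terms proportional to \((1 - N/N_0)^m\), as in the usual formulation of binomial interpolation, with \(N_0\) the observed sample size. The base-R sketch below prints the magnitude of such terms for a few ratios \(N/N_0\); the helper name term_magnitude is hypothetical and the code does not reproduce the zipfR implementation.

```r
# Magnitude of (1 - N/N0)^m for a given ratio N/N0 and frequency classes m
term_magnitude <- function(ratio, m) abs(1 - ratio)^m

m <- c(1, 5, 10, 20)

# Ratios up to 2 keep every term at or below 1 in magnitude;
# beyond 2 the terms grow geometrically with m (and alternate in sign)
for (ratio in c(0.5, 1.5, 2, 3)) {
  cat("N/N0 =", ratio, ":", signif(term_magnitude(ratio, m), 3), "\n")
}
```

Under this assumption, extrapolating past twice the observed sample size sums large terms of opposite sign, so small errors in the high frequency classes can dominate the result.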

References

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.

See Also

EV and EVm for the generic methods and links to other implementations

spc.interp and vgc.interp are convenience functions that compute an expected frequency spectrum or vocabulary growth curve by binomial interpolation