vgc.interp: Expected Vocabulary Growth by Binomial Interpolation (zipfR)

Description

vgc.interp computes the expected vocabulary growth curve for random samples taken from a data set described by the frequency spectrum object obj.

Usage

vgc.interp(obj, N, m.max=0, allow.extrapolation=FALSE)

Arguments

obj

an object of class spc, representing the frequency spectrum of the data set from which samples are taken

N

a vector of increasing non-negative integers specifying the sample sizes for which the expected vocabulary size is calculated (as well as the expected spectrum elements, if requested)

m.max

an integer in the range \(1 \ldots 9\), specifying the number of spectrum elements to be included in the vocabulary growth curve (default: 0, i.e. none)

allow.extrapolation

if TRUE, the requested sample sizes \(N\) may be larger than the sample size of the frequency spectrum obj, in which case binomial extrapolation is performed. This option should be used with great caution (see EV.spc for details).
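
To illustrate why extrapolation calls for caution, here is a minimal base-R sketch. It assumes the textbook binomial formula \(E[V(N)] = \sum_m V_m \left(1 - (1 - N/N_0)^m\right)\), where \(N_0\) is the sample size of the spectrum. It is not taken from the zipfR source; the toy spectrum and the helper name EV.binom are hypothetical.

```r
## Hypothetical toy spectrum: Vm[m] = number of types occurring exactly m times
Vm <- c(50, 20, 10, 5, 3, 2)
m  <- seq_along(Vm)
N0 <- sum(m * Vm)   # sample size of the toy spectrum
V0 <- sum(Vm)       # observed vocabulary size

## Assumed textbook binomial estimate of E[V(N)] (illustrative helper, not a zipfR function)
EV.binom <- function (N) sum(Vm * (1 - (1 - N / N0)^m))

## For N > N0 the term (1 - N/N0)^m alternates in sign and grows with m
ratios <- c(1, 1.5, 2, 3, 4)
extrap <- sapply(ratios * N0, EV.binom)
print(data.frame(N = ratios * N0, EV = extrap))
```

In this toy case the estimates stay plausible up to about twice the original sample size; beyond that they drop below the observed vocabulary size and eventually become negative, which no real vocabulary growth curve can do.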

Value

An object of class vgc, representing the expected vocabulary growth curves for random samples taken from the data set described by obj. Data points will be generated for the specified sample sizes N.

Details

See the EV.spc manpage for more information, especially concerning binomial extrapolation.
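
For readers who want a feel for the computation, the following base-R sketch works through binomial interpolation on a small made-up spectrum. It assumes the standard model in which each token of the original sample is retained independently with probability \(N/N_0\); the actual zipfR implementation may differ in detail. The toy spectrum and the helper names EV.binom and EVk.binom are hypothetical.

```r
## Hypothetical toy spectrum: Vm[m] = number of types occurring exactly m times
Vm <- c(50, 20, 10, 5, 3)
m  <- seq_along(Vm)
N0 <- sum(m * Vm)   # sample size of the toy spectrum
V0 <- sum(Vm)       # observed vocabulary size

## Expected vocabulary size in a random subsample of N <= N0 tokens:
## a type with frequency m is missed with probability (1 - N/N0)^m
EV.binom <- function (N) {
  p <- N / N0
  sum(Vm * (1 - (1 - p)^m))
}

## Expected number of types occurring exactly k times in the subsample
## (the quantity requested through m.max in the sketch's terms)
EVk.binom <- function (k, N) {
  p <- N / N0
  sum(Vm * dbinom(k, size = m, prob = p))
}

N.grid <- round(seq(0, N0, length.out = 6))
EV  <- sapply(N.grid, EV.binom)
EV1 <- sapply(N.grid, function (N) EVk.binom(1, N))
print(data.frame(N = N.grid, EV = EV, EV1 = EV1))
```

At \(N = N_0\) the sketch reproduces the observed values \(V\) and \(V_1\), and the expected spectrum elements sum to the expected vocabulary size at every sample size, which is a useful sanity check for any implementation of this kind.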

Note that the result of vgc.interp is an object of class vgc (a vocabulary growth curve), but its input is an object of class spc (a frequency spectrum).

Examples

# NOT RUN {
## load the Tiger PP expansion spectrum
## (sample size: about 91k tokens) 
data(TigerPP.spc)

## binomially interpolated curve
TigerPP.bin.vgc <- vgc.interp(TigerPP.spc,(1:100)*910)
summary(TigerPP.bin.vgc)

## let's also add growth of V_1 to V_5 and plot
TigerPP.bin.vgc <- vgc.interp(TigerPP.spc,(1:100)*910,m.max=5)
plot(TigerPP.bin.vgc,add.m=c(1:5))


# }
