zipfR (version 0.6-66)

spc: Frequency Spectra (zipfR)

Description

In the zipfR library, spc objects represent a word frequency spectrum (either an observed spectrum or the expected spectrum of an LNRE model at a given sample size).

The spc constructor function initializes an object directly from the specified data vectors. In practice, it is more common to read an observed spectrum from a disk file with read.spc or to compute an expected spectrum with lnre.spc.

spc objects should always be treated as read-only.

Usage

spc(Vm, m=1:length(Vm), VVm=NULL, N=NA, V=NA, VV=NA,
      m.max=0, expected=!missing(VVm))

Arguments

m

integer vector of frequency classes \(m\) (if omitted, Vm is assumed to list the first \(k\) frequency classes \(V_1, \ldots, V_k\))

Vm

vector of corresponding class sizes \(V_m\) (may be fractional for expected frequency spectrum \(E[V_m]\))
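To make the roles of m and Vm concrete, the base-R sketch below derives a frequency spectrum from a made-up toy sample of word tokens. The token vector and the use of table() are illustrative only and are not part of zipfR. Because the classes found here are 1, 2 and 3 with no gaps, the default m=1:length(Vm) would describe them correctly.

```r
# Made-up toy sample of 10 word tokens
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran", "off")

# Type frequencies: how often each word type occurs
tf <- table(tokens)

# Frequency spectrum: V_m = number of types occurring exactly m times
spectrum <- table(as.vector(tf))
m  <- as.integer(names(spectrum))  # frequency classes 1, 2, 3
Vm <- as.vector(spectrum)          # 5 types once, 1 type twice, 1 type 3 times
```

With zipfR loaded, these vectors could then be passed to the constructor as spc(Vm, m).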

VVm

optional vector of estimated variances \(\mathop{Var}[V_m]\) (for expected frequency spectrum only)

N, V

total sample size \(N\) and vocabulary size \(V\) of the frequency spectrum. These values are usually determined automatically from m and Vm, but they must be specified for an incomplete frequency spectrum that does not list all non-empty frequency classes.
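For a complete spectrum, the automatic values follow from two sums: each of the \(V_m\) types in class \(m\) contributes \(m\) tokens, so \(N = \sum_m m \cdot V_m\) and \(V = \sum_m V_m\). A minimal base-R check on a toy spectrum (made-up numbers):

```r
# Complete toy spectrum: 5 hapax legomena, 1 type seen twice, 1 type seen 3 times
m  <- c(1, 2, 3)
Vm <- c(5, 1, 1)

# Sample size: every type in class m contributes m tokens
N <- sum(m * Vm)   # 5*1 + 1*2 + 1*3 = 10

# Vocabulary size: every type is counted once
V <- sum(Vm)       # 5 + 1 + 1 = 7
```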

VV

variance \(\mathop{Var}[V]\) of expected vocabulary size. If VVm is specified, VV should also be given.

m.max

highest frequency class \(m\) listed in an incomplete spectrum. If m.max is set, N and V must also be specified, and all non-zero frequency classes up to m.max must be included in the input vectors. Frequency classes above m.max in the input are automatically deleted.
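The following base-R sketch, using invented class sizes, illustrates why N and V cannot be computed automatically once m.max is set: after dropping the classes above m.max, the remaining vectors undercount both totals. The final spc() call is shown only as a comment, with argument names taken from the Usage section.

```r
# Invented full spectrum of some sample
m  <- c(1, 2, 3, 4, 7, 12)
Vm <- c(40, 15, 6, 3, 1, 1)
N  <- sum(m * Vm)  # true sample size of the full sample
V  <- sum(Vm)      # true vocabulary size

# Keep only the classes up to m.max, mirroring what spc() is
# documented to do with classes above m.max
m.max  <- 4
keep   <- m <= m.max
m.inc  <- m[keep]
Vm.inc <- Vm[keep]

# The truncated vectors undercount N and V ...
stopifnot(sum(m.inc * Vm.inc) < N, sum(Vm.inc) < V)

# ... so the true totals have to be passed explicitly, e.g. with zipfR loaded:
# spc(Vm.inc, m.inc, N = N, V = V, m.max = m.max)
```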

expected

set to TRUE if the frequency spectrum represents expected values \(E[V_m]\) of the class sizes according to some LNRE model (this is automatically triggered when the VVm argument is specified).

Value

An object of class spc representing the specified frequency spectrum. This object should be treated as read-only (although such behaviour cannot be enforced in R).
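Since R cannot enforce read-only status, the hand-built toy data frame below (not a real spc object, and not created by zipfR) shows what can go wrong: editing a class size in place leaves the stored N and V attributes stale.

```r
# Hand-built toy object imitating the documented layout
toy <- data.frame(m = 1:3, Vm = c(5, 1, 1))
attr(toy, "N") <- sum(toy$m * toy$Vm)   # 10
attr(toy, "V") <- sum(toy$Vm)           # 7

# R does not prevent an in-place edit of a column ...
toy$Vm[1] <- 6

# ... but the stored totals are not updated, so the object is now inconsistent
attr(toy, "N") == sum(toy$m * toy$Vm)   # FALSE: 10 vs 11
attr(toy, "V") == sum(toy$Vm)           # FALSE: 7 vs 8
```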

Details

An spc object is a data frame with the following variables:

m

frequency class \(m\), an integer vector

Vm

class size, i.e. the number \(V_m\) of types in frequency class \(m\) (either the observed class size in a sample or the expected class size \(E[V_m]\) under an LNRE model)

VVm

optional: estimated variance \(\mathop{Var}[V_m]\) of the expected class size (only meaningful for an expected spectrum derived from an LNRE model)

The following attributes are used to store additional information about the frequency spectrum:

m.max

if non-zero, the frequency spectrum is incomplete and lists only frequency classes up to m.max

N, V

sample size \(N\) and vocabulary size \(V\) of the frequency spectrum. For a complete frequency spectrum, these values could easily be determined from m and Vm, but they are essential for an incomplete spectrum.

VV

variance \(\mathop{Var}[V]\) of the expected vocabulary size; only present if hasVariances is TRUE. Note that VV may have the value NA if the user failed to specify it.

expected

if TRUE, frequency spectrum lists expected class sizes \(E[V_m]\) (rather than observed sizes \(V_m\)). Note that the VVm variable is only allowed for an expected frequency spectrum.

hasVariances

indicates whether or not the VVm variable is present
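The layout described above can be mimicked in base R. The helper make_spc_sketch below is hypothetical and is not zipfR's implementation; it only assembles a data frame with the documented columns and attributes. As simplifying assumptions, it computes N and V itself only when m.max is 0, and it tests is.null(VVm) where the real constructor's default uses !missing(VVm).

```r
# Hypothetical helper (not zipfR's own code): builds a data frame with the
# columns and attributes listed in the Details section
make_spc_sketch <- function(Vm, m = seq_along(Vm), VVm = NULL,
                            N = NA, V = NA, VV = NA, m.max = 0,
                            expected = !is.null(VVm)) {
  obj <- data.frame(m = as.integer(m), Vm = Vm)
  has.var <- !is.null(VVm)
  if (has.var) obj$VVm <- VVm          # variances only for expected spectra
  complete <- m.max == 0
  attr(obj, "m.max") <- m.max
  attr(obj, "N") <- if (complete) sum(m * Vm) else N
  attr(obj, "V") <- if (complete) sum(Vm) else V
  if (has.var) attr(obj, "VV") <- VV   # stays NA if not supplied
  attr(obj, "expected") <- expected
  attr(obj, "hasVariances") <- has.var
  class(obj) <- c("spc", "data.frame")
  obj
}

obs     <- make_spc_sketch(c(5, 1, 1))                      # observed, complete
exp.spc <- make_spc_sketch(c(4.2, 1.3), VVm = c(0.5, 0.2))  # expected, VV left NA
```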

See Also

read.spc, write.spc, spc.vector, sample.spc, spc2tfl, tfl2spc, lnre.spc, plot.spc

Generic methods supported by spc objects are print, summary, N, V, Vm, VV, and VVm.

Implementation details and non-standard arguments for these methods can be found on the manpages print.spc, summary.spc, N.spc, V.spc, etc.
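As an illustration of how accessors of this kind can be written with S3 dispatch, the sketch below defines two hypothetical generics, sample_size() and class_size(), instead of redefining zipfR's N() and Vm(). Returning 0 for classes that a complete spectrum does not list, and stopping for classes above m.max, are assumptions about sensible behaviour, not a description of Vm.spc.

```r
# Hypothetical S3 generics, named differently from zipfR's N() and Vm()
# so that they do not mask the real functions if zipfR is loaded
sample_size <- function(obj, ...) UseMethod("sample_size")
class_size  <- function(obj, m, ...) UseMethod("class_size")

# Methods for class "spc": N comes from the attribute; V_m is looked up by
# class, with 0 for classes absent from a complete spectrum (an assumption)
sample_size.spc <- function(obj, ...) attr(obj, "N")
class_size.spc <- function(obj, m, ...) {
  m.max <- attr(obj, "m.max")
  if (m.max > 0 && any(m > m.max))
    stop("class above m.max requested from an incomplete spectrum")
  idx <- match(m, obj$m)
  ifelse(is.na(idx), 0, obj$Vm[idx])
}

# Minimal hand-built object with the documented layout
toy <- data.frame(m = c(1L, 2L, 5L), Vm = c(5, 2, 1))
attr(toy, "m.max") <- 0
attr(toy, "N") <- sum(toy$m * toy$Vm)
class(toy) <- c("spc", "data.frame")

sample_size(toy)      # 5 + 4 + 5 = 14
class_size(toy, 1:5)  # 5 2 0 0 1
```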

Examples

## load Brown imaginative prose spectrum and inspect it
data(BrownImag.spc)

summary(BrownImag.spc)
print(BrownImag.spc)

plot(BrownImag.spc)

N(BrownImag.spc)
V(BrownImag.spc)
Vm(BrownImag.spc,1)
Vm(BrownImag.spc,1:5)

## compute ZM model, and generate PARTIAL expected spectrum
## with variances for a sample of 10 million tokens
zm <- lnre("zm",BrownImag.spc)
zm.spc <- lnre.spc(zm,1e+7,variances=TRUE)

## inspect extrapolated spectrum
summary(zm.spc)
print(zm.spc)

plot(zm.spc,log="x")

N(zm.spc)
V(zm.spc)
VV(zm.spc)
Vm(zm.spc,1)
VVm(zm.spc,1)

## generate an artificial Zipfian-looking spectrum
## and take a look at it
zipf.spc <- spc(round(1000/(1:1000)^2))

summary(zipf.spc)
plot(zipf.spc)

## see the manpages of lnre and of the various *.spc functions
## for more examples of spc usage

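To see what the artificial spectrum in the last example contains before it is passed to spc(), its class sizes can be inspected in base R. Rounding leaves every class above \(m = 44\) empty, because \(1000/m^2 < 0.5\) there.

```r
# Class sizes used in the example: V_m roughly proportional to 1/m^2
Vm <- round(1000 / (1:1000)^2)
m  <- seq_along(Vm)   # the default m = 1:length(Vm)

# Beyond m = 44 the rounded values are 0, i.e. those classes are empty
nonzero <- which(Vm > 0)
max(nonzero)          # 44

# Totals implied by treating the vector as a complete spectrum
N <- sum(m * Vm)
V <- sum(Vm)
```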
