
languageR (version 1.0)

growth.fnc: Calculate vocabulary growth curve and vocabulary richness measures

Description

This function calculates, for an increasing sequence of text sizes, the observed number of types, hapax legomena, dis legomena, tris legomena, and selected measures of lexical richness.

Usage

growth.fnc(text = alice, size = 646, nchunks = 40, chunks = 0)

Arguments

text
A vector of strings representing a text.
size
An integer giving the size of a text chunk when the text is to be split into a series of equally-sized text chunks.
nchunks
An integer denoting the number of desired equally-sized text chunks.
chunks
An integer vector denoting the token sizes for which growth measures are required. When chunks is specified, size and nchunks are ignored.
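The relationship between size, nchunks and chunks can be sketched in base R. This is a minimal illustration, not the package's own code: chunk_endpoints is a hypothetical helper, and it assumes that the default chunks = 0 means "not specified" and that equally-sized chunks yield cumulative token sizes size, 2 * size, and so on.

```r
# Hypothetical helper (not part of languageR): derive the cumulative token
# sizes at which growth measures would be computed.
chunk_endpoints <- function(size = 646, nchunks = 40, chunks = 0) {
  # Assumption: chunks = 0 means "not specified".
  if (length(chunks) > 1 || chunks[1] != 0) {
    return(chunks)            # explicit token sizes take precedence
  }
  size * seq_len(nchunks)     # equally-sized chunks: size, 2 * size, ...
}

default_sizes <- chunk_endpoints()
custom_sizes  <- chunk_endpoints(chunks = c(1000, 5000, 20000))
```

Under these assumptions, the defaults cover 646 * 40 = 25840 tokens, while a user-supplied chunks vector is used as given and size and nchunks play no role.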

Value

  • A growth object with methods for plotting and printing. Because running this function on large texts may take some time, a period is printed on the output device for each completed chunk to indicate progress.

    The data frame with the actual measures, which can be extracted with object.name@data$data, has the following columns.

  • Chunk: a numeric vector with chunk numbers.
  • Tokens: a numeric vector with the number of tokens up to and including the current chunk.
  • Types: a numeric vector with the number of types up to and including the current chunk.
  • HapaxLegomena: a numeric vector with the corresponding count of hapax legomena.
  • DisLegomena: a numeric vector with the corresponding count of dis legomena.
  • TrisLegomena: a numeric vector with the corresponding count of tris legomena.
  • Yule: a numeric vector with Yule's K.
  • Zipf: a numeric vector with the slope of Zipf's rank-frequency curve in the double-logarithmic plane.
  • TypeTokenRatio: a numeric vector with the ratio of types to tokens.
  • Herdan: a numeric vector with Herdan's C.
  • Guiraud: a numeric vector with Guiraud's R.
  • Sichel: a numeric vector with Sichel's S.
  • Lognormal: a numeric vector with mean log frequency.
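To make several of these columns concrete, the sketch below computes them for a single toy chunk in base R. It uses the textbook definitions (Yule's K = 10^4 (sum of m^2 V(m) - N) / N^2, Herdan's C = log V / log N, Guiraud's R = V / sqrt(N), Sichel's S = V(2) / V); whether growth.fnc computes each measure in exactly this way is an assumption. The token vector is invented for illustration, the mean log frequency is taken over types, and the Zipf slope is omitted.

```r
# Toy token vector (illustrative data, not taken from the alice data set).
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran")

freq <- as.vector(table(tokens))   # frequency of each type
N <- length(tokens)                # number of tokens
V <- length(freq)                  # number of types

spectrum <- table(freq)            # V(m): number of types occurring m times
m  <- as.numeric(names(spectrum))
Vm <- as.numeric(spectrum)

measures <- data.frame(
  Tokens         = N,
  Types          = V,
  HapaxLegomena  = sum(freq == 1),
  DisLegomena    = sum(freq == 2),
  TrisLegomena   = sum(freq == 3),
  Yule           = 10000 * (sum(m^2 * Vm) - N) / N^2,  # Yule's K
  TypeTokenRatio = V / N,
  Herdan         = log(V) / log(N),                    # Herdan's C
  Guiraud        = V / sqrt(N),                        # Guiraud's R
  Sichel         = sum(freq == 2) / V,                 # Sichel's S
  Lognormal      = mean(log(freq))                     # assumed: mean over types
)
```

A growth curve repeats this computation for each cumulative text size, for example on tokens[1:k] for increasing k, giving one row per chunk.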

References

R. H. Baayen (2001) Word Frequency Distributions, Dordrecht: Kluwer Academic Publishers.

Tweedie, F. J. & Baayen, R. H. (1998) How variable may a constant be? Measures of lexical richness in perspective, Computers and the Humanities, 32, 323-352.

See Also

plot.growth and the zipfR package.

Examples

data(alice)
alice.growth = growth.fnc(alice)
plot(alice.growth)
