corpora (version 0.7)

Statistics and Data Sets for Corpus Frequency Data

Description

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').

Install

install.packages('corpora')

Monthly Downloads

300

Version

0.7

License

GPL-3

Maintainer

Stephanie Evert

Last Published

June 10th, 2025

Functions in corpora (0.7)

binom.pval
P-values of the binomial test for frequency counts (corpora)

DistFeatBrownFam
Latent dimension scores from a distributional analysis of the Brown Family corpora

KrennPPV
German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)

VSS
A small corpus of very short stories with linguistic annotations

LOBStats
Basic statistics of texts in the LOB corpus

LOBPassives
Frequency counts of passive verb phrases in the LOB corpus

chisq
Pearson's chi-squared statistic for frequency comparisons (corpora)

SurfaceColloc
A small data set of surface collocations from the English Wikipedia

PassiveBrownFam
By-text frequencies of passive verb phrases in the Brown Family corpora

am.score
Compute association scores for collocation analysis (corpora)

qw
Split string into words, similar to qw() in Perl (corpora)

fisher.pval
P-values of Fisher's exact test for frequency comparisons (corpora)

rowColVector
Propagate vector to single-row or single-column matrix (corpora)

sample.df
Random samples from data frames (corpora)

corpora.palette
Colour palettes for linguistic visualization (corpora)

corpora-package
corpora: Statistical Inference from Corpus Frequency Data

prop.cint
Confidence interval for proportion based on frequency counts (corpora)

keyness
Compute best-practice keyness measures (corpora)

cont.table
Build contingency tables for frequency comparison (corpora)

stars.pval
Show p-values as significance stars (corpora)

simulated.wikipedia
Simulated type and token counts for Wikipedia articles (corpora)

simulated.census
Simulated census data for examples and illustrations (corpora)

simulated.language.course
Simulated study on effectiveness of language course (corpora)

chisq.pval
P-values of Pearson's chi-squared test for frequency comparisons (corpora)

z.score.pval
P-values of the z-score test for frequency counts (corpora)

z.score
The z-score statistic for frequency counts (corpora)

BNCInChargeOf
Collocations of the phrase "in charge of" (BNC)

BNCcomparison
Comparison of written and spoken noun frequencies in the British National Corpus

BrownStats
Basic statistics of texts in the Brown corpus

BrownLOBPassives
Frequency counts of passive verb phrases in the Brown and LOB corpora

BNCbiber
Biber's (1988) register features for the British National Corpus

BrownBigrams
Bigrams of adjacent words from the Brown corpus

BrownPassives
Frequency counts of passive verb phrases in the Brown corpus

BNCqueries
Per-text frequency counts for a selection of BNCweb corpus queries

BNCmeta
Metadata for the British National Corpus (XML edition)

BNCdomains
Distribution of domains in the British National Corpus (BNC)