Note: a newer version (0.7) of this package is available.

corpora (version 0.5)

Statistics and Data Sets for Corpus Frequency Data

Description

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').

Install

install.packages('corpora')
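
The install command above can be wrapped so that it only runs when the package is missing. This is a minimal sketch, assuming CRAN is reachable from your R session. Note that CRAN serves the current release, which may be newer than the 0.5 version documented on this page.

```r
# Install from CRAN only if the package is not already available.
# Assumption: a CRAN mirror is configured and reachable.
if (!requireNamespace("corpora", quietly = TRUE)) {
  install.packages("corpora")
}

# Attach the package for the current session.
library(corpora)

# List everything the attached package exports, to compare with the
# function index on this page (contents can differ between versions).
ls("package:corpora")
```

Guarding the call with `requireNamespace()` avoids reinstalling the package each time a script is run.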

Monthly Downloads

590

Version

0.5

License

GPL-3

Maintainer

Stefan Evert

Last Published

August 31st, 2018

Functions in corpora (0.5)

PassiveBrownFam

By-text frequencies of passive verb phrases in the Brown Family corpora.

z.score.pval

P-values of the z-score test for frequency counts (corpora)

VSS

A small corpus of very short stories with linguistic annotations

fisher.pval

P-values of Fisher's exact test for frequency comparisons (corpora)

BNCInChargeOf

Collocations of the phrase "in charge of" (BNC)

BNCbiber

Biber's (1988) register features for the British National Corpus

chisq.pval

P-values of Pearson's chi-squared test for frequency comparisons (corpora)

cont.table

Build contingency tables for frequency comparison (corpora)

prop.cint

Confidence interval for proportion based on frequency counts (corpora)

BrownPassives

Frequency counts of passive verb phrases in the Brown corpus

corpora-package

corpora: Statistical Inference from Corpus Frequency Data

BrownStats

Basic statistics of texts in the Brown corpus

BNCmeta

Metadata for the British National Corpus (XML edition)

corpora.palette

Colour palettes for linguistic visualization (corpora)

binom.pval

P-values of the binomial test for frequency counts (corpora)

BNCqueries

Per-text frequency counts for a selection of BNCweb corpus queries

DistFeatBrownFam

Latent dimension scores from a distributional analysis of the Brown Family corpora

chisq

Pearson's chi-squared statistic for frequency comparisons (corpora)

KrennPPV

German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)

BrownBigrams

Bigrams of adjacent words from the Brown corpus

BrownLOBPassives

Frequency counts of passive verb phrases in the Brown and LOB corpora

sample.df

Random samples from data frames (corpora)

simulated.census

Simulated census data for examples and illustrations (corpora)

qw

Split string into words, similar to qw() in Perl (corpora)

simulated.language.course

Simulated study on effectiveness of language course (corpora)

rowColVector

Propagate vector to single-row or single-column matrix (corpora)

simulated.wikipedia

Simulated type and token counts for Wikipedia articles (corpora)

stars.pval

Show p-values as significance stars (corpora)

z.score

The z-score statistic for frequency counts (corpora)

BNCcomparison

Comparison of written and spoken noun frequencies in the British National Corpus

BNCdomains

Distribution of domains in the British National Corpus (BNC)

LOBPassives

Frequency counts of passive verb phrases in the LOB corpus

LOBStats

Basic statistics of texts in the LOB corpus