gghinton (version 0.1.0)

alice_bigrams: English character bigram counts from Alice's Adventures in Wonderland

Description

A 27x27 integer matrix of character-pair (bigram) counts computed from the full text of Alice's Adventures in Wonderland by Lewis Carroll (1865). The source text is Project Gutenberg item 11 (public domain).

Usage

alice_bigrams

Format

A 27 x 27 integer matrix. Row names and column names are c(letters, " ").

Details

The 27 characters are the 26 lower-case letters a-z plus a space character (represented as " "). Non-letter characters in the original text (punctuation, digits, newlines) are ignored, and runs of multiple spaces are collapsed to one before counting.

alice_bigrams[x, y] gives the number of times character x (the row) is immediately followed by character y (the column) in the processed text.
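
The counting procedure described above can be sketched in base R. This is only an illustration, not the package's build script: the helper name count_bigrams is hypothetical, and lower-casing the text before filtering is an assumption, since the description does not say how upper-case letters are handled.

```r
# Hypothetical helper: count character bigrams over a-z plus space.
# Assumptions (not confirmed by the package): the text is lower-cased first,
# and non-letter characters are simply deleted rather than turned into spaces.
count_bigrams <- function(text) {
  alphabet <- c(letters, " ")
  txt <- tolower(text)
  txt <- gsub("[^a-z ]", "", txt, perl = TRUE)  # drop punctuation, digits, newlines
  txt <- gsub(" +", " ", txt, perl = TRUE)      # collapse runs of spaces to one
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]

  # Pair each character with its successor; fixed factor levels keep
  # all 27 rows and columns even when a character never occurs.
  first  <- factor(chars[-length(chars)], levels = alphabet)
  second <- factor(chars[-1], levels = alphabet)
  counts <- table(first, second)

  matrix(as.integer(counts), nrow = length(alphabet),
         dimnames = list(alphabet, alphabet))
}

toy <- count_bigrams("The cat sat on the mat.")
toy["t", "h"]  # "t" followed by "h" in the toy sentence
toy["a", "t"]  # "a" followed by "t" in the toy sentence
```

Applied to a toy sentence, the result has the same shape and indexing convention as alice_bigrams: rows are the first character, columns the second.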

Examples

library(gghinton)

# Ten largest bigram counts (sort() drops the dimnames, so only the counts are shown)
tail(sort(alice_bigrams), 10)

# "th" count
alice_bigrams["t", "h"]

# Visualise as a Hinton diagram
df <- matrix_to_hinton(alice_bigrams / sum(alice_bigrams))
ggplot2::ggplot(df, ggplot2::aes(x = col, y = row, weight = weight)) +
  geom_hinton() +
  scale_fill_hinton() +
  ggplot2::coord_fixed() +
  theme_hinton()
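
Calling sort() on a matrix returns a plain vector without the row and column labels, so the largest counts are not tied to their character pairs. One possible way to keep the labels is to reshape the matrix to long form first. The sketch below uses a small made-up matrix in place of alice_bigrams so that it runs without the package; the helper name top_bigrams is hypothetical.

```r
# Toy stand-in for alice_bigrams (made-up counts, three characters only).
chars <- c("a", "b", " ")
m <- matrix(c(5L, 1L, 0L,
              2L, 0L, 7L,
              3L, 4L, 0L),
            nrow = 3, byrow = TRUE, dimnames = list(chars, chars))

# Hypothetical helper: return the n largest cells with their labels.
top_bigrams <- function(counts, n = 10) {
  # as.table() + as.data.frame() gives one row per (first, second) cell
  long <- as.data.frame(as.table(counts), stringsAsFactors = FALSE)
  names(long) <- c("first", "second", "count")
  long <- long[order(long$count, decreasing = TRUE), ]
  head(long, n)
}

top <- top_bigrams(m, n = 3)
top
```

The same call, top_bigrams(alice_bigrams), should work on the packaged matrix, since it only relies on the dimnames documented under Format.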
