syllable v0.1.2

by Tyler Rinker

A Small Collection of Syllable Counting Functions

Tools for counting syllables and polysyllables. The tools rely primarily on a 'data.table' hash table lookup, resulting in fast syllable counting.

syllable

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

syllable is a small collection of tools for counting syllables and polysyllables. The tools rely primarily on data.table hash table lookups, resulting in fast syllable counting.
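
To make the idea of a hash-table lookup concrete, here is a minimal base R sketch. It is not syllable's implementation: the four-word dictionary, the hypothetical helpers count_words() and guess_syllables(), and the vowel-run fallback for unknown words are all assumptions made for illustration. The package itself ships a much larger data.table of words and syllable counts (syllable_counts_data).

```r
# Hypothetical mini-dictionary stored in a hashed environment (base R);
# syllable itself keys a data.table instead.
dict <- list2env(
  list(i = 1, like = 1, chicken = 2, sandwiches = 3),
  hash = TRUE
)

# Assumed fallback for words missing from the dictionary:
# count runs of consecutive vowels, with a floor of one syllable.
guess_syllables <- function(word) {
  runs <- gregexpr("[aeiouy]+", word)[[1]]
  max(1L, sum(runs > 0))
}

# Hypothetical helper: look up each word; use the fallback when it is absent.
count_words <- function(text) {
  words <- strsplit(gsub("[^a-z' ]", "", tolower(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  vapply(words, function(w) {
    if (exists(w, envir = dict, inherits = FALSE)) {
      get(w, envir = dict, inherits = FALSE)
    } else {
      guess_syllables(w)
    }
  }, numeric(1), USE.NAMES = FALSE)
}

count_words("I like chicken sandwiches.")
## [1] 1 1 2 3
```

Each lookup is a constant-time hash probe, which is why a dictionary-first design stays fast on large texts; only words missing from the table pay for the slower rule-based estimate.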

Main Functions

The main functions follow the format of action_object.

Actions

The following table outlines the actions. The Example Output column corresponds to the string "I like chicken sandwiches.".

Action   Description                                 Returns                 Example Output
count    One integer per word                        A vector per string     1, 1, 2, 3
sum      Sum of syllable counts                      An integer per string   7
tally*   Number of words with a syllable attribute   An integer per string   polysyllabic tally = 1

* Appending _mono, _di, _poly, _short (monosyllabic + disyllabic), or _both (short & polysyllabic) to tally lets the user specify which syllable attribute is tallied.
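
Given the per-word counts from the count row (1, 1, 2, 3), each tally variant reduces to counting the words that meet a threshold. The sketch below assumes the thresholds implied by the function descriptions (polysyllabic means three or more syllables, short means fewer than three). It is plain base R for illustration, not the package's code.

```r
# Per-word syllable counts for "I like chicken sandwiches." (the count row above)
counts <- c(I = 1, like = 1, chicken = 2, sandwiches = 3)

tallies <- c(
  mono  = sum(counts == 1),  # monosyllabic words
  di    = sum(counts == 2),  # disyllabic words
  poly  = sum(counts >= 3),  # polysyllabic words (3+ syllables)
  short = sum(counts < 3)    # monosyllabic + disyllabic
)
tallies
# mono = 2, di = 1, poly = 1, short = 3
```

Under these assumptions, the _both variant simply reports the short and poly tallies side by side.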

Objects

The following table outlines the objects acted upon:

Object    Description                     Example
string    A character string              "I like chicken sandwiches."
vector*   A vector of character strings   c("I like it.", "Look out!")

* Appending _by to vector allows the user to aggregate by one or more grouping variables.

Putting It Together

The function count_vector returns, for each string in the input, a vector of integer syllable counts (one per word). Because every string yields its own vector, count_vector returns a list of integer vectors.

count_vector(c("I like it.", "Look out!"))

## $`1`
## [1] 1 1 1
## 
## $`2`
## [1] 1 1

Each of the main functions is optimized to do its task efficiently. While one could use sapply(count_vector(x), sum) to get the same per-string results as sum_vector(x), it would be less efficient.
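
Because count_vector returns a list, base R's sum() cannot be applied to it directly (sum() rejects list arguments). The sketch below rebuilds per-string sums from the list printed above with vapply(); it only illustrates what sum_vector computes, not how the package implements it.

```r
# count_vector(c("I like it.", "Look out!")) output, copied from above
counts <- list(`1` = c(1, 1, 1), `2` = c(1, 1))

# One sum per string, as described in the sum row of the Actions table
per_string <- vapply(counts, sum, numeric(1))
per_string
# `1` -> 3, `2` -> 2
```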

The available syllable functions that follow the format of action_object are:

string: count_string, sum_string, tally_both_string, tally_di_string, tally_mono_string, tally_poly_string, tally_short_string

vector: count_vector, sum_vector, tally_both_vector, tally_di_vector, tally_mono_vector, tally_poly_vector, tally_short_vector

vector_by: count_vector_by, sum_vector_by, tally_both_vector_by, tally_di_vector_by, tally_mono_vector_by, tally_poly_vector_by, tally_short_vector_by
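
The 21 names above are every combination of seven actions with three objects. The short base R sketch below (the variable names are hypothetical) generates them directly from the action_object pattern.

```r
actions <- c("count", "sum", "tally_both", "tally_di", "tally_mono",
             "tally_poly", "tally_short")
objects <- c("string", "vector", "vector_by")

# Every action paired with every object, joined as action_object
grid <- expand.grid(action = actions, object = objects,
                    stringsAsFactors = FALSE)
fun_names <- paste(grid$action, grid$object, sep = "_")
length(fun_names)
## [1] 21
```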

Installation

To install the development version of syllable, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(
    'trinker/lexicon',
    'trinker/textclean',
    'trinker/textshape',
    'trinker/syllable'
)

Contact

You are welcome to:

Examples

The following examples demonstrate the functionality of a select sample of syllable functions.

Count Syllables In a String

Counts the number of syllables for each word in a string.

count_string("I like chicken and eggs for breakfast")

## [1] 1 1 2 1 1 1 2

Count Syllables In a Vector of Strings

sents <- c("I like chicken.", "I want eggs benedict for breakfast.")
count_vector(sents)

## $`1`
## [1] 1 1 2
## 
## $`2`
## [1] 1 1 1 3 1 2

Map(function(x, y) setNames(x, y),
   count_vector(sents),
   strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")
)

## $`1`
##       i    like chicken 
##       1       1       2 
## 
## $`2`
##         i      want      eggs  benedict       for breakfast 
##         1         1         1         3         1         2

Sum the Syllables In a Vector of Strings by Grouping Variable(s)

dat <- data.frame(
   text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),
   group = c("A", "B", "A")
)
sum_vector_by(dat$text, dat$group)

## # A tibble: 2 x 3
##    group n.words count
##   <fctr>   <int> <dbl>
## 1      A       4     7
## 2      B       6     9
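
Conceptually, the _by variants compute per-string results and then aggregate them within each group. The base R sketch below reproduces the table above with tapply(). The per-string word and syllable counts are read off the examples above (the 3 syllables for "Really?" are inferred from group A's total), so this illustrates the aggregation step only and is not the package's implementation.

```r
# Per-string values for dat$text, taken from the example output above
n_words   <- c(3, 6, 1)   # "I like chicken.", "I want eggs benedict ...", "Really?"
syllables <- c(4, 9, 3)
group     <- c("A", "B", "A")

# Sum within each group; tapply() orders groups alphabetically
res <- data.frame(
  group   = sort(unique(group)),
  n.words = as.vector(tapply(n_words, group, sum)),
  count   = as.vector(tapply(syllables, group, sum))
)
res
##   group n.words count
## 1     A       4     7
## 2     B       6     9
```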

Tally the Short/Poly-Syllabic Words by Group(s)

dat <- data.frame(
   text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),
   group = c("A", "B", "A")
)
tally_both_vector_by(dat$text, dat$group)

## # A tibble: 2 x 4
##    group n.words short  poly
##   <fctr>   <int> <int> <int>
## 1      A       5     3     2
## 2      B       5     4     1

with(presidential_debates_2012, tally_both_vector_by(dialogue, person))

## # A tibble: 6 x 4
##      person n.words short  poly
##      <fctr>   <int> <int> <int>
## 1     OBAMA   18319 16286  2033
## 2    ROMNEY   19924 17858  2066
## 3   CROWLEY    1672  1525   147
## 4    LEHRER     765   674    91
## 5  QUESTION     583   486    97
## 6 SCHIEFFER    1445  1289   156

Readability Word Statistics by Grouping Variable(s)

with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))

## # A tibble: 10 x 9
##       person   time n.sents n.words n.chars n.sylls n.shorts n.polys
## *     <fctr> <fctr>   <int>   <int>   <int>   <dbl>    <int>   <int>
## 1      OBAMA time 1     179    3599   16002    5221     3221     378
## 2      OBAMA time 2     494    7477   32459   10654     6696     781
## 3      OBAMA time 3     405    7243   32288   10675     6369     874
## 4     ROMNEY time 1     279    4085   17984    5875     3646     439
## 5     ROMNEY time 2     560    7536   32504   10720     6788     748
## 6     ROMNEY time 3     569    8303   35824   11883     7424     879
## 7    CROWLEY time 2     165    1672    6904    2308     1525     147
## 8     LEHRER time 1      87     765    3256    1087      674      91
## 9   QUESTION time 2      40     583    2765     930      486      97
## 10 SCHIEFFER time 3     133    1445    6234    2058     1289     156
## # ... with 1 more variables: n.complexes <int>
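
These per-group counts are the raw inputs to standard readability formulas. As an illustration only (the table above does not include this score), the sketch below plugs the OBAMA / time 1 row into the widely used Flesch-Kincaid grade level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59, and checks that n.shorts + n.polys equals n.words in every row.

```r
# OBAMA, time 1 row from the table above
n_sents <- 179
n_words <- 3599
n_sylls <- 5221

# Flesch-Kincaid grade level (a standard formula, not part of syllable's output)
fk_grade <- 0.39 * (n_words / n_sents) + 11.8 * (n_sylls / n_words) - 15.59
round(fk_grade, 2)
## [1] 9.37

# Every word is either short (< 3 syllables) or polysyllabic
n_shorts    <- c(3221, 6696, 6369, 3646, 6788, 7424, 1525, 674, 486, 1289)
n_polys     <- c(378, 781, 874, 439, 748, 879, 147, 91, 97, 156)
n_words_all <- c(3599, 7477, 7243, 4085, 7536, 8303, 1672, 765, 583, 1445)
all(n_shorts + n_polys == n_words_all)
## [1] TRUE
```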

Visualize Poly Syllable Distributions

if (!require("pacman")) install.packages("pacman")
pacman::p_load(syllable, dplyr, ggplot2, scales)

tally_both_vector(presidential_debates_2012$dialogue) %>%
    mutate(Duration = seq_along(poly)) %>%     # sentence index
    filter((short + poly) > 4) %>%             # drop very short sentences
    mutate(
        short = short/(short+poly),            # convert counts to proportions
        poly = 1 - short,
        size = poly > .3                       # highlight poly-heavy sentences
    ) %>%
    ggplot(aes(Duration, poly)) +
        geom_text(aes(label = Duration, size = size, color = size)) +
        coord_flip() +
        scale_size_manual(values = c(1.5, 2.5), guide = "none") +
        scale_color_manual(values = c("grey75", "black"), guide = "none") +
        scale_x_reverse() +
        scale_y_continuous(labels = scales::percent) +
        ylab("Poly-syllabic") +
        xlab("Duration (sentences)") +
        theme_bw()

Visualize Poly Syllable Distributions by Group

if (!require("pacman")) install.packages("pacman")
pacman::p_load(syllable, dplyr, ggplot2, tidyr, scales)

with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%
    mutate(
        person_time = paste(person, time, sep = "-"),
        short = short/(short+poly),            # convert counts to proportions
        poly = 1 - short
    ) %>%
    arrange(poly) %>%
    mutate(person_time = factor(person_time, levels = person_time)) %>%  # order bars by poly share
    gather(type, prop, c(short, poly)) %>%     # long format: one row per syllable type
    ggplot(aes(person_time, weight = prop, fill = type)) +
        geom_bar() +
        coord_flip() +
        scale_y_continuous(labels = scales::percent) +
        scale_fill_discrete(name = "Syllable\nType") +
        xlab("Person & Time") +
        ylab("Usage") +
        theme_bw()

Functions in syllable

Name Description
common_polysyllabic_proper_nouns Common Proper Nouns
presidential_debates_2012 2012 U.S. Presidential Debates
lookup_syllable_counts Lookup Syllable Counts
readability_word_stats Readability Word Statistics
tally_both_vector_by Vector Tally of Short-Syllabic and Poly-Syllabic Words By Grouping Variable(s)
readability_word_stats_by Readability Word Statistics By Grouping Variable(s)
tally_both_string String Syllable Tally of Short-syllabic and Poly-syllabic Words
sum_vector_by Vector Syllable Sums By Grouping Variable(s)
sum_vector Vector Syllable Sums
tally_both_vector Vector Tally of Short-Syllabic and Poly-Syllabic Words
reexports Objects exported from other packages
tally_di_string String Syllable Tally of Disyllabic Words
sum_string String Syllables Sum
tally_mono_vector_by Vector Tally of Mono-Syllabic Words By Grouping Variable(s)
tally_mono_string String Syllable Tally of Monosyllabic Words
tally_di_vector_by Vector Tally of Di-Syllabic Words By Grouping Variable(s)
tally_poly_vector Vector Tally of Poly-syllabic Words
tally_di_vector Vector Tally of Di-syllabic Words
tally_poly_vector_by Vector Tally of Poly-Syllabic Words By Grouping Variable(s)
tally_short_vector Vector Tally of Short-syllabic and Poly-syllabic Words
tally_short_vector_by Vector Tally of Short-Syllabic Words By Grouping Variable(s)
tally_short_string String Syllable Tally of Non-Polysyllabic (< 3 syllables) Words
syllable_counts_data A data.table of Words and Syllable Counts
syllable A Small Collection of Syllable Counting Functions
tally_poly_string String Syllable Tally of Polysyllabic Words
tally_mono_vector Vector Tally of Mono-syllabic Words
avaible_syllable_funs Available Syllable Functions
print.available Prints an available Object.
count_vector_by Vector Syllable Counts By Grouping Variable(s)
hamlets_soliloquy Hamlet's Soliloquy
count_string String Syllable Counts
count_vector Vector Syllable Counts
compute_syllable_counts Compute Syllable Counts

Details

Date 2017-01-19
License GPL-2
LazyData TRUE
RoxygenNote 5.0.1
URL http://github.com/trinker/syllable
BugReports http://github.com/trinker/syllable/issues
NeedsCompilation no
Packaged 2017-01-20 03:13:49 UTC; Tyler
Repository CRAN
Date/Publication 2017-01-20 10:48:51
