quanteda (version 0.9.6-1)

syllables: count syllables in a text

Description

Returns a count of the number of syllables in texts. For English words, the syllable count is looked up in the default syllable dictionary englishSyllables, which is built from the CMU pronunciation dictionary, so the count is exact. For any word not in the dictionary, the syllable count is estimated by counting vowel clusters.

englishSyllables is a quanteda-supplied data object: a named numeric vector of syllable counts whose names are the corresponding words. It is the default object used to count English syllables. The object can be accessed directly, but we strongly encourage you to access it only through the syllables() wrapper function.
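The lookup-then-estimate behaviour described above can be sketched in base R. This is only an illustration under stated assumptions, not quanteda's actual implementation: toy_dict, count_vowel_clusters() and count_syllables() are hypothetical names, and the tiny dictionary stands in for englishSyllables.

```r
# Hypothetical stand-in for englishSyllables: a small named integer vector
# keyed by lower-case words (the real data object is far larger).
toy_dict <- c(this = 1L, is = 1L, an = 1L, example = 3L, sentence = 2L)

# Estimate the syllables in one word by counting runs of consecutive vowels.
count_vowel_clusters <- function(word) {
  hits <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

# Look each word up in the dictionary; fall back to the estimate on a miss.
count_syllables <- function(words, dict = toy_dict) {
  words <- tolower(words)
  counts <- unname(dict[words])   # NA where a word is not a name in dict
  missing <- is.na(counts)
  counts[missing] <- vapply(words[missing], count_vowel_clusters, integer(1))
  names(counts) <- words
  counts
}

count_syllables(c("This", "is", "an", "example", "road"))
```

In this sketch "sentence" gets 2 from the dictionary, whereas the vowel-cluster estimate alone would give 3 (e, e, e). That gap is why a dictionary lookup is preferred whenever the word is available.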

Usage

syllables(x, ...)

## S3 method for class 'character'
syllables(x, syllableDict = quanteda::englishSyllables, ...)

## S3 method for class 'tokenizedTexts'
syllables(x, syllableDict = quanteda::englishSyllables, ...)

Arguments

x
character vector or tokenizedTexts object whose syllables will be counted
...
additional arguments passed to tokenize
syllableDict
optional named integer vector of syllable counts where the names are lower case tokens. When set to NULL (default), the function uses the quanteda data object englishSyllables, an English pronunciation dictionary from CMU.

Value

If x is a character vector, a named numeric vector of the syllable counts for each text, computed without tokenization. If x consists of (a list of) tokenized texts, a list of syllable counts corresponding to the tokenized texts.

Source

englishSyllables is built from the freely available CMU pronunciation dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

Examples

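# count syllables in a single text
syllables("This is an example sentence.")
# count syllables in an already tokenized text
syllables(tokenize("This is an example sentence.", simplify = TRUE))

# count syllables in several named texts
myTexts <- c(text1 = "Text one.",
             text2 = "Superduper text number two.",
             text3 = "One more for the road.")
syllables(myTexts)
# tokenized input returns a list of syllable counts
syllables(tokenize(myTexts, removePunct = TRUE))

# a single long word
syllables("supercalifragilisticexpialidocious")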
