Returns a count of the number of syllables in texts. For English
words, the syllable count is exact and is looked up in the CMU pronunciation
dictionary, via the default syllable dictionary data_int_syllables.
For any word not in the dictionary, the syllable count is estimated by
counting vowel clusters.
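The two-step procedure described above (dictionary lookup first, then a vowel-cluster estimate for unknown words) can be sketched in base R. This is a minimal illustration under stated assumptions, not quanteda's implementation: the function `estimate_syllables`, the three-word dictionary `toy_dict`, and the vowel pattern `[aeiouy]+` (which treats "y" as a vowel) are all hypothetical.

```r
# Hypothetical three-word dictionary standing in for data_int_syllables
toy_dict <- c(cat = 1L, syllable = 3L, administration = 5L)

# Hypothetical helper: look each word up in `dict` (names are lower case);
# for words not found, fall back to counting runs of vowels, using the
# assumed pattern "[aeiouy]+".
estimate_syllables <- function(words, dict) {
  lower <- tolower(words)
  counts <- unname(dict[lower])          # NA where the word is not in dict
  missing <- is.na(counts)
  clusters <- regmatches(lower[missing],
                         gregexpr("[aeiouy]+", lower[missing]))
  counts[missing] <- lengths(clusters)   # number of vowel clusters per word
  counts
}

estimate_syllables(c("Cat", "Brexit", "Administration"), toy_dict)
# returns 1 2 5
```

"Brexit" is not in the toy dictionary, so its count of 2 comes from the vowel clusters "e" and "i".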
data_int_syllables is a quanteda-supplied data object consisting of a
named integer vector of syllable counts, with the words as names. It
is the default object used to count English syllables. The object can be
accessed directly, but we strongly encourage you to access it only
through the nsyllable() wrapper function.
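Because the dictionary is a named vector, direct access is ordinary name indexing in R. The sketch below uses a hypothetical stand-in, `toy_dict`, rather than the real data object. It shows two reasons the nsyllable() wrapper is the more convenient route: the names are lower case, so mixed-case input must be lowered before lookup, and words absent from the dictionary return NA rather than a vowel-cluster estimate.

```r
# Hypothetical stand-in for a named vector of syllable counts
toy_dict <- c(cat = 1L, syllable = 3L, administration = 5L)

toy_dict["syllable"]                  # named lookup: syllable = 3
toy_dict["Administration"]            # NA: names are lower case
toy_dict[tolower("Administration")]   # administration = 5
toy_dict["brexit"]                    # NA: word not in the dictionary
```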
nsyllable(x, syllable_dictionary = quanteda::data_int_syllables,
use.names = FALSE)
x: character vector or tokens object whose syllables will be counted.
Syllables are counted across each element of a character vector without
regard to token boundaries, so x should preferably consist of individual
terms.
syllable_dictionary: optional named integer vector of syllable counts, where
the names are lower-case tokens. By default, the function uses the quanteda
data object data_int_syllables, an English pronunciation dictionary from CMU.
use.names: logical; if TRUE, assign the tokens as the names of
the syllable count vector.
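The recommendation that x hold individual terms can be illustrated with a vowel-cluster counter. This is a base-R sketch, not quanteda code: `count_vowel_clusters` is a hypothetical helper, the pattern `[aeiouy]+` is an assumed heuristic, and its `use.names` argument only imitates the documented naming behavior. A multi-word element is unlikely to match any single-word dictionary entry, so it is scored as one unit and the per-word breakdown is lost.

```r
# Hypothetical helper: count runs of vowels in each element of `x`,
# optionally naming the result by the input (mimicking use.names = TRUE).
count_vowel_clusters <- function(x, use.names = FALSE) {
  lower <- tolower(x)
  counts <- lengths(regmatches(lower, gregexpr("[aeiouy]+", lower)))
  if (use.names) names(counts) <- x
  counts
}

# One element containing three words is scored as a single unit ...
count_vowel_clusters("the cat sat")
# ... whereas splitting into terms gives one count per word.
count_vowel_clusters(c("the", "cat", "sat"), use.names = TRUE)
```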
If x is a character vector, returns a numeric vector of the syllable
counts in each element, named by the elements when use.names = TRUE. If x
is a tokens object, returns a list of syllable counts, where each list
element corresponds to the tokens in one document.
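The list-per-document return shape for tokens input can be mimicked in base R with lapply() over a named list. The sketch assumes a plain named list of character vectors as a stand-in for a quanteda tokens object, and the hypothetical `count_vowel_clusters` helper applies only the vowel-cluster heuristic, with no dictionary lookup.

```r
# Hypothetical stand-in for a tokens object: a named list with one
# character vector of tokens per document.
toks <- list(doc1 = c("This", "is", "an", "example", "sentence"),
             doc2 = c("Another", "of", "two", "sample", "sentences"))

# Hypothetical per-token counter (vowel clusters only, no dictionary)
count_vowel_clusters <- function(x) {
  lower <- tolower(x)
  lengths(regmatches(lower, gregexpr("[aeiouy]+", lower)))
}

# One list element per document, keeping the document names
counts <- lapply(toks, count_vowel_clusters)
counts$doc1   # 1 1 1 3 3
counts$doc2   # 3 1 1 2 3
```

The heuristic scores "sentence" as 3 because of its final silent "e", which illustrates why a dictionary lookup is tried first wherever the word is available.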
library(quanteda)

# character vector
nsyllable(c("cat", "syllable", "supercalifragilisticexpialidocious",
"Brexit", "Administration"), use.names = TRUE)
# tokens
txt <- c(doc1 = "This is an example sentence.",
doc2 = "Another of two sample sentences.")
nsyllable(tokens(txt, remove_punct = TRUE))
# punctuation is not counted
nsyllable(tokens(txt), use.names = TRUE)