
Syllable counts are looked up in the default syllable dictionary, data_int_syllables. For any word not in the dictionary, the syllable count is estimated by counting vowel clusters.
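The lookup-then-estimate behaviour described above can be sketched in base R. This is only an illustration under stated assumptions, not quanteda's actual implementation: toy_dictionary, count_vowel_clusters() and count_syllables() are hypothetical names, the dictionary values are made up for the example, and treating "y" as a vowel is a choice made for this sketch.

```r
# Hypothetical mini-dictionary standing in for data_int_syllables
# (illustrative values only; not the real quanteda object).
toy_dictionary <- c(cat = 1L, syllable = 3L, example = 3L)

# Estimate syllables by counting runs of consecutive vowels.
# Treating "y" as a vowel is an assumption of this sketch.
count_vowel_clusters <- function(word) {
  matches <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  if (matches[1] == -1L) 0L else length(matches)
}

# Look each word up first; fall back to the estimate when it is missing.
count_syllables <- function(words, dictionary = toy_dictionary) {
  counts <- unname(dictionary[tolower(words)])
  missing <- is.na(counts)
  counts[missing] <- vapply(words[missing], count_vowel_clusters, integer(1),
                            USE.NAMES = FALSE)
  counts
}

# "cat" and "syllable" come from the dictionary; "Brexit" is estimated.
count_syllables(c("cat", "Brexit", "syllable"))
```

Because the fallback only counts vowel clusters, estimates for out-of-dictionary words can differ from the true syllable count (silent final "e", for instance, is counted).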
data_int_syllables is a quanteda-supplied data object consisting of a named numeric vector of syllable counts for the words used as names. This is the default object used to count English syllables. This object can be accessed directly, but we strongly encourage you to access it only through the nsyllable() wrapper function.
nsyllable(x, syllable_dictionary = quanteda::data_int_syllables, use.names = FALSE)
x
    a character vector or tokens object whose syllables will be counted
syllable_dictionary
    if NULL (default), then the function will use the quanteda data object data_int_syllables, an English pronunciation dictionary from CMU
use.names
    if TRUE, assign the tokens as the names of the syllable count vector

If x is a character vector, a named numeric vector of the counts of the syllables in each element. If x is a tokens object, a list of syllable counts where each list element corresponds to the tokens in a document.
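To make the two return shapes concrete, here is a small base R sketch that does not call quanteda. Everything in it is hypothetical: toy_count() is a stand-in counter that only counts vowel clusters, toy_tokens is a plain named list standing in for a tokens object, and count_shape() merely mimics the shape of the result described above, so its counts are rough estimates.

```r
# Hypothetical stand-in counter: vowel clusters only, no dictionary lookup.
toy_count <- function(words) {
  lower <- tolower(words)
  lengths(regmatches(lower, gregexpr("[aeiouy]+", lower)))
}

# Hypothetical stand-in for a tokens object: one character vector per document.
toy_tokens <- list(doc1 = c("This", "is", "an", "example", "sentence"),
                   doc2 = c("Another", "of", "two", "sample", "sentences"))

# Character input gives one count per element (named if use.names is TRUE);
# list input gives a list with one vector of counts per document.
count_shape <- function(x, use.names = FALSE) {
  if (is.list(x)) return(lapply(x, count_shape, use.names = use.names))
  counts <- toy_count(x)
  if (use.names) names(counts) <- x
  counts
}

count_shape(c("cat", "syllable"), use.names = TRUE)  # named vector
count_shape(toy_tokens)                              # list, one element per document
```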
# character
nsyllable(c("cat", "syllable", "supercalifragilisticexpialidocious",
            "Brexit", "Administration"), use.names = TRUE)

# tokens
txt <- c(doc1 = "This is an example sentence.",
         doc2 = "Another of two sample sentences.")
nsyllable(tokens(txt, remove_punct = TRUE))
# punctuation is not counted
nsyllable(tokens(txt), use.names = TRUE)