sylcount (version 0.2-1)

readability: readability

Description

Computes some basic "readability" measurements, includeing Flesch Reading Ease, Flesch-Kincaid grade level, Automatic Readability Index, and the Simple Measure of Gobbledygook. The function is vectorized by document, and scores are computed in parallel via OpenMP. You can control the number of threads used with the nthreads parameter.

The function will have some difficulty on poorly processed and cleaned data. For example, if all punctuation is stripped out, then the number of sentences detected will always be zero. However, we do recommend removing quotes (single and double), as contractions can confuse the parser.

Usage

readability(s, nthreads = sylcount.nthreads())

Arguments

s

A character vector (vector of strings).

nthreads

Number of threads to use. By default it will use the total number of cores + hyperthreads.

Value

A dataframe containing:

chars the total numberof characters
wordchars the number of alphanumeric characters
words text tokens that are probably English language words
nonwords text tokens that are probably not English language words
sents the number of sentences recognized in the text
sylls the total number of syllables (ignores all non-words)
polys the total number of "polysyllables", or words with 3+ syllables
re Flesch reading ease score
gl Flesch-Kincaid grade level score
ari Automatic Readability Index score
smog Simple Measure of Gobbledygook (SMOG) score
cl the Coleman-Liau Index score

Details

The return is split into words and non-words. A non-word is some block of text more than 64 characters long with no spaces or sentence-ending punctuation inbetween. The number of non-words is returned mostly for error-checking/debugging purposes. If you have a lot of non-words, you probably didn't clean your text properly. The word/non-word division is made in an attempt to improve run-time and memory performance.

For implementation details, see the Details section of ?sylcount.

References

Kincaid, J. Peter, et al. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. No. RBR-8-75. Naval Technical Training Command Millington TN Research Branch, 1975.

Senter, R. J., and Edgar A. Smith. Automated readability index. CINCINNATI UNIV OH, 1967.

Mc Laughlin, G. Harry. "SMOG grading-a new readability formula." Journal of reading 12.8 (1969): 639-646.

Coleman, Meri, and Ta Lin Liau. "A computer readability formula designed for machine scoring." Journal of Applied Psychology 60.2 (1975): 283.

See Also

doc_counts

Examples

Run this code
# NOT RUN {
library(sylcount)
a <- "I am the very model of a modern major general."
b <- "I have information vegetable, animal, and mineral."

# One or the other
readability(a, nthreads=1)
readability(b, nthreads=1)

# Bot at once as separate documents.
readability(c(a, b), nthreads=1)
# And as a single document.
readability(paste0(a, b, collapse=" "), nthreads=1)

# }

Run the code above in your browser using DataCamp Workspace