A set of tools to analyze texts. It includes, among others, functions for automatic language detection and hyphenation, as well as several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (the Celex and Leipzig Corpora Collection file formats are supported).
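To give a rough idea of what two of the indices named above measure, here is a minimal Python sketch of the type-token ratio and the LIX readability score. It is an illustration only and not this package's code or API: the function names (`tokenize`, `type_token_ratio`, `lix`) are hypothetical, and the regex-based tokenizer and the punctuation-based sentence count are simplifying assumptions that a real tagger such as TreeTagger would replace.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (assumption: naive ASCII regex split)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Type-token ratio: number of distinct tokens divided by all tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def lix(text):
    """LIX: words per sentence plus the percentage of long words.

    Assumptions: a sentence ends at '.', '!' or '?', and a long word
    has more than six letters.
    """
    words = tokenize(text)
    if not words:
        raise ValueError("need at least one word")
    # Fall back to one sentence if the text has no end punctuation.
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100 * long_words / len(words)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog barked."
    print(type_token_ratio(tokenize(sample)))
    print(lix(sample))
```

The plain type-token ratio falls as texts get longer, which is one reason length-corrected measures such as HD-D and MTLD exist.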
Note: For full functionality, a local installation of TreeTagger is recommended. Also, due to some restrictions on CRAN, the full package sources are only available from the project homepage. Feedback to the author(s) is welcome!