Note: a newer version (0.13-8) of this package is available.

koRpus (version 0.04-27)

An R Package for Text Analysis

Description

A set of tools to analyze texts. Includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (the Celex and Leipzig Corpora Collection file formats are supported). Note: for full functionality, a local installation of TreeTagger is recommended. Feedback to the author(s) is welcome!
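To make the lexical diversity indices mentioned above more concrete, here is a minimal sketch in base R that computes a few of the simpler ones by hand. It does not call any koRpus function. The sample text `txt` and the naive letter-only tokenizer are assumptions made for illustration, and the package's own implementations may tokenize and count differently.

```r
# Hand-rolled sketch in base R; this is not koRpus code.
txt <- "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Naive tokenizer (assumption): lower-case, then split on anything
# that is not a letter, and drop empty strings.
tokens <- strsplit(tolower(txt), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

N <- length(tokens)          # number of tokens
V <- length(unique(tokens))  # number of types (distinct tokens)

lexdiv <- c(
  TTR  = V / N,              # type-token ratio
  R    = V / sqrt(N),        # Guiraud's R
  C    = log(V) / log(N),    # Herdan's C
  CTTR = V / sqrt(2 * N)     # Carroll's corrected TTR
)
print(round(lexdiv, 3))
```

With this sample the text has 12 tokens and 9 types, so the TTR is 9/12 = 0.75. All four values depend on text length, which is one reason length-robust measures such as MTLD and HD-D exist.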

Install

install.packages('koRpus')

Monthly Downloads

4,316

Version

0.04-27

License

GPL (>= 3)

Maintainer

Meik Michalke

Last Published

March 8th, 2012

Functions in koRpus (0.04-27)

lex.div

Analyze lexical diversity
FOG

Readability: Gunning FOG Index
hyph.XX

Hyphenation patterns: hyph.XX
DRP

Readability: Degrees of Reading Power (DRP)
MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
R.ld

Lexical diversity: Guiraud's R
TTR

Lexical diversity: Type-Token Ratio
RIX

Readability: Anderson's Readability Index (RIX)
MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
C.ld

Lexical diversity: Herdan's C
flesch.kincaid

Readability: Flesch-Kincaid Grade Level
U.ld

Lexical diversity: Uber Index (U)
fucks

Readability: Fucks' Stilcharakteristik
linsear.write

Readability: Linsear Write Index
SMOG

Readability: Simple Measure of Gobbledygook (SMOG)
wheeler.smith

Readability: Wheeler-Smith Score
kRp.readability-class

S4 class kRp.readability
kRp.TTR-class

S4 class kRp.TTR
kRp.text.transform

Letter case transformation
ELF

Readability: Farr's Easy Listening Formula (ELF)
kRp.tagged-class

S4 class kRp.tagged
kRp.hyphen-class

S4 class kRp.hyphen
farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index
kRp.corp.freq-class

S4 class kRp.corp.freq
treetag

A function to call TreeTagger
HDD

Lexical diversity: HD-D (vocd-D)
kRp.hyph.pat-class

S4 class kRp.hyph.pat
show

Show methods for objects of class kRp.lang
maas

Lexical diversity: Maas' indices
harris.jacobson

Readability: Harris-Jacobson indices
spache

Readability: Spache Formula
set.kRp.env

A function to set information on your koRpus environment
kRp.text.paste

Paste koRpus objects
kRp.analysis-class

S4 class kRp.analysis
dale.chall

Readability: Dale-Chall Readability Formula
nWS

Readability: Neue Wiener Sachtextformeln
strain

Readability: Strain Index
correct.tag

Methods to correct koRpus objects
guess.lang

Guess language a text is written in
readability.num

Calculate readability
kRp.filter.wclass

Remove word classes
K.ld

Lexical diversity: Yule's K
plot

Plot method for objects of class kRp.tagged
flesch

Readability: Flesch Reading Ease
TRI

Readability: Kuntzsch's Text-Redundanz-Index
kRp.txt.trans-class

S4 class kRp.txt.trans
textFeatures

Extract text features for authorship analysis
read.corp.custom

Import custom corpus data
bormuth

Readability: Bormuth's Mean Cloze and Grade Placement
koRpus-package

An R Package for Text Analysis.
query

A method to get information out of koRpus objects
LIX

Readability: Björnsson's Läsbarhetsindex (LIX)
manage.hyph.pat

Handling hyphenation pattern objects
coleman

Readability: Coleman's Formulas
coleman.liau

Readability: Coleman-Liau Index
lex.div.num

Calculate lexical diversity
read.corp.celex

Import Celex data
get.kRp.env

Get koRpus session environment
S.ld

Lexical diversity: Summer's S
tokenize

A simple tokenizer
readability

Measure readability
dickes.steiwer

Readability: Dickes-Steiwer Handformel
kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
summary

Summary method for objects of class kRp.lang
kRp.text.analysis

Analyze texts using TreeTagger and word frequencies
FORCAST

Readability: FORCAST Index
kRp.freq.analysis

Analyze word frequencies
kRp.txt.freq-class

S4 class kRp.txt.freq
kRp.lang-class

S4 class kRp.lang
segment.optimizer

A function to optimize MSTTR segment sizes
traenkle.bailer

Readability: Traenkle-Bailer Formeln
CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)
kRp.POS.tags

Get elaborated word tag definitions
MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
ARI

Readability: Automated Readability Index (ARI)
read.hyph.pat

Reading patgen-compatible hyphenation pattern files
read.corp.LCC

Import LCC data
hyphen

Automatic hyphenation
danielson.bryan

Readability: Danielson-Bryan
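
Several of the readability indices listed above are simple functions of sentence, word and character counts. As a rough illustration, the following base R sketch computes LIX, RIX and ARI from their commonly cited formulas. It uses no koRpus functions. The sample text `txt` and the naive sentence and word splitting are assumptions, so results from the package itself may differ.

```r
# Hand-rolled sketch in base R; this is not koRpus code.
txt <- "The cat sat on the mat. Readability formulas estimate difficulty."

# Naive splitting (assumption): sentences end at ., ! or ?;
# words are runs of ASCII letters.
sentences <- strsplit(txt, "[.!?]+")[[1]]
sentences <- sentences[nzchar(trimws(sentences))]
words <- strsplit(txt, "[^A-Za-z]+")[[1]]
words <- words[nzchar(words)]

n.sent  <- length(sentences)
n.words <- length(words)
n.chars <- sum(nchar(words))       # letters only
n.long  <- sum(nchar(words) > 6)   # "long" words: more than 6 letters

# LIX: average sentence length plus percentage of long words
LIX <- n.words / n.sent + 100 * n.long / n.words
# RIX: long words per sentence
RIX <- n.long / n.sent
# ARI: Automated Readability Index
ARI <- 4.71 * (n.chars / n.words) + 0.5 * (n.words / n.sent) - 21.43

print(c(LIX = LIX, RIX = RIX, ARI = ARI))
```

Indices such as Flesch or SMOG additionally need syllable counts, which is where the package's hyphenation functions come in.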