Note: There's a newer version (0.13-9) of this package.

koRpus (version 0.05-4)

An R Package for Text Analysis

Description

A set of tools to analyze texts. It includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (Celex and Leipzig Corpora Collection file formats are supported). Note: For full functionality, a local installation of TreeTagger is recommended. Also, due to some restrictions on CRAN, the full package sources are only available from the project homepage. Feedback to the author(s) is welcome!

Install

install.packages('koRpus')
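
A minimal usage sketch, assuming koRpus is installed and English support is available (it is built into older releases such as this one; newer releases ship language support in separate packages such as koRpus.lang.en). The sample text and the variable names are made up for illustration, and TreeTagger is not needed here because `tokenize()` is used instead of `treetag()`. Argument details may differ between package versions, so check the help pages of your installed release.

```r
library(koRpus)

# Hypothetical sample text, for illustration only
sample.text <- c("The quick brown fox jumps over the lazy dog.",
                 "The dog does not seem to mind at all.")

# Tokenize a character vector directly (format = "obj") instead of reading a file
tagged.text <- tokenize(sample.text, format = "obj", lang = "en")

# Lexical diversity: type-token ratio of the tokenized text
TTR(tagged.text)

# Readability: Flesch index; syllables are counted via automatic hyphenation
readability(tagged.text, index = "Flesch")
```

On a text this short the resulting scores are not meaningful; the sketch only shows how the pieces listed below are meant to fit together.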

Monthly Downloads

4,086

Version

0.05-4

License

GPL (>= 3)

Maintainer

Meik Michalke

Last Published

January 22nd, 2014

Functions in koRpus (0.05-4)

kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
LIX

Readability: Björnsson's Läsbarhetsindex (LIX)
C.ld

Lexical diversity: Herdan's C
HDD

Lexical diversity: HD-D (vocd-D)
danielson.bryan

Readability: Danielson-Bryan
FOG

Readability: Gunning FOG Index
DRP

Readability: Degrees of Reading Power (DRP)
R.ld

Lexical diversity: Guiraud's R
get.kRp.env

Get koRpus session environment
correct.tag

Methods to correct koRpus objects
U.ld

Lexical diversity: Uber Index (U)
flesch

Readability: Flesch Reading Ease
kRp.text.analysis

Analyze texts using TreeTagger and word frequencies
textFeatures

Extract text features for authorship analysis
farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index
K.ld

Lexical diversity: Yule's K
readability

Measure readability
jumbleWords

Produce jumbled words
read.corp.custom

Import custom corpus data
TRI

Readability: Kuntzsch's Text-Redundanz-Index
dickes.steiwer

Readability: Dickes-Steiwer Handformel
MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
ARI

Readability: Automated Readability Index (ARI)
cTest

Transform text into C-Test-like format
kRp.text.transform

Letter case transformation
manage.hyph.pat

Handling hyphenation pattern objects
kRp.POS.tags

Get elaborated word tag definitions
kRp.TTR-class

S4 class kRp.TTR
kRp.analysis-class

S4 class kRp.analysis
nWS

Readability: Neue Wiener Sachtextformeln
ELF

Readability: Farr's Easy Listening Formula (ELF)
SMOG

Readability: Simple Measure of Gobbledygook (SMOG)
kRp.text.paste

Paste koRpus objects
kRp.tagged-class

S4 class kRp.tagged
read.corp.LCC

Import LCC data
kRp.hyphen-class

S4 class kRp.hyphen
harris.jacobson

Readability: Harris-Jacobson indices
lex.div.num

Calculate lexical diversity
kRp.corp.freq-class

S4 class kRp.corp.freq
dale.chall

Readability: Dale-Chall Readability Formula
kRp.txt.trans-class

S4 class kRp.txt.trans
koRpus-package

The koRpus Package
S.ld

Lexical diversity: Summer's S
CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)
fucks

Readability: Fucks' Stilcharakteristik
hyph.XX

Hyphenation patterns
RIX

Readability: Anderson's Readability Index (RIX)
flesch.kincaid

Readability: Flesch-Kincaid Grade Level
bormuth

Readability: Bormuth's Mean Cloze and Grade Placement
show

Show methods for koRpus objects
coleman.liau

Readability: Coleman-Liau Index
kRp.hyph.pat-class

S4 class kRp.hyph.pat
read.corp.celex

Import Celex data
guess.lang

Guess language a text is written in
readability.num

Calculate readability
clozeDelete

Transform text into cloze test format
linsear.write

Readability: Linsear Write Index
plot

Plot method for objects of class kRp.tagged
read.tagged

Import already tagged texts
coleman

Readability: Coleman's Formulas
TTR

Lexical diversity: Type-Token Ratio
FORCAST

Readability: FORCAST Index
freq.analysis

Analyze word frequencies
lex.div

Analyze lexical diversity
kRp.filter.wclass

Remove word classes
traenkle.bailer

Readability: Traenkle-Bailer Formeln
kRp.readability-class

S4 class kRp.readability
spache

Readability: Spache Formula
MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
kRp.lang-class

S4 class kRp.lang
segment.optimizer

A function to optimize MSTTR segment sizes
strain

Readability: Strain Index
taggedText

Getter/setter methods for koRpus objects
kRp.txt.freq-class

S4 class kRp.txt.freq
summary

Summary methods for koRpus objects
query

A method to get information out of koRpus objects
read.hyph.pat

Reading patgen-compatible hyphenation pattern files
treetag

A function to call TreeTagger
MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
maas

Lexical diversity: Maas' indices
hyphen

Automatic hyphenation
tokenize

A simple tokenizer
wheeler.smith

Readability: Wheeler-Smith Score
read.BAWL

Import BAWL-R data
set.kRp.env

A function to set information on your koRpus environment
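
To make the simpler lexical diversity entries in the list above more concrete, here is a base R sketch of the textbook definitions behind `TTR`, `C.ld`, `R.ld` and `CTTR`, where N is the number of tokens and V the number of types. The helper `simple.lex.div()` is hypothetical and not part of koRpus; the package's own functions work on tokenized koRpus objects and may treat case, punctuation and lemmas differently.

```r
# Hypothetical helper, not part of koRpus: textbook formulas only
simple.lex.div <- function(tokens) {
  tokens <- tolower(tokens)     # treat "The" and "the" as the same type
  N <- length(tokens)           # number of tokens
  V <- length(unique(tokens))   # number of types
  c(
    TTR  = V / N,               # type-token ratio
    C    = log(V) / log(N),     # Herdan's C
    R    = V / sqrt(N),         # Guiraud's R
    CTTR = V / sqrt(2 * N)      # Carroll's corrected TTR
  )
}

# Six tokens, five types ("the" occurs twice)
tokens <- c("the", "cat", "sat", "on", "the", "mat")
res <- simple.lex.div(tokens)
res
```

All four measures are built from the same two counts; they differ only in how they try to compensate for the tendency of the plain ratio V/N to fall as texts get longer.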