⚠️ There's a newer version (0.13-8) of this package.

koRpus (version 0.06-4)

An R Package for Text Analysis

Description

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on-the-fly or by plugin packages. Note: For full functionality a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please subscribe to the koRpus-dev mailing list (https://ml06.ispgateway.de/mailman/listinfo/korpus-dev_r.reaktanz.de).
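To make the simplest of the lexical diversity indices named above concrete, here is a minimal base-R sketch of a type-token ratio: the number of distinct word forms divided by the total number of words. The helper name simple_ttr is hypothetical and not part of koRpus, and the naive splitter (lowercase ASCII letters and apostrophes only) is an assumption for illustration. The package's own tokenizer and TTR function are likely to treat punctuation, case and language differently.

```r
# Hypothetical helper for illustration only; this is NOT koRpus's implementation.
simple_ttr <- function(text) {
  # Lowercase the text, then split on any run of characters that is
  # not an ASCII letter or an apostrophe (a deliberately naive tokenizer).
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  # Drop empty strings, which appear when the text starts with a separator.
  tokens <- tokens[nzchar(tokens)]
  # An empty text has no defined ratio.
  if (length(tokens) == 0) return(NA_real_)
  # Types are the distinct tokens; TTR = types / tokens.
  length(unique(tokens)) / length(tokens)
}

# "the" occurs twice, so there are 5 types among 6 tokens.
simple_ttr("The cat sat on the mat")
```

Because the ratio tends to drop as a text grows longer, length-corrected measures such as MSTTR, MATTR and MTLD from the function list below are usually preferred when comparing texts of different sizes.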

Install

install.packages('koRpus')

Monthly Downloads

3,208

Version

0.06-4

License

GPL (>= 3)

Maintainer

Meik Michalke

Last Published

March 8th, 2016

Functions in koRpus (0.06-4)

ARI

Readability: Automated Readability Index (ARI)
MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
guess.lang

Guess language a text is written in
cTest

Transform text into C-Test-like format
linsear.write

Readability: Linsear Write Index
kRp.TTR,-class

S4 Class kRp.TTR
kRp.POS.tags

Get elaborated word tag definitions
TTR

Lexical diversity: Type-Token Ratio
SMOG

Readability: Simple Measure of Gobbledygook (SMOG)
danielson.bryan

Readability: Danielson-Bryan
maas

Lexical diversity: Maas' indices
MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
DRP

Readability: Degrees of Reading Power (DRP)
LIX

Readability: Björnsson's Läsbarhetsindex (LIX)
R.ld

Lexical diversity: Guiraud's R
coleman.liau

Readability: Coleman-Liau Index
kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
wheeler.smith

Readability: Wheeler-Smith Score
coleman

Readability: Coleman's Formulas
bormuth

Readability: Bormuth's Mean Cloze and Grade Placement
U.ld

Lexical diversity: Uber Index (U)
FOG

Readability: Gunning FOG Index
kRp.tagged,-class

S4 Class kRp.tagged
kRp.hyph.pat,-class

S4 Class kRp.hyph.pat
read.corp.custom

Import custom corpus data
manage.hyph.pat

Handling hyphenation pattern objects
kRp.txt.freq,-class

S4 Class kRp.txt.freq
S.ld

Lexical diversity: Summer's S
plot

Plot method for objects of class kRp.tagged
kRp.filter.wclass

Remove word classes
FORCAST

Readability: FORCAST Index
CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)
kRp.corp.freq,-class

S4 Class kRp.corp.freq
farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index
kRp.txt.trans,-class

S4 Class kRp.txt.trans
segment.optimizer

A function to optimize MSTTR segment sizes
lex.div.num

Calculate lexical diversity
query

A method to get information out of koRpus objects
kRp.text.transform

Letter case transformation
read.tagged

Import already tagged texts
set.lang.support

Add support for new languages
readability

Measure readability
kRp.readability,-class

S4 Class kRp.readability
HDD

Lexical diversity: HD-D (vocd-D)
ELF

Readability: Farr's Easy Listening Formula (ELF)
tokenize

A simple tokenizer
summary,kRp.lang-method

Summary methods for koRpus objects
tuldava

Readability: Tuldava's Text Difficulty Formula
flesch

Readability: Flesch Reading Ease
traenkle.bailer

Readability: Traenkle-Bailer Formeln
show

Show methods for koRpus objects
dale.chall

Readability: Dale-Chall Readability Formula
read.BAWL

Import BAWL-R data
strain

Readability: Strain Index
kRp.lang,-class

S4 Class kRp.lang
fucks

Readability: Fucks' Stilcharakteristik
lex.div

Analyze lexical diversity
readability.num

Calculate readability
C.ld

Lexical diversity: Herdan's C
koRpus-package

The koRpus Package
treetag

A function to call TreeTagger
read.corp.LCC

Import LCC data
nWS

Readability: Neue Wiener Sachtextformeln
clozeDelete

Transform text into cloze test format
jumbleWords

Produce jumbled words
freq.analysis

Analyze word frequencies
textFeatures

Extract text features for authorship analysis
kRp.hyphen,-class

S4 Class kRp.hyphen
taggedText

Getter/setter methods for koRpus objects
hyphen

Automatic hyphenation
dickes.steiwer

Readability: Dickes-Steiwer Handformel
get.kRp.env

Get koRpus session environment
TRI

Readability: Kuntzsch's Text-Redundanz-Index
read.hyph.pat

Reading patgen-compatible hyphenation pattern files
spache

Readability: Spache Formula
kRp.analysis,-class

S4 Class kRp.analysis
correct.tag

Methods to correct koRpus objects
kRp.text.paste

Paste koRpus objects
K.ld

Lexical diversity: Yule's K
RIX

Readability: Anderson's Readability Index (RIX)
flesch.kincaid

Readability: Flesch-Kincaid Grade Level
harris.jacobson

Readability: Harris-Jacobson indices
hyph.XX

Hyphenation patterns
kRp.text.analysis

Analyze texts using TreeTagger and word frequencies
set.kRp.env

A function to set information on your koRpus environment
read.corp.celex

Import Celex data