⚠️ There's a newer version (0.13-8) of this package.

koRpus (version 0.05-6)

An R Package for Text Analysis

Description

A set of tools to analyze texts. It includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Note: for full functionality, a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage.
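
Several of the lexical diversity indices in this package reduce to simple functions of the number of tokens N and the number of types V. The following is a minimal base-R sketch of some of them (TTR, Carroll's CTTR, Guiraud's R, Herdan's C, Maas' a², plus the windowed MATTR and MSTTR), written only to illustrate the formulas. It is not koRpus's implementation: koRpus works on tagged text objects and may tokenize, lower-case and count differently. The function names ld_indices, mattr and msttr are hypothetical, and the window and segment size of 100 tokens is an assumed default.

```r
# Illustrative base-R sketch, NOT the koRpus implementation.
# Assumption: `tokens` is a character vector of already-normalized words.

# Indices that depend only on N (tokens) and V (types)
ld_indices <- function(tokens) {
  N <- length(tokens)            # number of tokens
  V <- length(unique(tokens))    # number of distinct types
  c(
    TTR  = V / N,                          # type-token ratio
    CTTR = V / sqrt(2 * N),                # Carroll's corrected TTR
    R    = V / sqrt(N),                    # Guiraud's R
    C    = log(V) / log(N),                # Herdan's C
    a2   = (log(N) - log(V)) / log(N)^2    # Maas' a^2
  )
}

# Moving-average TTR: mean TTR over every run of `window` consecutive tokens
mattr <- function(tokens, window = 100) {
  N <- length(tokens)
  if (N < window) stop("text is shorter than the window")
  ttrs <- vapply(seq_len(N - window + 1), function(i) {
    length(unique(tokens[i:(i + window - 1)])) / window
  }, numeric(1))
  mean(ttrs)
}

# Mean segmental TTR: mean TTR over complete segments; leftover tokens are dropped
msttr <- function(tokens, segment = 100) {
  n_seg <- length(tokens) %/% segment
  if (n_seg == 0) stop("text is shorter than one segment")
  ttrs <- vapply(seq_len(n_seg), function(s) {
    idx <- ((s - 1) * segment + 1):(s * segment)
    length(unique(tokens[idx])) / segment
  }, numeric(1))
  mean(ttrs)
}

# Tiny example: 11 tokens, 8 types
tokens <- strsplit("the cat sat on the mat and the dog sat too", " ")[[1]]
print(round(ld_indices(tokens), 3))
print(mattr(tokens, window = 5))    # 7 windows, mean TTR 6.4 / 7
print(msttr(tokens, segment = 5))   # segments score 0.8 and 1.0, mean 0.9
```

The toy window and segment sizes of 5 are only there to make the example small; on real texts these measures are normally run with much larger windows, which is why the sketch defaults to 100.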

Install

install.packages('koRpus')

Monthly Downloads

4,316

Version

0.05-6

License

GPL (>= 3)

Maintainer

Meik Michalke

Last Published

June 30th, 2015

Functions in koRpus (0.05-6)

cTest

Transform text into C-Test-like format
CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)
ARI

Readability: Automated Readability Index (ARI)
kRp.readability,-class

S4 Class kRp.readability
read.corp.celex

Import Celex data
dickes.steiwer

Readability: Dickes-Steiwer Handformel
kRp.hyphen,-class

S4 Class kRp.hyphen
HDD

Lexical diversity: HD-D (vocd-D)
koRpus-package

The koRpus Package
ELF

Readability: Farr's Easy Listening Formula (ELF)
kRp.text.paste

Paste koRpus objects
MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
clozeDelete

Transform text into cloze test format
DRP

Readability: Degrees of Reading Power (DRP)
TRI

Readability: Kuntzsch's Text-Redundanz-Index
fucks

Readability: Fucks' Stilcharakteristik
FORCAST

Readability: FORCAST Index
bormuth

Readability: Bormuth's Mean Cloze and Grade Placement
kRp.filter.wclass

Remove word classes
RIX

Readability: Anderson's Readability Index (RIX)
flesch

Readability: Flesch Readability Ease
danielson.bryan

Readability: Danielson-Bryan
kRp.POS.tags

Get elaborated word tag definitions
read.BAWL

Import BAWL-R data
taggedText

Getter/setter methods for koRpus objects
guess.lang

Guess the language a text is written in
farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index
coleman.liau

Readability: Coleman-Liau Index
S.ld

Lexical diversity: Summer's S
coleman

Readability: Coleman's Formulas
kRp.lang,-class

S4 Class kRp.lang
kRp.corp.freq,-class

S4 Class kRp.corp.freq
kRp.txt.trans,-class

S4 Class kRp.txt.trans
kRp.txt.freq,-class

S4 Class kRp.txt.freq
jumbleWords

Produce jumbled words
lex.div.num

Calculate lexical diversity
correct.tag

Methods to correct koRpus objects
read.corp.LCC

Import LCC data
get.kRp.env

Get koRpus session environment
MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
readability

Measure readability
kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
tokenize

A simple tokenizer
spache

Readability: Spache Formula
harris.jacobson

Readability: Harris-Jacobson indices
kRp.analysis,-class

S4 Class kRp.analysis
read.tagged

Import already tagged texts
plot

Plot method for objects of class kRp.tagged
set.kRp.env

A function to set information on your koRpus environment
freq.analysis

Analyze word frequencies
manage.hyph.pat

Handling hyphenation pattern objects
hyphen

Automatic hyphenation
strain

Readability: Strain Index
textFeatures

Extract text features for authorship analysis
tuldava

Readability: Tuldava's Text Difficulty Formula
summary,kRp.TTR-method

Summary methods for koRpus objects
kRp.text.transform

Letter case transformation
C.ld

Lexical diversity: Herdan's C
maas

Lexical diversity: Maas' indices
LIX

Readability: Björnsson's Läsbarhetsindex (LIX)
kRp.hyph.pat,-class

S4 Class kRp.hyph.pat
dale.chall

Readability: Dale-Chall Readability Formula
linsear.write

Readability: Linsear Write Index
nWS

Readability: Neue Wiener Sachtextformeln
hyph.XX

Hyphenation patterns
kRp.text.analysis

Analyze texts using TreeTagger and word frequencies
read.corp.custom

Import custom corpus data
read.hyph.pat

Reading patgen-compatible hyphenation pattern files
lex.div

Analyze lexical diversity
query

A method to get information out of koRpus objects
readability.num

Calculate readability
U.ld

Lexical diversity: Uber Index (U)
FOG

Readability: Gunning FOG Index
R.ld

Lexical diversity: Guiraud's R
TTR

Lexical diversity: Type-Token Ratio
flesch.kincaid

Readability: Flesch-Kincaid Grade Level
SMOG

Readability: Simple Measure of Gobbledygook (SMOG)
kRp.TTR,-class

S4 Class kRp.TTR
MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
kRp.tagged,-class

S4 Class kRp.tagged
K.ld

Lexical diversity: Yule's K
show,kRp.TTR-method

Show methods for koRpus objects
traenkle.bailer

Readability: Traenkle-Bailer Formulas
segment.optimizer

A function to optimize MSTTR segment sizes
wheeler.smith

Readability: Wheeler-Smith Score
treetag

A function to call TreeTagger
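
Some of the readability formulas listed above need no syllable counts: LIX adds the mean sentence length to the percentage of long words, RIX divides the number of long words by the number of sentences, and ARI combines letters per word with words per sentence (4.71 · letters/words + 0.5 · words/sentences − 21.43). The sketch below computes all three in base R under deliberately crude assumptions: words are split on whitespace, sentences are counted from runs of ".", "!" and "?", and "long" means more than six letters. It is not koRpus's readability() implementation, which works on tagged text and handles tokenization and language-specific details itself; text_counts and simple_readability are hypothetical names.

```r
# Illustrative base-R sketch, NOT the koRpus implementation.
# Assumptions: whitespace tokenization, sentence ends = runs of . ! ?,
# long words = more than six letters.

text_counts <- function(txt) {
  words <- unlist(strsplit(txt, "[[:space:]]+"))
  words <- gsub("[^[:alpha:]]", "", words)   # strip punctuation and digits
  words <- words[nchar(words) > 0]           # drop tokens that became empty
  marks <- gregexpr("[.!?]+", txt)[[1]]      # match positions, or -1 if none
  list(
    words     = length(words),
    sentences = max(1, sum(marks > 0)),      # unmarked text counts as 1 sentence
    letters   = sum(nchar(words)),
    long      = sum(nchar(words) > 6)
  )
}

simple_readability <- function(txt) {
  n <- text_counts(txt)
  c(
    LIX = n$words / n$sentences + 100 * n$long / n$words,
    RIX = n$long / n$sentences,
    ARI = 4.71 * n$letters / n$words + 0.5 * n$words / n$sentences - 21.43
  )
}

txt <- "The cat sat on the mat. Everybody appreciated the remarkable performance."
scores <- simple_readability(txt)
print(round(scores, 2))   # LIX 41.86, RIX 2.00, ARI 7.44
```

The example has 11 words, 2 sentences, 61 letters and 4 long words, so LIX = 11/2 + 100 · 4/11 and RIX = 4/2; formulas such as Flesch or SMOG additionally need syllable counts, which is where the package's hyphenation functions come in.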