hyphen: Automatic hyphenation

Description

These methods implement word hyphenation, based on Liang's algorithm.

Usage

hyphen(words, ...)
## S3 method for class 'kRp.taggedText':
hyphen(words, hyph.pattern = NULL,
  min.length = 4, rm.hyph = TRUE, corp.rm.class = "nonpunct",
  corp.rm.tag = c(), quiet = FALSE, cache = TRUE)
## S3 method for class 'character':
hyphen(words, hyph.pattern = NULL, min.length = 4,
  rm.hyph = TRUE, corp.rm.class = "nonpunct", corp.rm.tag = c(),
  quiet = FALSE, cache = TRUE)

Arguments

words

Either an object of class kRp.tagged-class, kRp.txt.freq-class or

...

Only used for the method generic.

hyph.pattern

Either an object of class kRp.hyph.pat-class, or a valid character string naming the language of the patterns to be used. See details.

min.length

Integer, the minimum number of letters a word must have to be considered for hyphenation. hyphen will not split words after the first or before the last letter, so values smaller than 4 are not useful.

rm.hyph

Logical, whether hyphens already present in words should be removed before pattern matching.

corp.rm.class

A character vector with word classes which should be ignored. The default value "nonpunct" has special meaning and will cause the result of

kRp.POS.tags(lang, c("punct","sentc"),
      list.classes=TRUE)

to be used. Relevant only

corp.rm.tag

A character vector with POS tags which should be ignored. Relevant only if words is a valid koRpus object.

quiet

Logical. If FALSE, short status messages will be shown.

cache

Logical. hyphen() can cache results to speed up the process. If this option is set to TRUE, the current cache will be queried and new tokens will also be added. Caches are language-specific and reside in an environment, i.e., th
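The caching behaviour described for the cache argument can be pictured with a small base R sketch. This is only an illustration of language-specific caches stored in an environment; the names hyph_cache, cached_lookup and slow_hyphenate are hypothetical and do not reflect how the package organizes its cache internally.

```r
# Hypothetical layout: one environment per language, keyed by token.
hyph_cache <- new.env()

cached_lookup <- function(token, lang, hyphenate_fun) {
  # Create the per-language cache on first use
  if (!exists(lang, envir = hyph_cache, inherits = FALSE)) {
    assign(lang, new.env(), envir = hyph_cache)
  }
  lang_cache <- get(lang, envir = hyph_cache, inherits = FALSE)
  # Query the cache first
  if (exists(token, envir = lang_cache, inherits = FALSE)) {
    return(get(token, envir = lang_cache, inherits = FALSE))
  }
  # Unknown token: compute the result and add it to the cache
  result <- hyphenate_fun(token)
  assign(token, result, envir = lang_cache)
  result
}

# Stand-in for an expensive hyphenation step; counts how often it runs.
calls <- 0
slow_hyphenate <- function(token) {
  calls <<- calls + 1
  paste(strsplit(token, "")[[1]], collapse = "-")  # placeholder, not real hyphenation
}

cached_lookup("word", "en", slow_hyphenate)  # computed
cached_lookup("word", "en", slow_hyphenate)  # served from the cache
cached_lookup("word", "de", slow_hyphenate)  # other language, computed again
calls
```

Because each language gets its own environment, the same token hyphenated under two pattern sets never shares a cache entry.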

Value

An object of class kRp.hyphen-class

hyph.XX

Details

For this to work, the function must be told which pattern set it should use to find the right hyphenation spots. If words is already a tagged object, its language definition may be used. Otherwise, in addition to the words to be processed, you must specify hyph.pattern. You have two options: If you want to use one of the built-in language patterns, just set it to the corresponding language abbreviation. As of this version, valid choices are:

"de": German (new spelling, since 1996)
"de.old": German (old spelling, 1901-1996)
"en": English (UK)
"en.us": English (US)
"es": Spanish
"fr": French
"it": Italian
"ru": Russian
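To make the role of a pattern set more concrete, here is a minimal base R sketch of Liang-style pattern matching. It is not the package's implementation: parse_pattern, hyphenate_toy and toy_patterns are hypothetical names, and the toy set holds only the handful of patterns commonly quoted for the word "hyphenation", whereas real pattern sets contain thousands of entries. Digits in a pattern score the gaps between letters; the highest score per gap wins, and odd scores mark permitted break points.

```r
# Toy pattern set (assumption for illustration only).
toy_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na",
                  "n2at", "1tio", "2io", "o2n")

# Split a pattern into its letters and the scores of the gaps around them.
parse_pattern <- function(p) {
  letters_only <- character(0)
  scores <- 0L  # gap before the first letter
  for (ch in strsplit(p, "")[[1]]) {
    if (grepl("[0-9]", ch)) {
      scores[length(scores)] <- as.integer(ch)
    } else {
      letters_only <- c(letters_only, ch)
      scores <- c(scores, 0L)  # gap after this letter
    }
  }
  list(key = paste(letters_only, collapse = ""), scores = scores)
}

hyphenate_toy <- function(word, patterns = toy_patterns, min.length = 4) {
  n <- nchar(word)
  if (n < max(min.length, 4)) return(word)
  dotted <- paste0(".", tolower(word), ".")  # dots mark the word boundaries
  pts <- integer(nchar(dotted) + 1)          # pts[j] = gap before character j
  for (p in lapply(patterns, parse_pattern)) {
    m <- nchar(p$key)
    last_start <- nchar(dotted) - m + 1
    if (last_start < 1) next
    for (s in seq_len(last_start)) {
      if (substr(dotted, s, s + m - 1) == p$key) {
        idx <- s:(s + m)
        pts[idx] <- pmax(pts[idx], p$scores)  # keep the highest score per gap
      }
    }
  }
  chars <- strsplit(word, "")[[1]]
  out <- chars[1]
  for (i in 2:n) {
    # Never split after the first or before the last letter
    allowed <- i >= 3 && i <= n - 1
    if (allowed && pts[i + 1] %% 2 == 1) out <- c(out, "-")
    out <- c(out, chars[i])
  }
  paste(out, collapse = "")
}

hyphenate_toy("hyphenation")  # "hy-phen-ation"
```

With a different pattern set the same word can be split differently, which is why the language of the patterns has to be known before any hyphenation takes place.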

References

Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.

[1] http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/

[2] http://www.ctan.org/tex-archive/macros/latex/base/lppl.txt

Examples

# tagged.text is assumed to be an existing tagged text object
hyphen(tagged.text)