hunspell_check: Hunspell Spell Checking

Description

The hunspell_check function takes a character vector of words and checks each word for correctness. The hunspell_find function takes a character vector of text (in plain text, LaTeX or man format) and returns a list with the incorrect words for each line. Finally, hunspell_suggest suggests correct alternatives for each (incorrect) input word.

Usage

hunspell_check(words, ignore = character(), lang = "en_US")
hunspell_find(text, ignore = character(), format = c("text", "man",
  "latex"), lang = "en_US")
hunspell_suggest(words, lang = "en_US")
hunspell_analyze(words, lang = "en_US")
hunspell_stem(words, lang = "en_US")

Arguments

words

character vector with individual words to spellcheck

ignore

character vector with additional approved words for the dictionary

lang

dictionary language; currently only en_US is supported

text

character vector with arbitrary input text

format

input format; supported parsers are text, latex or man
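
As a minimal sketch of how the ignore and format arguments might be combined, assuming the hunspell package is installed and that hunspell_find accepts the arguments listed under Usage above. The sample sentences are made up for illustration.

```r
library(hunspell)

# Hypothetical input: two lines of plain text, the first with misspellings
text <- c("The hunspell libary checks speling.", "This line is fine.")

# Flag misspellings line by line; the result has one element per line of text
bad <- hunspell_find(text, format = "text")
print(bad)

# Same check, but treat "hunspell" as an approved word for this call
bad_ignored <- hunspell_find(text, ignore = "hunspell", format = "text")
print(bad_ignored)
```

Words passed through ignore are approved only for that call, so the same vector has to be supplied again on later calls.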

Details

The functions hunspell_analyze and hunspell_stem try to break down a word and return its structure or stem word(s).
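
A short sketch of these two functions, assuming the hunspell package is installed and the default en_US dictionary is used. The word list is an arbitrary illustration, and the exact stems and analysis strings depend on the dictionary.

```r
library(hunspell)

# Hypothetical input: several inflected forms of one word
words <- c("love", "loving", "loved", "lover")

# Stem word(s) for each input word; one result per word
stems <- hunspell_stem(words)
print(stems)

# Morphological breakdown of each input word; one result per word
analysis <- hunspell_analyze(words)
print(analysis)
```

A word can have more than one candidate stem, which is why each input word gets its own element in the result.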

Currently only the US English dictionary is included with the package. Additional dictionaries can be downloaded from an OpenOffice mirror (http://ftp.snt.utwente.nl/pub/software/openoffice/contrib/dictionaries/) or bundle (http://archive.ubuntu.com/ubuntu/pool/main/libr/libreoffice-dictionaries/?C=S;O=D).

Examples

# check individual words
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)

# find suggestions for incorrect words
hunspell_suggest(words[!correct])

# find incorrect words in a piece of text
bad <- hunspell_find("spell checkers are not neccessairy for langauge ninja's")
print(bad[[1]])
hunspell_suggest(bad[[1]])

# check a latex document
download.file("http://arxiv.org/e-print/1406.4806v1", "1406.4806v1.tar.gz", mode = "wb")
untar("1406.4806v1.tar.gz")
text <- readLines("content.tex", warn = FALSE)
words <- hunspell_find(text, format = "latex")
sort(unique(unlist(words)))
