
Nippon (version 0.7.1)

kakasi: Interface to kakasi

Description

kakasi is an interface to the external program kakasi (KAnji KAna Simple Inverter). It is especially useful for converting Japanese Kanji characters to Romaji (ASCII) characters.

Usage

kakasi(x, kakasi.option="-Ha -Ka -Ja -Ea -ka",
 ITAIJIDICTPATH = Sys.getenv("ITAIJIDICTPATH", unset = NA),
 KANWADICTPATH = Sys.getenv("KANWADICTPATH", unset = NA))

Arguments

x

A character vector

kakasi.option

A character string specifying the options passed to the kakasi library/program.

ITAIJIDICTPATH

A character string specifying the path to itaijidict. It is passed to the kakasi library as an environment variable.

KANWADICTPATH

A character string specifying the path to kanwadict. It is passed to the kakasi library as an environment variable.

Value

A character vector

Warning

Note that kakasi does not filter out characters that are neither Japanese nor ASCII. kakasi warns unless LC_CTYPE is "ja_JP.UTF-8" (Linux or macOS) or "Japanese_Japan.932" (Windows). Whether the function works in other locales is not known.
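To see in advance whether kakasi is likely to warn, the current LC_CTYPE can be compared with the two locale names listed above. The helper is_kakasi_locale below is hypothetical (not part of Nippon); it uses base R's Sys.getlocale() and .Platform$OS.type with an exact string comparison, so it is only a rough sketch and the package's own check may differ.

```r
# Hypothetical helper, not part of Nippon: does LC_CTYPE exactly match the
# locale for which kakasi() is documented not to warn on this platform?
is_kakasi_locale <- function(ctype = Sys.getlocale("LC_CTYPE"),
                             os = .Platform$OS.type) {
  expected <- if (identical(os, "windows")) "Japanese_Japan.932" else "ja_JP.UTF-8"
  identical(ctype, expected)
}

# Report the mismatch instead of failing, since kakasi() may still work
if (!is_kakasi_locale()) {
  message("LC_CTYPE is '", Sys.getlocale("LC_CTYPE"),
          "'; kakasi() may warn in this locale.")
}
```

If a Japanese locale is installed, Sys.setlocale("LC_CTYPE", "ja_JP.UTF-8") switches to it for the current session; it returns an empty string with a warning when the operating system cannot provide that locale.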

Details

Japanese text is often written in a mixture of Chinese characters (Kanji), Kana (Hiragana and Katakana) and Romaji (Latin phonetic transcription). The external program kakasi converts between these four writing systems. kakasi and Sys.kakasi are especially useful for sanitizing a character vector by converting Japanese (non-ASCII) characters to ASCII.
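When sanitizing, one workable pattern is to send only the elements that are not already ASCII through the converter. The sketch below assumes UTF-8 input (the Unix-like case in the Examples) and relies on base R's iconv(), which returns NA for elements that cannot be represented in ASCII. The helper name romanize_non_ascii and its convert argument are hypothetical; convert defaults to Nippon::kakasi, and the sketch assumes the converter returns one string per input element.

```r
# Hypothetical helper, not part of Nippon. Assumes UTF-8 input and that
# `convert` returns a character vector of the same length as its input.
romanize_non_ascii <- function(x, convert = Nippon::kakasi) {
  # iconv() yields NA when an element cannot be represented in ASCII;
  # elements that are already NA are left untouched.
  needs <- !is.na(x) & is.na(iconv(x, from = "UTF-8", to = "ASCII"))
  if (any(needs)) {
    x[needs] <- convert(x[needs])
  }
  x
}

# A stand-in converter lets the sketch run without kakasi installed
stub <- function(s) rep("tokyo", length(s))
romanize_non_ascii(c("Osaka", "\u6771\u4eac", NA), convert = stub)
# returns c("Osaka", "tokyo", NA)
```

Because R evaluates default arguments lazily, Nippon::kakasi is only looked up when no convert function is supplied and at least one element needs conversion.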

kakasi uses two basic dictionaries: itaijidict and kanwadict. These dictionaries are included in doc/share of the package directory once the Nippon package is installed. Because the kakasi library looks up environment variables to find its dictionaries, ITAIJIDICTPATH and KANWADICTPATH are set internally with Sys.setenv the first time kakasi is called. After the first call, kakasi continues to use these environment variables, which remain set until the R session ends. To use an alternative dictionary instead of the bundled ones, set the environment variables with Sys.setenv or pass them as arguments to kakasi. For permanent settings, see the help page for Renviron.
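For example, alternative dictionaries can be selected for the rest of the session by setting both variables with Sys.setenv() before calling kakasi. The paths below are placeholders (an assumption), not the locations of the bundled files, and Sys.setenv() does not check that they exist, so the sketch checks with file.exists() and only reports a problem.

```r
# Placeholder paths (assumption): replace with your own dictionary files.
itaiji <- "/usr/local/share/kakasi/itaijidict"
kanwa  <- "/usr/local/share/kakasi/kanwadict"

# Set both variables for the rest of the R session; the ITAIJIDICTPATH and
# KANWADICTPATH arguments of kakasi() read them via Sys.getenv() by default.
Sys.setenv(ITAIJIDICTPATH = itaiji, KANWADICTPATH = kanwa)

# Sys.setenv() accepts any string, so report paths that do not exist
paths <- c(itaiji, kanwa)
if (any(!file.exists(paths))) {
  message("dictionary not found: ",
          paste(paths[!file.exists(paths)], collapse = ", "))
}

# Show the values kakasi() will now pick up
Sys.getenv(c("ITAIJIDICTPATH", "KANWADICTPATH"))
```

The same paths can instead be passed per call, as in kakasi(x, ITAIJIDICTPATH = itaiji, KANWADICTPATH = kanwa), or made permanent with NAME=value lines in a .Renviron file.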

References

KAKASI - Kanji Kana Simple Inverter http://kakasi.namazu.org/

Examples

# NOT RUN {
library(Nippon)
data(prefectures)
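# Distinct region names from the bundled prefectures data
regions <- unique(prefectures$region)
regions
# Unix-like operating systems: pass the UTF-8 strings directly
kakasi(regions)
# Windows: re-encode to CP932 (the Japanese Windows code page) first
regions.cp932 <- iconv(regions, from = "UTF-8", to = "CP932")
kakasi(regions.cp932)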
# }
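The Unix/Windows split in the example can also be wrapped in a small function that chooses the encoding from .Platform$OS.type. prepare_for_kakasi is a hypothetical name, not a Nippon function; it assumes UTF-8 input, as the example does for regions, and relies on iconv() returning NA for any string containing a character with no CP932 equivalent.

```r
# Hypothetical helper, not part of Nippon: mirror the two branches of the
# example by re-encoding UTF-8 input to CP932 on Windows only.
prepare_for_kakasi <- function(x, os = .Platform$OS.type) {
  if (identical(os, "windows")) {
    # Strings that cannot be represented in CP932 become NA
    iconv(x, from = "UTF-8", to = "CP932")
  } else {
    x  # Unix-like systems: UTF-8 strings are passed through unchanged
  }
}
```

With this helper, both branches of the example reduce to kakasi(prepare_for_kakasi(regions)).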
