Japanese strings are often made up of a mixture of Chinese characters
(Kanji), Kana (Hiragana and Katakana) and Romaji (Latin phonetic
transcription). The external program kakasi converts between these four
ways of writing Japanese. kakasi and Sys.kakasi are especially useful
for sanitizing a character vector by converting Japanese (non-ASCII)
characters to ASCII characters.
kakasi uses two basic dictionaries: itaijidict and kanwadict. These
dictionaries are placed in doc/share of the package directory when the
Nippon package is installed. Because the kakasi library looks up
environment variables to find its dictionaries, ITAIJIDICTPATH and
KANWADICTPATH are set internally using Sys.setenv the first time kakasi
is called. Subsequent calls to kakasi continue to use these environment
variables, which remain set until the R session ends. To use an
alternative dictionary instead of the bundled one, a user can set the
environment variables using Sys.setenv or pass them as arguments to
kakasi. For a permanent setting of the environment variables, see the
help for Renviron.
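
As a rough illustration of the session-level approach described above,
the following sketch sets the two environment variables with base R
only. The dictionary paths (my_itaijidict, my_kanwadict under tempdir())
are hypothetical placeholders, not files shipped with the package, and
kakasi itself is deliberately not called here because the placeholder
paths do not point at real dictionaries.

```r
# Hypothetical locations of alternative dictionaries (assumption for
# illustration only); replace these with the paths to real files.
alt_itaiji <- file.path(tempdir(), "my_itaijidict")
alt_kanwa  <- file.path(tempdir(), "my_kanwadict")

# Record the values in effect before the change (NA if not yet set).
old <- Sys.getenv(c("ITAIJIDICTPATH", "KANWADICTPATH"), unset = NA)

# Set both variables for the remainder of the R session.
Sys.setenv(ITAIJIDICTPATH = alt_itaiji, KANWADICTPATH = alt_kanwa)

# Read the variables back to confirm what the session now sees.
current <- Sys.getenv(c("ITAIJIDICTPATH", "KANWADICTPATH"))
print(current)
```

Setting the variables before the first call to kakasi is probably the
safer order, since the text above says the bundled paths are set at
that first call; whether an existing value is respected at that point
is not stated here and should be checked against the package itself.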