These functions convert Japanese characters between fullwidth (zenkaku) and halfwidth (hankaku) forms, either to avoid problems in Japanese string operations or to take advantage of the fullwidth forms.
zen2han(s)
han2zen(s)
s: A character vector. UTF-8 encoding is preferable.
zen2han
returns a character vector. All letters, numbers, and
symbols are in their halfwidth form.
han2zen
returns a character vector. All letters, numbers, and
symbols are in their fullwidth form.
Japanese graphic characters are traditionally classed into fullwidth (zenkaku) and halfwidth (hankaku) forms. Letters, numbers, and symbols can take either form. Hiragana and Kanji are available only as fullwidth characters, and Katakana is normally written in fullwidth although a halfwidth form exists. String manipulation such as matching or searching becomes error-prone when the two forms of letters, numbers, and symbols are mixed. Character data should therefore be normalized with these functions before such operations.
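The mismatch between the two forms can be seen with base R alone. The snippet below is a small illustration, not part of the package; the variable names `half` and `full` are made up for the example, and the fullwidth text is written with Unicode escapes so that it does not depend on the file encoding.

```r
half <- "ABC123"
full <- "\uff21\uff22\uff23\uff11\uff12\uff13"  # the same text in fullwidth form

half == full                      # FALSE: the strings look alike but differ
grepl("ABC", full, fixed = TRUE)  # FALSE: a halfwidth pattern does not match

# The two forms are distinct code points.
utf8ToInt("A")        # 65
utf8ToInt("\uff21")   # 65313
```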
zen2han
targets fullwidth numbers, letters, punctuation marks, and other
special symbols. Katakana is not converted by
zen2han
because halfwidth Katakana tends to cause problems of its own.
han2zen
performs the reverse conversion. This is useful for Japanese
users who need to replace characters that are problematic in strings
(e.g., '$' in a character vector) with their fullwidth equivalents.
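As a rough illustration of the idea, the conversion can be sketched with code point arithmetic. In Unicode, the fullwidth variants U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E, and the ideographic space U+3000 corresponds to the ASCII space. The helpers `zen2han_sketch` and `han2zen_sketch` below are hypothetical names and are not the package's implementation, which may cover a different set of characters; in particular, the handling of spaces here is an assumption.

```r
# Hypothetical helpers for illustration only; not the package's implementation.
offset <- 0xFEE0  # distance between U+0021..U+007E and U+FF01..U+FF5E

zen2han_sketch <- function(s) {
  vapply(s, function(x) {
    cp <- utf8ToInt(x)
    i <- which(cp >= 0xFF01 & cp <= 0xFF5E)  # fullwidth ASCII variants
    cp[i] <- cp[i] - offset
    cp[which(cp == 0x3000)] <- 0x20          # ideographic space (assumed in scope)
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

han2zen_sketch <- function(s) {
  vapply(s, function(x) {
    cp <- utf8ToInt(x)
    sp <- which(cp == 0x20)                  # ASCII space (assumed in scope)
    i <- which(cp >= 0x21 & cp <= 0x7E)      # printable ASCII
    cp[i] <- cp[i] + offset
    cp[sp] <- 0x3000
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

zen2han_sketch("\uff21\uff22\uff23\uff11\uff12\uff13")  # "ABC123"
han2zen_sketch("$")                                     # fullwidth dollar sign, U+FF04
```

Katakana code points fall outside both ranges, so these sketches leave Katakana untouched, in line with the behaviour described above for zen2han.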
Halfwidth and Fullwidth Forms http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html
han2zen, showNonASCII
# NOT RUN {
zenkaku
zen2han(as.character(zenkaku))
# }