Nippon (version 0.7.1)

sanitizeZenkaku: Sanitizing strings contaminated with fullwidth (zenkaku) characters.

Description

Sanitizes strings unintentionally contaminated with fullwidth (zenkaku) characters by converting them from fullwidth (zenkaku) to halfwidth (hankaku) forms.

Usage

sanitizeZenkaku(s)

Arguments

s

A character vector. UTF-8 encoding is preferable.

Value

A character vector in which all letters, numbers, and symbols are in their halfwidth form.

Details

Occasionally a character vector is unintentionally contaminated with fullwidth (zenkaku) characters. sanitizeZenkaku replaces Japanese fullwidth (zenkaku) letters, numbers, and symbols in the given character vector with their halfwidth forms (i.e. ASCII), so that logical and factor vectors built from it work properly. A fullwidth space is simply removed.
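The conversion described above can be sketched in base R. This is a minimal illustration under stated assumptions, not Nippon's implementation: it assumes "letters, numbers, and symbols" means the Unicode Fullwidth ASCII variants U+FF01 to U+FF5E, each of which maps to its ASCII counterpart by subtracting 0xFEE0 (65248), and that the fullwidth space is U+3000. The function name sanitize_zenkaku_sketch is hypothetical, and its behavior may differ from sanitizeZenkaku for characters outside that range (for example, the fullwidth yen sign U+FFE5 is left untouched here).

```r
# Hypothetical sketch for illustration only; not the Nippon source code.
# Assumes fullwidth ASCII variants are U+FF01..U+FF5E and the fullwidth
# (ideographic) space is U+3000.
sanitize_zenkaku_sketch <- function(s) {
  vapply(s, function(x) {
    if (is.na(x)) return(NA_character_)
    cp <- utf8ToInt(enc2utf8(x))            # code points of one string
    fw <- cp >= 65281L & cp <= 65374L       # U+FF01..U+FF5E
    cp[fw] <- cp[fw] - 65248L               # shift by 0xFEE0 to U+0021..U+007E
    cp <- cp[cp != 12288L]                  # drop the fullwidth space U+3000
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

n <- intToUtf8(c(65296 + 1:3, 12288))       # fullwidth "123" plus a fullwidth space
sanitize_zenkaku_sketch(n)                  # "123"

# A fullwidth "A" (U+FF21) splits what should be a single factor level
x <- c("A", intToUtf8(65313))
nlevels(factor(x))                          # 2
nlevels(factor(sanitize_zenkaku_sketch(x))) # 1

# A fullwidth "TRUE" is not recognized by as.logical() until converted
as.logical(sanitize_zenkaku_sketch(intToUtf8(c(65332, 65330, 65333, 65317))))
```

Working on code points rather than a chartr() lookup table keeps the mapping to a single arithmetic shift, which is why the sketch restricts itself to the contiguous U+FF01 to U+FF5E block.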

See Also

zen2han

Examples

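# Fullwidth digits 1, 2, 3 (U+FF11 to U+FF13) followed by a fullwidth space (U+3000)
(n <- intToUtf8(c(65296 + 1:3, 12288)))
# Converts the digits to halfwidth and drops the fullwidth space
sanitizeZenkaku(n)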
