utf8Conversion

Convert Integer Vectors to or from UTF-8-encoded Character Vectors

Conversion of UTF-8 encoded character vectors to and from integer vectors.

Keywords
utilities, character
Usage
utf8ToInt(x)
intToUtf8(x, multiple = FALSE)
Arguments
x
object to be converted.
multiple
logical: should the conversion be to a single character string or multiple individual characters?
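
As a brief illustration of the multiple argument, the sketch below converts the same code points both ways. The variable names (cp, one_string, per_char) are illustrative only and not part of the documented interface.

```r
# Code points for "R", "!" and U+00DF (LATIN SMALL LETTER SHARP S)
cp <- c(82L, 33L, 223L)

# Default (multiple = FALSE): one character string holding all code points
one_string <- intToUtf8(cp)

# multiple = TRUE: a character vector with one element per code point
per_char <- intToUtf8(cp, multiple = TRUE)

length(one_string)  # 1
nchar(one_string)   # 3
length(per_char)    # 3
```

With multiple = TRUE the result lines up element by element with the input, which is convenient when each code point needs to be inspected separately.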
Details

Both functions work in any locale, including on platforms that do not otherwise support multi-byte character sets.

Value

Unicode defines a name and a number for each of the glyphs it encompasses. The numbers are called code points and run from 0 to 0x10FFFF.

utf8ToInt converts a length-one character string encoded in UTF-8 to an integer vector of Unicode code points. As of R 3.2.1 it checks the validity of the input and returns NA if it is invalid.

intToUtf8 converts a numeric vector of Unicode code points either to a single character string or to a character vector of single characters. (For a single character string, 0 is silently omitted; otherwise 0 is mapped to "". Non-integral numeric values are truncated to integers.) The Encoding is declared as "UTF-8".

NA inputs are mapped to NA output.
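
The behaviour described above can be checked with a short round trip. This is a minimal sketch that assumes R 3.2.1 or later; the variable names (s, cp, back) are illustrative only.

```r
# A string containing U+00DF (sharp s), written with a Unicode escape
s <- "bi\u00dfchen"

# One integer code point per character
cp <- utf8ToInt(s)
cp[3]                      # 223, i.e. 0xDF

# Converting back gives the original string, declared as UTF-8
back <- intToUtf8(cp)
identical(back, s)         # TRUE
Encoding(back)             # "UTF-8"

# 0 is dropped for a single string but becomes "" with multiple = TRUE
intToUtf8(c(72, 0, 105))                   # "Hi"
intToUtf8(c(72, 0, 105), multiple = TRUE)  # "H" "" "i"

# Non-integral values are truncated, so 65.9 is treated as 65
intToUtf8(65.9)            # "A"

# NA input gives NA output
utf8ToInt(NA_character_)   # NA
```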

Aliases
  • utf8ToInt
  • intToUtf8
  • Unicode
  • code point
Examples
library(base)
# A valid string containing U+00DF (sharp s)
utf8ToInt("bi\u00dfchen")
# Not valid UTF-8, so NA is returned (R >= 3.2.1)
utf8ToInt("\xfa\xb4\xbf\xbf\x9f")
Documentation reproduced from package base, version 3.3.0, License: Part of R 3.3.0
