chr_unserialise_unicode

Translate unicode points to UTF-8

For historical reasons, R translates strings to the native encoding when they are converted to symbols. This string-to-symbol conversion is not a rare occurrence and happens for instance to the names of a list of arguments converted to a call by do.call().
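The conversion described above can be observed with base R alone. The following is a minimal sketch, not taken from the rlang documentation: the variable names are illustrative, and what gets printed depends on the native encoding of the session, so no expected output is shown.

```r
# A string containing a non-ASCII character (written as an escape so the
# source file itself stays ASCII).
name <- "\u5e78"

# Explicit string-to-symbol conversion, then back to a string.
sym <- as.symbol(name)
round_trip <- as.character(sym)

# do.call() turns the names of its argument list into symbols (call tags),
# so the same conversion happens implicitly here.
args <- stats::setNames(list(1), name)
result <- do.call(list, args)

# In a UTF-8 session these typically print the original character; in a
# native encoding that cannot represent it, a serialised form may appear.
print(round_trip)
print(names(result))
```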

If the string contains Unicode characters that cannot be represented in the native encoding, R serialises them as an ASCII sequence representing the Unicode code point. This is why Windows users with western locales often see strings looking like <U+xxxx>. To alleviate some of the pain, rlang parses strings and looks for serialised Unicode code points to translate them back to the proper UTF-8 representation. This transformation occurs automatically in functions like env_names() and can be manually triggered with as_utf8_character() and chr_unserialise_unicode().
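To make the translation step concrete, here is a rough base R sketch of how serialised code points of the form <U+xxxx> could be turned back into UTF-8. It is not rlang's implementation: the function name unserialise_unicode_sketch and the regular expression (4 to 6 hexadecimal digits) are assumptions made for illustration only.

```r
# Hypothetical helper, for illustration only (not part of rlang).
unserialise_unicode_sketch <- function(chr) {
  stopifnot(is.character(chr))

  # Assumed serialised form: "<U+" followed by 4 to 6 hex digits and ">".
  pattern <- "<U\\+[0-9A-Fa-f]{4,6}>"
  matches <- gregexpr(pattern, chr)

  # Replace every matched token with the character it encodes.
  regmatches(chr, matches) <- lapply(regmatches(chr, matches), function(tokens) {
    # Drop the leading "<U+" and the trailing ">" to keep the hex digits.
    hex <- substr(tokens, 4L, nchar(tokens) - 1L)
    vapply(strtoi(hex, base = 16L), intToUtf8, character(1))
  })

  enc2utf8(chr)
}

# Strings without serialised code points pass through unchanged.
unserialise_unicode_sketch(c("<U+5E78>", "plain ascii"))
```

Unlike this sketch, the real function may handle edge cases such as NA values or malformed sequences differently.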

Keywords
internal
Usage
chr_unserialise_unicode(chr)
Arguments
chr

A character vector.

Life cycle

This function is experimental.

Aliases
  • chr_unserialise_unicode
Examples
library(rlang)
# NOT RUN {
ascii <- "<U+5E78>"
chr_unserialise_unicode(ascii)

identical(chr_unserialise_unicode(ascii), "\u5e78")
# }
Documentation reproduced from package rlang, version 0.2.0, License: GPL-3
