stri_enc_toutf8(str, is_unknown_8bit = FALSE)
If is_unknown_8bit is set to TRUE and a string is marked (internally) as being neither ASCII nor UTF-8-encoded, then all byte codes > 127 are replaced with the Unicode REPLACEMENT CHARACTER (\Ufffd). Bytes-marked strings are treated as 8-bit strings.

Otherwise, R encoding marks are assumed to be trustworthy (ASCII, UTF-8, Latin1, or Native). Bytes-marked strings cause the function to fail in this case.
Note that the REPLACEMENT CHARACTER may be interpreted as the Unicode NA value for single characters.
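
The two modes described above can be compared on a single input. The following is a minimal sketch, assuming the stringi package is installed; the variable names and the sample byte sequence ("caf" followed by the Latin-1 byte 0xE9) are illustrative only and are not taken from the package documentation.

```r
library(stringi)

# Build "caf" + byte 0xE9 from raw bytes, then mark it as Latin1.
# (Illustrative input; any 8-bit string with a byte > 127 would do.)
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# Default mode: the Latin1 mark is trusted and the string is converted.
trusted <- stri_enc_toutf8(x)

# is_unknown_8bit = TRUE: the string is neither ASCII nor UTF-8,
# so every byte code > 127 is replaced with U+FFFD.
replaced <- stri_enc_toutf8(x, is_unknown_8bit = TRUE)

# A bytes-marked copy: per the text above, the default mode is
# expected to fail, so the call is wrapped to capture the condition.
b <- x
Encoding(b) <- "bytes"
bytes_default <- tryCatch(stri_enc_toutf8(b), error = function(e) e)

print(trusted)
print(replaced)
print(bytes_default)
```

Wrapping the bytes-marked call in tryCatch keeps the script running whichever way that call behaves, so the other two results can still be inspected.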
stri_conv, stri_encode; stri_enc_fromutf32; stri_enc_toascii; stri_enc_toutf32; stringi-encoding