stringi (version 0.1-25)

stri_enc_toutf8: Convert To UTF-8

Description

Converts character strings with (possibly) internally marked encodings to UTF-8 strings.

Usage

stri_enc_toutf8(str, is_unknown_8bit = FALSE)

Arguments

str
    character vector to be converted
is_unknown_8bit
    single logical value; see Details

Value

  • Returns a character vector.

Details

If is_unknown_8bit is set to TRUE and a string is marked (internally) as being neither ASCII- nor UTF-8-encoded, then all byte codes greater than 127 are replaced with the Unicode REPLACEMENT CHARACTER (\Ufffd). Bytes-marked strings are treated as 8-bit strings.

Otherwise, R's encoding marks are assumed to be trustworthy (ASCII, UTF-8, Latin1, or Native). Strings marked with the bytes encoding fail here.

Note that the REPLACEMENT CHARACTER may be interpreted as the Unicode NA value for single characters.
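
The two modes described above can be illustrated with a short sketch. The input string and the variable names (x, y, z) are made up for this example, and the behavior shown for is_unknown_8bit = TRUE simply follows the description in Details; results may differ in other stringi versions.

```r
library(stringi)

# Build the Latin-1 byte sequence for "cafe" with an acute accent on the "e"
# (0xE9 is LATIN SMALL LETTER E WITH ACUTE in Latin-1), then mark it as latin1.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# Default mode: the latin1 mark is trusted, so the string is
# re-encoded to UTF-8 (0xE9 becomes the two bytes 0xC3 0xA9).
y <- stri_enc_toutf8(x)
Encoding(y)
charToRaw(y)

# is_unknown_8bit = TRUE: the string is marked neither ASCII nor UTF-8,
# so, following Details, each byte above 127 is replaced with U+FFFD.
z <- stri_enc_toutf8(x, is_unknown_8bit = TRUE)
identical(z, "caf\ufffd")
```

The second call is mainly useful when the encoding marks cannot be trusted and losing the non-ASCII characters is preferable to producing mis-decoded text.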

See Also

Other encoding_conversion: stri_conv, stri_encode; stri_enc_fromutf32; stri_enc_toascii; stri_enc_toutf32; stringi-encoding