stri_encode(str, from = NULL, to = NULL, to_raw = FALSE)
stri_conv(str, from = NULL, to = NULL, to_raw = FALSE)
str: a character vector, a raw vector, or a list of raw vectors to be converted
from: NULL or "" for the default encoding or for the use of internal encoding marks (see Details); otherwise, a single string with an encoding name, see stri_enc_list
to: NULL or "" for the default encoding (see stri_enc_get), or a single string with an encoding name
If to_raw is FALSE, then a character vector with encoded strings (and sensible encoding marks) is returned. Otherwise, a list of raw vectors is returned.
stri_conv is an alias for stri_encode.
Please refer to stri_enc_list for the list of supported encodings and to stringi-encoding for a general discussion.
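As a minimal sketch of the two call forms, the example below converts a raw vector holding the ISO-8859-1 bytes of "café" to UTF-8, once through each name, and then asks for raw output. The variable names are illustrative only, and the encoding names "latin1" and "UTF-8" are assumed to be available in the ICU build in use (check stri_enc_list).

```r
library(stringi)

# Bytes of "café" in ISO-8859-1 (latin1): 63 61 66 e9
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))

# The same conversion through both names
via_encode <- stri_encode(latin1_bytes, from = "latin1", to = "UTF-8")
via_conv   <- stri_conv(latin1_bytes, from = "latin1", to = "UTF-8")

# to_raw = TRUE yields a list of raw vectors instead of a character vector
as_raw <- stri_encode(latin1_bytes, from = "latin1", to = "UTF-8",
                      to_raw = TRUE)
```

Here as_raw[[1]] holds the UTF-8 bytes, in which "é" occupies two bytes (c3 a9).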
If from is either missing, "", or NULL and str is an atomic vector, then the input strings' encoding marks are used (just like in almost all stringi functions; see stri_enc_get). Otherwise, the internal encoding marks are overridden by the given encoding. On the other hand, if str is a list of raw vectors, we assume that the input encoding is the current default encoding.
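The difference between relying on encoding marks and overriding them can be sketched as follows. This is an illustration under assumptions, not a description of internals: the first string is a literal that R marks as UTF-8, while the second is built with rawToChar so that it carries latin1 bytes without any mark.

```r
library(stringi)

# A literal with a \u escape is marked as UTF-8 by R
utf8_marked <- "caf\u00e9"

# from omitted: the existing encoding mark is used
out_marked <- stri_encode(utf8_marked, to = "UTF-8")

# latin1 bytes with no encoding mark attached
unmarked <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# from given: the bytes are interpreted as latin1 regardless of any mark
out_override <- stri_encode(unmarked, from = "latin1", to = "UTF-8")
```

Both results should compare equal to "caf\u00e9".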
For to_raw=FALSE, the output strings always have marked encodings according to the target converter used (as specified by to) and the current default Encoding (ASCII, latin1, UTF-8, native, or bytes in all other cases).
Note that problems may occur when to is set to, e.g., UTF-16 or UTF-32, as the output strings may have embedded NULs. In such cases, use to_raw=TRUE and consider specifying a byte order marker (BOM) for portability reasons (e.g., set UTF-16 or UTF-32, which automatically add BOMs).
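The embedded-NUL issue can be seen in a short sketch. It assumes the ICU converter names "UTF-16" and "UTF-16LE" are available; the byte order chosen for plain "UTF-16" may depend on the platform, so the code only inspects the length of that result.

```r
library(stringi)

# Plain "UTF-16": a 2-byte BOM followed by 2 bytes per character
utf16 <- stri_encode("ab", from = "UTF-8", to = "UTF-16",
                     to_raw = TRUE)[[1]]

# Explicit little-endian variant: no BOM is written
utf16le <- stri_encode("ab", from = "UTF-8", to = "UTF-16LE",
                       to_raw = TRUE)[[1]]

# ASCII characters produce zero bytes, which a character string cannot hold
has_nul <- any(utf16le == as.raw(0))

# Round trip: the BOM tells the converter which byte order was used
back <- stri_encode(utf16, from = "UTF-16", to = "UTF-8")
```

Keeping the result as raw vectors avoids the truncation or errors that embedded NULs would otherwise cause in a character vector.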
Note that stri_encode(as.raw(data), "8bitencodingname") is a wise substitute for rawToChar.
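A small sketch of that substitution, assuming the bytes are known to be ISO-8859-2 and that this encoding name is supported by the ICU build in use. The target is set to UTF-8 explicitly so that the result does not depend on the session's default encoding.

```r
library(stringi)

# "zółty" in ISO-8859-2: z = 7a, ó = f3, ł = b3, t = 74, y = 79
bytes <- as.raw(c(0x7a, 0xf3, 0xb3, 0x74, 0x79))

# rawToChar() copies the bytes as they are and attaches no encoding mark
unmarked <- rawToChar(bytes)

# stri_encode() decodes the bytes and returns a properly encoded string
decoded <- stri_encode(bytes, from = "ISO-8859-2", to = "UTF-8")
```

The decoded string has 5 characters, whereas the unmarked one is only meaningful in a session whose native encoding happens to be ISO-8859-2.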
Currently, if an incorrect code point is found on input, it is replaced by the default (for that target encoding) substitute character and a warning is generated.
Converters -- ICU User Guide
stri_enc_fromutf32; stri_enc_toascii; stri_enc_toutf32; stri_enc_toutf8; stringi-encoding