[THIS IS AN EXPERIMENTAL FUNCTION]
stri_enc_detect2(str, locale = NULL)
str -- character vector, a raw vector, or a list of raw vectors
locale -- NULL or "" for default locale, NA for just checking the UTF-* family, or a single string with locale identifier.

Like stri_enc_detect, this function returns a list of length equal to the length of str.
Each list element is a list with the following three named components:
Encoding -- string; guessed encodings; NA on failure (iff encodings is empty),
Language -- always NA,
Confidence -- numeric in [0,1]; the higher the value, the more confidence there is in the match; NA on failure.

Vectorized over str. First, the function checks whether the text is valid
UTF-32BE, UTF-32LE, UTF-16BE, UTF-16LE, UTF-8
(as in stri_enc_detect, this is loosely based on ICU's i18n/csrucode.cpp, although the implementation here is our own) or ASCII.
If locale is not NA and the above fails, the text is checked for the number of occurrences of language-specific code points (data provided by the ICU library).
The guess is of course imprecise, as it is obtained using statistics [This is DRAFT API - still does not work as expected]. Because of this, detection works best if you supply at least a few hundred bytes of character data in a single language.
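The first step described above (validity checks for the UTF-* family and ASCII) can be approximated by hand with the separate predicates listed among the related functions. The following is a minimal sketch, assuming the stringi package is installed. The sample byte sequences and the variable names (samples, is_ascii, is_utf8, family) are made up for illustration, and the code does not reproduce how stri_enc_detect2 works internally.

```r
library(stringi)

# Three texts supplied as raw vectors, one list element per text
# (hypothetical sample data, chosen only for this illustration)
samples <- list(
  charToRaw("plain text"),              # 7-bit ASCII only
  as.raw(c(0x7a, 0x61, 0xc5, 0xbc)),    # "za" followed by U+017C encoded in UTF-8
  as.raw(c(0xff, 0xfe, 0x41))           # byte 0xff never occurs in valid UTF-8
)

# Each predicate returns one logical value per list element
is_ascii <- stri_enc_isascii(samples)
is_utf8  <- stri_enc_isutf8(samples)

# Assumption: report ASCII before UTF-8, because every ASCII text is also valid UTF-8
family <- ifelse(is_ascii, "ASCII", ifelse(is_utf8, "UTF-8", NA_character_))
print(family)
```

Texts that end up as NA in this sketch are the ones for which a locale-based guess would be needed.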
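If you have no initial guess on language and encoding, try stri_enc_detect (which uses ICU facilities). However, it turns out that, empirically, stri_enc_detect2 works better than the ICU-based one if UTF-* text is provided.
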
Other encoding_detection: stri_enc_detect; stri_enc_isascii; stri_enc_isutf16be, stri_enc_isutf16le, stri_enc_isutf32be, stri_enc_isutf32le; stri_enc_isutf8; stringi-encoding

Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%;
stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare;
stri_count_boundaries, stri_count_words; stri_duplicated, stri_duplicated_any; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries;
stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stri_wrap; stringi-locale; stringi-search-boundaries; stringi-search-coll