icuSetCollate(...)
icuGetCollate(type = c("actual", "valid"))
For icuGetCollate, a character string describing the ICU locale
in use (which may be reported as "ICU not in use"). The
actual locale may be simpler than the requested locale: for
example "da" rather than "da_DK": English locales are
likely to report "root".
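As a minimal sketch of querying the collation locale, the snippet below checks capabilities("ICU") first so that it also runs on builds without ICU. The variable names are illustrative only, and the comments on what "actual" and "valid" mean are a reading of the ICU terminology, not something this page states.

```r
# Query the ICU collation locale, but only if this build of R has ICU support.
if (capabilities("ICU")) {
  actual <- icuGetCollate("actual")  # locale whose collation rules are applied
  valid  <- icuGetCollate("valid")   # assumed: the locale ICU accepted as valid
  cat("actual:", actual, "\n")
  cat("valid: ", valid, "\n")
} else {
  message("This build of R does not use ICU for collation.")
}
```

Either value may be "ICU not in use" if ICU has not yet been engaged in the session.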
icuSetCollate can be used to tune the way collation is done.
On builds of R without ICU support, calling this function does nothing except give a warning.
Possible arguments are
locale: A character string such as "da_DK"
giving the language and country whose collation rules are to be
used. If present, this should be the first argument.
case_first: "upper", "lower" or
"default", asking for upper- or lower-case characters to be
sorted first. The default is usually lower-case first, but not in
all languages (not under the default settings for Danish, for example).
alternate_handling: "non_ignorable" (primary strength) and
"shifted" (quaternary strength).
strength: "primary", "secondary", "tertiary"
(default), "quaternary" and "identical".
french_collation: "on", "off"
and "default".
normalization: "on" and "off" (default). This affects the
collation of composite characters.
case_level: "on" and "off" (default).
hiragana_quaternary: "on" (sort
Hiragana first at quaternary level) and "off".
Only the first three are likely to be of interest except to those with a detailed understanding of collation and specialized requirements.
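The following sketch shows how the locale and case_first arguments might be combined. The word list and variable names are invented for illustration, and using locale = "default" to restore the session's setting at the end is an assumption about that special value.

```r
# Illustrative data; these names are not part of the documented interface.
words <- c("apple", "Banana", "banana", "Apple")

if (capabilities("ICU")) {
  # 'locale' goes first; specifying it resets any earlier customizations.
  icuSetCollate(locale = "da_DK", case_first = "lower")
  lower_first <- sort(words)

  # A later call can adjust one attribute without naming the locale again.
  icuSetCollate(case_first = "upper")
  upper_first <- sort(words)

  print(lower_first)
  print(upper_first)

  # Assumed to return to the OS-derived locale so later code is unaffected.
  icuSetCollate(locale = "default")
} else {
  message("ICU is not available in this build of R; nothing to demonstrate.")
}
```

The exact orderings depend on the ICU version and its locale data, so none are shown here.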
Some special values are accepted for locale:
"none": ICU is not used for collation.
"ASCII": ICU is not used for collation; the C function strcmp is used instead, which should sort byte-by-byte in
(unsigned) numerical order. (As from R 3.1.3.)
"default": obtains the locale from the OS, as is done at the start of the session.
"", "root": the ICU 'root' collation.
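A small sketch of the "ASCII" special value, assuming an ICU-enabled build of R 3.1.3 or later. The sample vector is made up, and the final call with locale = "default" is assumed to restore the OS-derived setting.

```r
x <- c("b", "A", "a", "B")

if (capabilities("ICU")) {
  icuSetCollate(locale = "ASCII")    # compare with strcmp, byte by byte
  ascii_sorted <- sort(x)
  print(ascii_sorted)                # upper-case bytes sort before lower-case ones

  icuSetCollate(locale = "default")  # assumed to restore the OS-derived locale
} else {
  message("ICU is not available in this build of R.")
}
```

Because the comparison is byte-wise, the result does not depend on the language of the session, which can be useful when a reproducible ordering matters more than a linguistically natural one.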
For the specifications of real ICU locales, see
http://userguide.icu-project.org/locale. Note that ICU does not
report that a locale is not supported, but falls back to its idea of
best fit (which could be rather different and is reported by
icuGetCollate("actual"), often "root"). Most English
locales fall back to "root" as although e.g. "en_GB" is
a valid locale (at least on some platforms), it contains no special
rules for collation. Note that "C" is not a supported ICU locale.
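Since ICU falls back silently, it can help to compare the requested locale with what is reported afterwards. The helper below, report_collation_locale, is hypothetical (it is not part of R) and is only a sketch of that check.

```r
# Hypothetical helper: request a locale and report what ICU settled on.
report_collation_locale <- function(requested) {
  icuSetCollate(locale = requested)
  c(requested = requested,
    valid     = icuGetCollate("valid"),
    actual    = icuGetCollate("actual"))
}

if (capabilities("ICU")) {
  print(report_collation_locale("en_GB"))  # 'actual' is often "root" for English
  print(report_collation_locale("da_DK"))
  icuSetCollate(locale = "default")        # assumed to restore the OS-derived locale
} else {
  message("ICU is not available in this build of R.")
}
```

A mismatch between the requested and actual entries is not an error; it only shows which rules ICU ended up using.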
Some examples are case_level = "on", strength = "primary" to ignore
accent differences and alternate_handling = "shifted" to ignore
space and punctuation characters.
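The two settings just mentioned can be tried out as follows. This is a sketch under the assumption that the relational operators use the same collation as sort; the test strings are invented, and no particular output is claimed because results depend on the ICU version in use.

```r
accented <- c("role", "r\u00f4le")   # "\u00f4" is o with a circumflex
hyphened <- c("co-op", "coop")

if (capabilities("ICU")) {
  # Primary strength with case_level on: accent differences are ignored.
  icuSetCollate(locale = "root", strength = "primary", case_level = "on")
  accent_cmp <- c(accented[1] < accented[2], accented[2] < accented[1])
  print(accent_cmp)   # both FALSE would mean the two strings collate as equal

  # Shifted alternate handling: space and punctuation characters are ignored.
  icuSetCollate(locale = "root", alternate_handling = "shifted")
  punct_cmp <- c(hyphened[1] < hyphened[2], hyphened[2] < hyphened[1])
  print(punct_cmp)

  icuSetCollate(locale = "default")  # assumed to restore the OS-derived locale
} else {
  message("ICU is not available in this build of R.")
}
```

Passing locale in each call ensures that customizations from the first call do not carry over into the second.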
Initially ICU will not be used for collation if the OS is set to use
the C locale for collation. Once this function is called with
a value for locale, ICU will be used until it is called again
with locale = "none".
All customizations are reset to the default for the locale if
locale is specified; the collation engine is also reset if the
OS collation locale category is changed by Sys.setlocale.
See also sort; capabilities for whether ICU is available;
extSoftVersion for its version.
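A short sketch of both checks, assuming a version of R recent enough for extSoftVersion() to include an "ICU" entry. The variable names are illustrative.

```r
has_icu     <- capabilities("ICU")        # TRUE if R was built with ICU
icu_version <- extSoftVersion()[["ICU"]]  # empty string if ICU is not available

cat("ICU available:", has_icu, "\n")
cat("ICU version:  ", if (nzchar(icu_version)) icu_version else "(none)", "\n")
```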
The ICU user guide chapter on collation (http://userguide.icu-project.org/collation).