Optionally, R can be built to collate character strings by ICU
(http://site.icu-project.org). For such systems,
icuSetCollate can be used to tune the way collation is done.
On other builds, calling this function does nothing except give a warning.
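A minimal sketch of guarding the call, assuming you only want ICU collation when the build supports it: capabilities("ICU") reports whether this build of R can use ICU, so the warning on other builds can be avoided. The Danish locale is only an illustration.

```r
# TRUE only if this build of R can collate strings via ICU
have_icu <- capabilities("ICU")

if (have_icu) {
  icuSetCollate(locale = "da_DK")   # collate using Danish rules
  in_use <- icuGetCollate()         # locale ICU is actually using
} else {
  in_use <- NA_character_           # icuSetCollate() would only warn here
}
in_use
```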
Possible arguments are:
locale: A character string such as "da_DK" giving the language and
country whose collation rules are to be used. If present, this should
be the first argument.
case_first: "upper", "lower" or "default", asking for upper- or
lower-case characters to be sorted first. The default is usually
lower-case first, but not in all languages (not under the default
settings for Danish, for example).
alternate_handling: Controls the handling of ‘variable’ characters
(mainly punctuation and symbols). Possible values are
"non_ignorable" (primary strength) and "shifted" (quaternary strength).
strength: Which components should be used? Possible values are
"primary", "secondary", "tertiary" (default), "quaternary" and "identical".
french_collation: In a French locale the way accents affect collation
is from right to left, whereas in most other locales it is from left
to right. Possible values are "on", "off" and "default".
normalization: Should strings be normalized? Possible values are "on"
and "off" (default). This affects the collation of composite characters.
case_level: An additional level between secondary and tertiary, used
to distinguish large and small Japanese Kana characters. Possible
values are "on" and "off" (default).
hiragana_quaternary: Possible values are "on" (sort Hiragana first at
quaternary level) and "off".
Only the first three are likely to be of interest except to those with a
detailed understanding of collation and specialized requirements.
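As a sketch of how the arguments combine (locale first, then customizations), the following sorts the same vector with upper-case and then lower-case letters first. It assumes an ICU-enabled build and uses the "root" locale so the result does not depend on any locale-specific tailoring.

```r
x <- c("b", "A", "a", "B")
upper_first <- lower_first <- NULL

if (capabilities("ICU")) {
  # locale comes first; the remaining arguments customize its rules
  icuSetCollate(locale = "root", case_first = "upper")
  upper_first <- sort(x)   # "A" "a" "B" "b"
  icuSetCollate(locale = "root", case_first = "lower")
  lower_first <- sort(x)   # "a" "A" "b" "B"
}
```

Letters are still ordered alphabetically first; case_first only decides the order of "a" and "A" relative to each other.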
Some special values are accepted for locale:
"none": ICU is not used for collation; the OS's collation services
are used instead.
"ASCII": ICU is not used for collation; the C function strcmp is used
instead, which should sort byte-by-byte in (unsigned) numerical order.
"default": obtains the locale from the OS, as is done at the start of
the session. If the environment variable R_ICU_LOCALE is set to a
non-empty value, its value is used rather than consulting the OS.
"" and "root": the ‘root’ collation, as specified in Unicode Technical
Standard #35 (LDML).
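The difference between the "ASCII" special value and an ICU locale shows up on mixed-case input. A sketch, assuming an ICU-enabled build: with "ASCII", strcmp places every upper-case letter before every lower-case one because of their byte values, whereas ICU groups the two cases of each letter together.

```r
x <- c("b", "A", "a", "B")
bytewise <- icu_root <- NULL

if (capabilities("ICU")) {
  icuSetCollate(locale = "ASCII")   # C strcmp(): compare raw bytes
  bytewise <- sort(x)               # "A" "B" "a" "b"
  icuSetCollate(locale = "root")    # ICU root collation
  icu_root <- sort(x)               # "a" "A" "b" "B"
  icuSetCollate(locale = "none")    # hand collation back to the OS
}
```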
For the specifications of ‘real’ ICU locales, see
http://userguide.icu-project.org/locale. Note that ICU does not
report that a locale is not supported, but falls back to its idea of
‘best fit’ (which could be rather different and is reported by
icuGetCollate("actual"), often "root"). Most English locales fall back
to "root" because, although e.g. "en_GB" is a valid locale (at least on
some platforms), it contains no special rules for collation. Note that
"C" is not a supported ICU locale.
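Because ICU falls back silently, it can be worth checking which rules are really in effect after choosing a locale. A sketch, assuming an ICU-enabled build; the values returned depend on the ICU version and its data, so none are shown here.

```r
actual <- valid <- NA_character_

if (capabilities("ICU")) {
  icuSetCollate(locale = "en_GB")
  actual <- icuGetCollate("actual")  # locale whose rules are used (often "root")
  valid  <- icuGetCollate("valid")   # most specific locale ICU accepts as valid
}
c(actual = actual, valid = valid)
```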
Some examples are case_level = "on", strength = "primary" to ignore
accent differences (while still distinguishing case), and
alternate_handling = "shifted" to ignore space and punctuation characters.
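The second example can be checked with comparison operators, which use the same collation as sort(). A sketch, assuming an ICU-enabled build and the "root" locale at its default (tertiary) strength:

```r
res <- NULL

if (capabilities("ICU")) {
  # Default root handling: punctuation and spaces are not ignorable
  icuSetCollate(locale = "root")
  plain <- "a-b" < "ab"      # TRUE: the hyphen sorts before any letter
  # Shifted: spaces and punctuation are ignored at this strength
  icuSetCollate(locale = "root", alternate_handling = "shifted")
  shifted <- "a-b" < "ab"    # FALSE: the hyphen is skipped, so the strings tie
  spaced  <- "a b" < "ab"    # FALSE: likewise for the space
  res <- c(plain = plain, shifted = shifted, spaced = spaced)
}
res
```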
Initially ICU will not be used for collation if the OS is set to use
the C locale for collation. Once this function is called with a value
for locale, ICU will be used until it is called again with
locale = "none".
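Since ICU stays in use until locale = "none" is given, a temporary switch can be written with on.exit(). A sketch; sort_with_icu is a hypothetical helper, not part of R, and it always leaves ICU switched off afterwards rather than restoring whatever was in use before the call.

```r
# Hypothetical helper: sort x by ICU rules for `locale`, then stop using ICU
sort_with_icu <- function(x, locale) {
  icuSetCollate(locale = locale)
  on.exit(icuSetCollate(locale = "none"))  # back to the OS collation services
  sort(x)
}

res <- NULL
if (capabilities("ICU")) {
  res <- sort_with_icu(c("b", "A", "a", "B"), "root")   # "a" "A" "b" "B"
}
res
```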
All customizations are reset to the default for the locale if locale
is specified; the collation engine is reset if the OS collation locale
category is changed by Sys.setlocale.
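A sketch of the reset behaviour, assuming an ICU-enabled build: a customization made together with one locale setting is discarded the next time locale is specified, even when it is the same locale.

```r
before <- after <- NULL

if (capabilities("ICU")) {
  icuSetCollate(locale = "root", case_first = "upper")
  before <- "A" < "a"              # TRUE: upper case sorts first
  icuSetCollate(locale = "root")   # customizations reset to the root defaults
  after <- "A" < "a"               # FALSE: lower case sorts first again
}
```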