stringx (version 0.2.9)

strcoll: Compare Strings

Description

These functions provide means to compare strings in any locale using the Unicode collation algorithm.

Usage

strcoll(
  e1,
  e2,
  locale = NULL,
  strength = 3L,
  alternate_shifted = FALSE,
  french = FALSE,
  uppercase_first = NA,
  case_level = FALSE,
  normalisation = FALSE,
  numeric = FALSE
)

e1 %x<% e2

e1 %x<=% e2

e1 %x==% e2

e1 %x!=% e2

e1 %x>% e2

e1 %x>=% e2

Value

strcoll returns an integer vector of comparison results: -1 if a string in e1 is smaller than the corresponding string in e2, 0 if the two are canonically equivalent, and 1 if the former is greater than the latter.

The binary operators call strcoll with default arguments and return logical vectors.
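
As a rough illustration of the return values described above, the following sketch compares a few English words pairwise. It assumes that stringx is installed; the sample words and the explicit en_US locale are chosen for illustration only and do not come from the package documentation.

```r
library(stringx)

# Pairwise comparison; the locale is fixed so that the result
# does not depend on the session's default locale.
e1 <- c("apple", "banana", "cherry")
e2 <- c("banana", "banana", "banana")
res <- strcoll(e1, e2, locale="en_US")
print(res)  # -1 0 1

# The operators call strcoll with default arguments (including the
# default locale) and return logical vectors instead of integers.
lt <- e1 %x<% e2
eq <- e1 %x==% e2
print(lt)
print(eq)
```

Calling strcoll directly is the way to pass a locale or any other collator option, since the operators take no extra arguments.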

Arguments

e1, e2

character vectors whose corresponding elements are to be compared

locale

NULL or "" for the default locale (see stri_locale_get), or a single string with a locale identifier (see stri_locale_list)

strength

see stri_opts_collator

alternate_shifted

see stri_opts_collator

french

see stri_opts_collator

uppercase_first

see stri_opts_collator

case_level

see stri_opts_collator

normalisation

see stri_opts_collator

numeric

see stri_opts_collator

Differences from Base R

Replacements for base Comparison operators implemented with stri_cmp.

  • collation in different locales is difficult and non-portable across platforms [fixed here -- using services provided by ICU]

  • overloading `<.character` has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. We could have replaced the generic `<` with one that calls UseMethod, but that feels like too intrusive a solution [fixed by introducing the `%x<%` operator]

Details

These functions are fully vectorised with respect to both arguments.

For locale-insensitive behaviour like that of strcmp from the standard C library, call strcoll(e1, e2, locale="C", strength=4L, normalisation=FALSE). However, some normalisation will still be performed.
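
The two points above can be sketched as follows. This is a minimal example that assumes stringx is installed; the input strings are made up for illustration. The second call only reuses the settings quoted above, so no particular output is claimed for it.

```r
library(stringx)

# Vectorisation: the single string "b" is recycled and compared
# with each element of x in turn.
x <- c("a", "b", "c")
rec <- strcoll(x, "b", locale="en_US")
print(rec)  # -1 0 1

# The strcmp-like settings from the text, applied element-wise.
cmp <- strcoll(c("abc", "abd"), c("abd", "abc"),
               locale="C", strength=4L, normalisation=FALSE)
print(cmp)
```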

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): xtfrm

Examples

# lexicographic vs. numeric sort
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"))
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"), numeric=TRUE)
# in Slovak, the digraph "ch" sorts after "h"
strcoll("hladn\u00FD", "chladn\u00FD", locale="sk_SK")
