stringi (version 1.8.3)

stri_width: Determine the Width of Code Points

Description

Approximates the number of text columns the `cat()` function might use to print a string using a mono-spaced font.

Usage

stri_width(str)

Value

Returns an integer vector of the same length as str.
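
As a rough illustration of the return value, the sketch below compares stri_width() with stri_length() on a small vector and then pads each element to a common number of columns. The sample strings and the manual padding with strrep() are assumptions chosen for this illustration; they are not part of the original manual.

```r
library(stringi)

# Hypothetical sample: ASCII text, two fullwidth letters,
# and 'a' followed by a combining ogonek
x <- c("abc", "\uff21\uff22", "a\u0328")

w <- stri_width(x)   # one integer per element of x (estimated columns)
n <- stri_length(x)  # number of code points, for comparison

# Pad on the right with spaces so that every element
# is estimated to occupy the same number of columns
padded <- paste0(x, strrep(" ", max(w) - w))
cat(paste0("[", padded, "]"), sep = "\n")
```

Whether the brackets actually line up on screen depends on the terminal and font, since the widths are only approximations.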

Arguments

str

a character vector or an object coercible to one

Author

Marek Gagolewski and other contributors

Details

The Unicode standard does not formalize the notion of a character width. The rules below are roughly based on http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, https://github.com/nodejs/node/blob/master/src/node_i18n.cc, and UAX #11. The following code points are of width 0:

  • code points with general category Me, Mn, or Cf (see stringi-search-charclass),

  • C0 and C1 control codes (general category Cc) - for compatibility with the nchar function,

  • Hangul Jamo medial vowels and final consonants (code points with enumerable property UCHAR_HANGUL_SYLLABLE_TYPE equal to U_HST_VOWEL_JAMO or U_HST_TRAILING_JAMO; note that applying the NFC normalization with stri_trans_nfc is encouraged),

  • ZERO WIDTH SPACE (U+200B).

Characters with the UCHAR_EAST_ASIAN_WIDTH enumerable property equal to U_EA_FULLWIDTH or U_EA_WIDE are of width 2.

Most emojis and characters with general category So (other symbols) are of width 2.

SOFT HYPHEN (U+00AD) has width 1 (for compatibility with nchar), as do all other characters not covered above.
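
The rules above can be spot-checked with a few single-code-point strings. This is a minimal sketch: the labels and the choice of sample code points are assumptions made for the illustration, and the results may differ across stringi and ICU versions.

```r
library(stringi)

# Each name is only a label for this illustration
samples <- c(
  combining_mark = "\u0301",  # COMBINING ACUTE ACCENT, general category Mn
  control_code   = "\t",      # C0 control code, general category Cc
  zero_width_sp  = "\u200b",  # ZERO WIDTH SPACE
  fullwidth_a    = "\uff21",  # FULLWIDTH LATIN CAPITAL LETTER A
  cjk_ideograph  = "\u4e2d",  # a CJK ideograph (East Asian Width: wide)
  soft_hyphen    = "\u00ad",  # SOFT HYPHEN
  ascii_letter   = "a"
)

widths <- stri_width(samples)

# Show each label next to its estimated width
print(data.frame(label = names(samples), width = widths))
```

Under the rules listed in this section, the first three samples should report width 0, the next two width 2, and the last two width 1.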

References

East Asian Width -- Unicode Standard Annex #11, https://www.unicode.org/reports/tr11/

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other length: %s$%(), stri_isempty(), stri_length(), stri_numbytes(), stri_pad_both(), stri_sprintf()

Examples

stri_width(LETTERS[1:5]) # ASCII letters
stri_width(stri_trans_nfkd('\u0105')) # 'a' with ogonek, decomposed into a base letter and a combining mark
stri_width(stri_trans_nfkd('\U0001F606')) # an emoji
stri_width( # Full-width equivalents of ASCII characters:
   stri_enc_fromutf32(as.list(c(0x3000, 0xFF01:0xFF5E)))
)
stri_width(stri_trans_nfkd('\ubc1f')) # includes Hangul Jamo medial vowels and final consonants
