stringi (version 0.1-25)

stri_enc_info: Query Given Character Encoding

Description

Gets basic information on a given character encoding.

Usage

stri_enc_info(enc = NULL)

Arguments

enc
NULL or "" for the default encoding, or a single string with an encoding name

Value

  • Returns a list with the following components:
    • Name.friendly -- friendly encoding name: MIME Name, or JAVA Name, or ICU Canonical Name (the first of these that is supported, see below);
    • Name.ICU -- encoding name as identified by ICU;
    • Name.* -- other standardized encoding names, e.g. Name.UTR22, Name.IBM, Name.WINDOWS, Name.JAVA, Name.IANA, Name.MIME (some may not be available for the selected encoding);
    • ASCII.subset -- is ASCII a subset of the given encoding?
    • Unicode.1to1 -- for 8-bit encodings only: is every character translated to exactly one Unicode code point, and is this translation reversible?
    • CharSize.8bit -- is this an 8-bit encoding, i.e. do we have CharSize.min == CharSize.max and CharSize.min == 1?
    • CharSize.min -- minimal number of bytes used to represent a code point;
    • CharSize.max -- maximal number of bytes used to represent a code point.
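
The components listed above can be inspected like those of any other R list. The following is a minimal sketch, assuming the stringi package is installed and that ICU recognizes the name "UTF-8"; the actual values, and which Name.* entries are present, depend on the encoding queried and on the ICU build.

```r
library(stringi)

# Query a specific encoding by name
info <- stri_enc_info("UTF-8")

# The result is a plain list, so components are accessed with `$`
info$Name.friendly
info$ASCII.subset
info$CharSize.8bit
c(min = info$CharSize.min, max = info$CharSize.max)

# NULL (the default) or "" queries the current default encoding
default_info <- stri_enc_info()
default_info$Name.ICU
```

Because some Name.* entries may be missing for a given encoding, checking names(info) before relying on a particular one is the safer choice.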

Details

If the encoding provided is unknown to ICU (see stri_enc_list), an error is generated.
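
Since an unknown encoding name raises an error, code that handles user-supplied names may want to trap it. The sketch below assumes stringi is installed; safe_enc_info is a hypothetical helper written for this illustration and is not part of the package, and "no-such-encoding-xyz" is assumed to be a name ICU does not recognize.

```r
library(stringi)

# Hypothetical helper (not part of stringi): returns NULL instead of
# stopping when ICU does not recognize the encoding name.
safe_enc_info <- function(enc) {
  tryCatch(stri_enc_info(enc), error = function(e) NULL)
}

# An unrecognized name yields NULL rather than an error
is.null(safe_enc_info("no-such-encoding-xyz"))

# A recognized name yields the usual list
is.list(safe_enc_info("ISO-8859-1"))
```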

If you set a default encoding that is not a superset of ASCII or is not an 8-bit encoding, a warning is generated; see stringi-encoding for discussion.

See Also

Other encoding_management: stri_enc_get, stri_enc_set; stri_enc_list; stringi-encoding