THE String Processing Package
stringi is THE R package for fast, correct, consistent, and convenient string/text manipulation. It gives predictable results on every platform, in each locale, and under any "native" character encoding.
Keywords: R, text processing, character strings, internationalization, localization, ICU, ICU4C, i18n, l10n, Unicode.
License: The BSD-3-clause license for the package code, the ICU license for the accompanying ICU4C distribution, and the UCD license for the Unicode Character Database. See the COPYRIGHTS and LICENSE files for more details.
Manual pages on general topics:
- stringi-encoding -- character encoding issues, including information on encoding management in stringi, as well as on encoding detection and conversion.
- stringi-locale -- locale issues, including locale management and specification in stringi, and the list of locale-sensitive operations. In particular, see stri_opts_collator for a description of the string collation algorithm, which is used for string comparison, ordering, sorting, case-folding, and searching.
- stringi-arguments -- information on how stringi treats its functions' arguments.
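As a rough illustration of the locale-sensitive operations mentioned above, the following sketch passes collator options built with stri_opts_collator to stri_sort and stri_cmp_equiv. The sample strings and the locale identifiers ("en_US", "pl_PL", "sk_SK") are choices made for this example only, and the results assume that the ICU build in use ships data for these locales.

```r
library(stringi)

# Numeric collation: runs of digits are compared by value,
# so "a2" is ordered before "a10".
files <- c("a10", "a2", "a1")
sorted_numeric <- stri_sort(
  files,
  opts_collator = stri_opts_collator(locale = "en_US", numeric = TRUE)
)

# Collation strength 2 ignores case differences (accents still matter).
same_ignoring_case <- stri_cmp_equiv(
  "hello", "HELLO",
  opts_collator = stri_opts_collator(locale = "en_US", strength = 2)
)

# The same input can be ordered differently depending on the locale:
# in Slovak, the digraph "ch" is a separate letter placed after "h".
words <- c("hladny", "chladny")
sorted_pl <- stri_sort(words, opts_collator = stri_opts_collator(locale = "pl_PL"))
sorted_sk <- stri_sort(words, opts_collator = stri_opts_collator(locale = "sk_SK"))

print(sorted_numeric)
print(same_ignoring_case)
print(sorted_pl)
print(sorted_sk)
```

Building the options explicitly with stri_opts_collator keeps the locale and strength visible at the call site instead of relying on the session's default locale.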
Refer to the following:
- stringi-search for string searching facilities; these include pattern searching, matching, string splitting, and so on. The following independent search engines are provided:
  - stringi-search-regex -- with ICU (Java-like) regular expressions,
  - stringi-search-fixed -- fast, locale-independent, bytewise pattern matching,
  - stringi-search-coll -- locale-aware pattern matching for natural language processing tasks,
  - stringi-search-charclass -- seeking elements of particular character classes, like "all whitespaces" or "all digits",
  - stringi-search-boundaries -- text boundary analysis.
- stri_datetime_format for date/time formatting and parsing. Also refer to the links therein for other date/time/time zone-related operations.
- stri_stats_latex for gathering some fancy statistics on a character vector's contents.
- stri_flatten for concatenation-based operations.
- stri_sub for extracting and replacing substrings, and stri_reverse for a joyful function to reverse all code points in a string.
- stri_length (among others) for determining the number of code points in a string. See also stri_count_boundaries for counting the number of text boundaries, and stri_width for approximating the width of a string.
- stri_trim (among others) for trimming characters from the beginning and/or end of a string, see also stringi-search-charclass, and stri_pad for padding strings so that they are of the same width. Additionally, stri_wrap wraps text into lines.
- stri_trans_tolower (among others) for case mapping, i.e., conversion to lower, UPPER, or Title Case, stri_trans_nfc (among others) for Unicode normalization, stri_trans_char for translating individual code points, and stri_trans_general for other very general yet powerful text transforms, including transliteration.
- stri_duplicated for collation-based, locale-aware operations, see also stringi-locale.
- stri_split_lines (among others) to split a string into text lines.
- stri_escape_unicode (among others) for escaping certain code points.
- stri_rand_lipsum for generating (pseudo)random strings.
- DRAFT API: stri_write_lines (among others) for reading and writing text files.
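To give a feel for how some of the functions listed above fit together, here is a minimal sketch that runs each search engine on a small input and then applies a few of the length, substring, padding, and transformation helpers. The input strings are made up for this example, and the transliterator identifier "latin-ascii" is one of many that ICU provides; the exact set available depends on the ICU version in use.

```r
library(stringi)

s <- "The quick brown fox"

# The independent search engines, applied to the same kind of task
n_regex  <- stri_count_regex(s, "[a-z]+")   # ICU regex: runs of lowercase ASCII letters
n_fixed  <- stri_count_fixed(s, "o")        # bytewise, locale-independent matching
has_coll <- stri_detect_coll(               # collator-based, here ignoring case
  s, "QUICK",
  opts_collator = stri_opts_collator(strength = 2)
)
digits <- stri_extract_all_charclass("a1b22", "\\p{Nd}")  # runs of decimal digits
words  <- stri_extract_all_words(s)         # text boundary analysis

# Code point counts depend on the Unicode normalization form
decomposed <- "a\u0328"                     # "a" followed by COMBINING OGONEK
len_before <- stri_length(decomposed)
len_after  <- stri_length(stri_trans_nfc(decomposed))

# Substrings, reversing, trimming, padding, joining, and transforms
first_word <- stri_sub("hello world", 1, 5)
reversed   <- stri_reverse("abc")
trimmed    <- stri_trim_both("  hi ")
padded     <- stri_pad_left("7", 3, "0")
joined     <- stri_flatten(c("a", "b", "c"), collapse = "-")
upper      <- stri_trans_toupper("abc")
ascii      <- stri_trans_general("gro\u00df", "latin-ascii")

print(list(n_regex, n_fixed, has_coll, digits, words))
print(c(len_before, len_after))
print(c(first_word, reversed, trimmed, padded, joined, upper, ascii))
```

Note that the extract_all functions return a list with one element per input string, which is why the results for a single string are accessed with [[1]].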
stringi Package homepage, http://www.gagolewski.com/software/stringi/
ICU -- International Components for Unicode, http://www.icu-project.org/
ICU4C API Documentation, http://www.icu-project.org/apiref/icu4c/
The Unicode Consortium, http://www.unicode.org/
UTF-8, a transformation format of ISO 10646 -- RFC 3629, http://tools.ietf.org/html/rfc3629