rebus.unicode (version 0.0-2.1)

ugc_cased_letter: Unicode General Categories

Description

Match a Unicode General Category.

Usage

ugc_cased_letter(lo, hi, char_class = TRUE)

ugc_close_punctuation(lo, hi, char_class = TRUE)

ugc_connector_punctuation(lo, hi, char_class = TRUE)

ugc_control(lo, hi, char_class = TRUE)

ugc_currency_symbol(lo, hi, char_class = TRUE)

ugc_dash_punctuation(lo, hi, char_class = TRUE)

ugc_decimal_number(lo, hi, char_class = TRUE)

ugc_enclosing_mark(lo, hi, char_class = TRUE)

ugc_final_punctuation(lo, hi, char_class = TRUE)

ugc_format_control(lo, hi, char_class = TRUE)

ugc_initial_punctuation(lo, hi, char_class = TRUE)

ugc_letter(lo, hi, char_class = TRUE)

ugc_letter_number(lo, hi, char_class = TRUE)

ugc_line_separator(lo, hi, char_class = TRUE)

ugc_lowercase_letter(lo, hi, char_class = TRUE)

ugc_mark(lo, hi, char_class = TRUE)

ugc_math_symbol(lo, hi, char_class = TRUE)

ugc_modifier_letter(lo, hi, char_class = TRUE)

ugc_modifier_symbol(lo, hi, char_class = TRUE)

ugc_nonspacing_mark(lo, hi, char_class = TRUE)

ugc_number(lo, hi, char_class = TRUE)

ugc_open_punctuation(lo, hi, char_class = TRUE)

ugc_other(lo, hi, char_class = TRUE)

ugc_other_letter(lo, hi, char_class = TRUE)

ugc_other_number(lo, hi, char_class = TRUE)

ugc_other_punctuation(lo, hi, char_class = TRUE)

ugc_other_symbol(lo, hi, char_class = TRUE)

ugc_paragraph_separator(lo, hi, char_class = TRUE)

ugc_private_use_control(lo, hi, char_class = TRUE)

ugc_punctuation(lo, hi, char_class = TRUE)

ugc_separator(lo, hi, char_class = TRUE)

ugc_space_separator(lo, hi, char_class = TRUE)

ugc_spacing_mark(lo, hi, char_class = TRUE)

ugc_surrogate_control(lo, hi, char_class = TRUE)

ugc_symbol(lo, hi, char_class = TRUE)

ugc_titlecase_letter(lo, hi, char_class = TRUE)

ugc_unassigned_control(lo, hi, char_class = TRUE)

ugc_uppercase_letter(lo, hi, char_class = TRUE)

UGC_UPPERCASE_LETTER

UGC_LOWERCASE_LETTER

UGC_TITLECASE_LETTER

UGC_CASED_LETTER

UGC_MODIFIER_LETTER

UGC_OTHER_LETTER

UGC_LETTER

UGC_NONSPACING_MARK

UGC_SPACING_MARK

UGC_ENCLOSING_MARK

UGC_MARK

UGC_DECIMAL_NUMBER

UGC_LETTER_NUMBER

UGC_OTHER_NUMBER

UGC_NUMBER

UGC_CONNECTOR_PUNCTUATION

UGC_DASH_PUNCTUATION

UGC_OPEN_PUNCTUATION

UGC_CLOSE_PUNCTUATION

UGC_INITIAL_PUNCTUATION

UGC_FINAL_PUNCTUATION

UGC_OTHER_PUNCTUATION

UGC_PUNCTUATION

UGC_MATH_SYMBOL

UGC_CURRENCY_SYMBOL

UGC_MODIFIER_SYMBOL

UGC_OTHER_SYMBOL

UGC_SYMBOL

UGC_SPACE_SEPARATOR

UGC_LINE_SEPARATOR

UGC_PARAGRAPH_SEPARATOR

UGC_SEPARATOR

UGC_CONTROL

UGC_FORMAT_CONTROL

UGC_SURROGATE_CONTROL

UGC_PRIVATE_USE_CONTROL

UGC_UNASSIGNED_CONTROL

UGC_OTHER

Value

A character vector representing part or all of a regular expression.

Format

An object of class regex (inherits from character) of length 1.

Arguments

lo

A non-negative integer. Minimum number of repeats, when grouped.

hi

A positive integer, or Inf for no upper bound. Maximum number of repeats, when grouped.

char_class

TRUE or FALSE. Should the values be wrapped into a character class?
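
As a rough sketch of how lo and hi play out in practice, the snippet below passes two generated patterns to stringi, the same engine used in the Examples section. The input strings are invented for illustration, and the code assumes that both rebus.unicode and stringi are installed.

```r
library(rebus.unicode)
library(stringi)

# Illustrative inputs (not taken from the package documentation)
words <- c("abc", "ABC", "\u00e9\u00df")

# lo and hi omitted: the pattern matches one lowercase letter,
# so detection succeeds wherever at least one such letter occurs
has_lower <- stri_detect_regex(words, ugc_lowercase_letter())

# lo = 2, hi = 4: match a run of between 2 and 4 decimal digits
first_run <- stri_extract_first_regex("ab123456", ugc_decimal_number(2, 4))

print(has_lower)
print(first_run)
```

Checking what a pattern matches, rather than how it prints, keeps the sketch independent of the exact string the package generates.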

References

Table 12 of the Unicode Standard Annex #44 defines the Unicode General Categories. http://www.unicode.org/reports/tr44

You can see which characters are contained in a category by visiting, e.g., http://www.fileformat.info/info/unicode/category/Nd/list.htm
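
Category membership can also be checked locally. The following is a minimal sketch that uses stringi directly rather than rebus.unicode; the sample characters are chosen only for illustration.

```r
library(stringi)

# Sample characters: ASCII digit five, Arabic-Indic digit five,
# the letter x, and the vulgar fraction one half (category No, not Nd)
chars <- c("5", "\u0665", "x", "\u00bd")

# TRUE where the character belongs to general category Nd (decimal number)
is_nd <- stri_detect_charclass(chars, "\\p{Nd}")
print(is_nd)
```

Swapping "Nd" for another two-letter category code from Table 12 tests membership in that category instead.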

See Also

unicode_property, Unicode

Examples

# Classes
ugc_lowercase_letter()
ugc_decimal_number()
ugc_paragraph_separator()
ugc_currency_symbol()

# With repetition
ugc_nonspacing_mark(3, 6)
ugc_separator(1, Inf)
ugc_dash_punctuation(0, Inf)

# Without a class wrapper
ugc_titlecase_letter(char_class = FALSE)

# Constants
UGC_UPPERCASE_LETTER
UGC_LETTER_NUMBER
UGC_MATH_SYMBOL
UGC_FORMAT_CONTROL

if (FALSE) {
# All the Unicode general categories.
# Not run, since it generates lots of output
ls("package:rebus.unicode", pattern = "^ugc")
}

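# Usage
library(rebus.base)
x <- "I exchanged $1000 for \u20ac665.41 and \u00a3243.13."
# Capture a currency symbol, then an amount with an optional two-digit decimal part.
# DOT is the escaped literal dot from rebus.base; a bare "." would match any character.
(rx <- capture(ugc_currency_symbol()) %R%
  capture(
    ugc_decimal_number(1, Inf) %R%
    optional(group(DOT %R% ugc_decimal_number(2)))
  )
)
stringi::stri_match_all_regex(x, rx)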
