re2: Create a pre-compiled regular expression

Description

Create a pre-compiled regular expression from a string.

Usage

re2(pattern, utf_8 = TRUE, case_sensitive = TRUE, posix_syntax = FALSE,
  dot_nl = FALSE, literal = FALSE, longest_match = FALSE,
  never_nl = FALSE, never_capture = FALSE, one_line = FALSE,
  perl_classes = FALSE, word_boundary = FALSE, log_error = FALSE,
  max_mem = 8388608, simplify = TRUE)

Arguments

pattern

regular expression pattern

utf_8

(true) text and pattern are UTF-8; otherwise Latin-1

case_sensitive

(true) match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

posix_syntax

(false) restrict regexps to POSIX egrep syntax

dot_nl

(false) dot matches every character, including newline

literal

(false) interpret string as literal, not regexp

longest_match

(false) search for longest match, not first match

never_nl

(false) never match \n, even if it is in regexp

never_capture

(false) parse all parens as non-capturing

one_line

(false) ^ and $ only match beginning and end of text; when posix_syntax is FALSE this feature is always enabled

perl_classes

(false) allow Perl's \d \s \w \D \S \W; when posix_syntax is FALSE these features are always enabled

word_boundary

(false) allow Perl's \b \B (word boundary and not word boundary); when posix_syntax is FALSE these features are always enabled

log_error

(false) log syntax and execution errors

max_mem

(see details) approx. max memory footprint of RE2

simplify

(true) return a single object instead of a list when pattern has length 1.
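The interaction between posix_syntax and the Perl-specific switches can be sketched as follows. This is a minimal illustration, not taken from the package itself: it assumes the package that provides re2() and re2_detect() is already attached, and the sample pattern and input text are made up for the example. The calls are guarded so the snippet does nothing when re2() is not available.

```r
# Hypothetical sample pattern: a run of digits delimited by word boundaries.
digit_word_pattern <- "\\b\\d+\\b"
sample_text <- "order 42 shipped"

# Assumption: re2() and re2_detect() come from an attached package.
if (exists("re2", mode = "function")) {
  # Default (non-POSIX) mode: \d and \b need no extra flags.
  default_regexp <- re2(digit_word_pattern)
  print(re2_detect(sample_text, default_regexp))

  # POSIX mode: the Perl classes and word boundaries are switched on explicitly.
  posix_regexp <- re2(digit_word_pattern, posix_syntax = TRUE,
                      perl_classes = TRUE, word_boundary = TRUE)
  print(re2_detect(sample_text, posix_regexp))

  # case_sensitive = FALSE makes matching ignore case.
  print(re2_detect("ABC", re2("abc", case_sensitive = FALSE)))
}
```

Leaving perl_classes and word_boundary at FALSE while posix_syntax = TRUE is the case where \d and \b are not allowed in the pattern.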

Value

a pre-compiled regular expression

Details

The max_mem option controls how much memory can be used to hold the compiled form of the regexp (the Prog) and its cached DFA graphs.

Once a DFA fills its budget, it flushes its cache and starts over. If this happens too often, RE2 falls back on the NFA implementation.

For now, the default budget is set to something close to what Code Search uses.

The default is max_mem = 8<<20 = 8388608 bytes (8 MiB).
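The arithmetic behind the default can be checked in base R. The sketch below reproduces 8<<20 with bitwShiftL() and then passes a larger budget to re2(). The 64 MiB value and the sample pattern are hypothetical choices for illustration, and the re2() call is only attempted when the function is available in the session.

```r
# 8 << 20 in base R: shift the integer 8 left by 20 bits.
default_max_mem <- bitwShiftL(8L, 20L)

# The same budget expressed in MiB (1 MiB = 2^20 bytes).
default_max_mem_mib <- default_max_mem / 2^20

# Hypothetical larger budget of 64 MiB, chosen only for illustration.
larger_max_mem <- 64 * 2^20

# Assumption: re2() comes from an attached package; skipped otherwise.
if (exists("re2", mode = "function")) {
  big_budget_regexp <- re2("(a|b)*abb", max_mem = larger_max_mem)
}
```

Raising max_mem gives the DFA more room before it has to flush its cache, at the cost of a larger memory footprint per compiled pattern.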

Examples

# NOT RUN {
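# Compile a pattern once and print the resulting object
regexp <- re2("test")
regexp

# (?s) lets the dot match a newline; never_nl = TRUE forbids matching \n
re2_match("abc\ndef", "(?s)(.*)")
re2_match("abc\ndef", re2("(?s)(.*)", never_nl = TRUE))

# dot_nl = TRUE lets the dot match a newline; by default it does not
re2_detect("\n", re2(".", dot_nl = TRUE))
re2_detect("\n", ".")

# never_capture = TRUE parses all parentheses as non-capturing
get_number_of_groups(re2("(A)(v)", never_capture = TRUE))

# longest_match = TRUE searches for the longest match instead of the first
re2_match("aaabaaaa", re2("(a|aaa)", longest_match = TRUE))
re2_match("aaabaaaa", re2("(a|aaa)", longest_match = FALSE))

# literal = TRUE treats the pattern as a plain string, so "+" is not a quantifier
re2_match("a+b", re2("a+b", literal = TRUE))

# posix_syntax = TRUE restricts the pattern to POSIX egrep syntax
re2_detect("abc", re2("abc", posix_syntax = TRUE))

# Named groups are accepted in the default (non-POSIX) mode
re2("(?P<name>re)")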

# }
# NOT RUN {
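# expect_error() comes from the testthat package; the named group is
# expected to be rejected when posix_syntax = TRUE
expect_error(re2("(?P<name>re)", posix_syntax = TRUE))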
# }