rematch2 (version 2.0.1)

re_match_all: Extract All Regular Expression Matches Into a Data Frame

Description

This function is a thin wrapper on the gregexpr base R function, to extract the matching (sub)strings as a data frame. It extracts all matches, and potentially their capture groups as well.
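A minimal base R sketch of the gregexpr step that re_match_all wraps, assuming only the full matches are wanted. Here regmatches turns the positions found by gregexpr into one character vector per input element. The sample strings and the unnamed pattern are illustrative and not taken from the package.

```r
# Base R only: gregexpr() locates every match in each element of `text`,
# and regmatches() converts those positions into the matched substrings.
text <- c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore", "no names here")
pattern <- "[[:upper:]][[:lower:]]+ [[:upper:]][[:lower:]]+"

matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))

# One character vector per input element, one entry per match;
# character(0) where an element has no match at all.
matches[[1]]      # "Ben Franklin" "Jefferson Davis"
lengths(matches)  # 2 1 0
```

This list of character vectors has the same shape that the Value section describes for the list columns.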

Usage

re_match_all(text, pattern, perl = TRUE, ...)

Arguments

text

Character vector.

pattern

A regular expression. See regex for more about regular expressions.

perl

Logical: should Perl-compatible regular expressions be used? Defaults to TRUE; setting it to FALSE disables capture groups.
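The capture-group behaviour of perl traces back to base R: with perl = TRUE, gregexpr attaches capture.start, capture.length and capture.names attributes to each result, while the default POSIX engine (perl = FALSE) attaches none. A short check, reusing the name pattern from the Examples; the POSIX variant uses unnamed groups because the (?<name>...) syntax is PCRE-only.

```r
# Named groups are PCRE syntax, so this pattern needs perl = TRUE.
name_rex <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
m_pcre <- gregexpr(name_rex, "\tMillard Fillmore", perl = TRUE)[[1]]
attr(m_pcre, "capture.names")  # "first" "last"
attr(m_pcre, "capture.start")  # 1 x 2 matrix: groups start at 2 and 10

# Same groups, unnamed, through the default POSIX engine:
# the match is found, but no capture attributes are returned.
posix_rex <- "([[:upper:]][[:lower:]]+) ([[:upper:]][[:lower:]]+)"
m_posix <- gregexpr(posix_rex, "\tMillard Fillmore", perl = FALSE)[[1]]
as.integer(m_posix)                      # 2
is.null(attr(m_posix, "capture.start"))  # TRUE
```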

...

Additional arguments to pass to gregexpr (or regexpr if text is of length zero).
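A plausible reason for the zero-length special case, sketched with base R under the assumption that the wrapper needs the capture-group names to build its columns: gregexpr returns one list element per input string, so empty input leaves nothing to carry the capture attributes, whereas regexpr still returns a (zero-length) vector that describes the pattern's groups.

```r
name_rex <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
empty <- character(0)

# gregexpr(): one list element per input string, so an empty list here.
g <- gregexpr(name_rex, empty, perl = TRUE)
length(g)  # 0

# regexpr(): a single zero-length vector whose attributes still
# name the capture groups of the pattern.
r <- regexpr(name_rex, empty, perl = TRUE)
attr(r, "capture.names")  # "first" "last"
```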

Value

A tidy data frame (see Section “Tidy Data”). The list columns contain character vectors with as many entries as there are matches for each input element.

Tidy Data

The return value is a tidy data frame where each row corresponds to an element of the input character vector text. The values from text appear for reference in the .text character column. All other columns are list columns containing the match data. The .match column holds the full regular expression matches; the other columns correspond to capture groups, if the pattern has any and PCRE matching is enabled with perl = TRUE (the default). If the capture groups are named, the corresponding columns bear those names.
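The layout described above can be approximated in base R. The helper match_all_sketch below is hypothetical: it is not part of rematch2, assumes text has at least one element and pattern has capture groups, and skips the package's edge-case handling. It reads the capture attributes from gregexpr and stores one character vector per input element in each list column.

```r
# Hypothetical helper, NOT part of rematch2: approximates the tidy layout
# with base R. Assumes length(text) >= 1 and a pattern with capture groups.
match_all_sketch <- function(text, pattern) {
  m <- gregexpr(pattern, text, perl = TRUE)
  group_names <- attr(m[[1]], "capture.names")

  # Capture group j for every match of every element: one character vector
  # per element of `text`, character(0) where the element has no match.
  group_col <- function(j) {
    lapply(seq_along(text), function(i) {
      mi <- m[[i]]
      if (mi[1] == -1L) return(character(0))
      st <- attr(mi, "capture.start")[, j]
      len <- attr(mi, "capture.length")[, j]
      substring(text[i], st, st + len - 1L)
    })
  }

  out <- data.frame(.text = text, stringsAsFactors = FALSE)
  for (j in seq_along(group_names)) {
    out[[group_names[j]]] <- I(group_col(j))  # I() keeps it a list column
  }
  out$.match <- I(regmatches(text, m))        # full matches
  out
}

name_rex <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
res <- match_all_sketch(c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore"), name_rex)
res$first[[1]]   # "Ben" "Jefferson"
res$.match[[2]]  # "Millard Fillmore"
```

Named groups become columns named first and last, mirroring the naming rule above; the column order here is a choice of the sketch, not a statement about the package's output.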

Each list column holds one entry per element of text. For re_match_all, that entry is a character vector of the matched (sub)strings, one per match. The related function re_exec_all stores match records instead: named lists with entries match, start and end, holding respectively the matching (sub)strings and their start and end positions (using one-based indexing).
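The match, start and end fields map directly onto gregexpr output: the returned integers are the one-based start positions, and the match.length attribute gives the inclusive end as start + length - 1. A base R sketch for a single string; the record layout mirrors the description above, but the code is an illustration, not the package's own implementation.

```r
text <- "  Ben Franklin and Jefferson Davis"
m <- gregexpr("[[:upper:]][[:lower:]]+ [[:upper:]][[:lower:]]+", text, perl = TRUE)[[1]]

start <- as.integer(m)                                   # 1-based start positions
end <- start + as.integer(attr(m, "match.length")) - 1L  # inclusive end positions
record <- list(match = substring(text, start, end), start = start, end = end)

record$start  # 3 20
record$end    # 14 34
```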

See Also

Other tidy regular expression matching: re_exec_all, re_exec, re_match

Examples

library(rematch2)
name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
re_match_all(notables, name_rex)
