A dataset containing a list of supplemental, canned regular expressions. The
regular expressions in this data set are considered useful but have not been
included in a formal function (of the type rm_XXX). Users can utilize the rm_
function to generate functions that can sub/replace/extract as desired.
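The idea of turning a pattern into a reusable function can be sketched in base R. This is only an illustration: make_extractor and make_remover are hypothetical helpers rather than qdapRegex's rm_, and the IP-style pattern is written for this example, not taken from the data set.

```r
# Hypothetical function factories (not qdapRegex's rm_): each one takes a
# regex and returns a function that applies it to a character vector.
make_extractor <- function(pattern) {
  function(x) regmatches(x, gregexpr(pattern, x, perl = TRUE))
}
make_remover <- function(pattern, replacement = "") {
  function(x) gsub(pattern, replacement, x, perl = TRUE)
}

# Illustrative pattern: four chunks of 1-3 digits separated by dots.
ip_pattern <- "\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b"

extract_ip <- make_extractor(ip_pattern)
remove_ip  <- make_remover(ip_pattern)

x <- "Servers: 192.168.0.1 and 10.0.0.255."
extract_ip(x)[[1]]
# [1] "192.168.0.1" "10.0.0.255"
remove_ip(x)
```

The factory returns a closure, so the pattern is fixed once and the resulting function can be applied to any number of text vectors.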
data(regex_supplement)
A list with 24 elements
The following canned regular expressions are included:
single word after the word "a"
single word after the word "the"
find single word after ? word (? = user defined); note contains "%s" that is replaced by sprintf and is not a valid regex on its own (user supplies (1) n before, (2) the point, & (3) n after)
find n words (not including punctuation) before or after ? word (? = user defined); note contains "%s" that is replaced by sprintf and is not a valid regex on its own (user supplies (1) n before, (2) the point, & (3) n after)
find n words (plus punctuation) before or after ? word (? = user defined); note contains "%s" that is replaced by sprintf and is not a valid regex on its own
find single word before ? word (? = user defined); note contains "%s" that is replaced by sprintf and is not a valid regex on its own
find all occurrences of a substring except the first; regex pattern retrieved from StackOverflow's akrun: http://stackoverflow.com/a/31458261/1000343
substring beginning with hash (#) followed by either 3 or 6 select characters (a-f, A-F, and 0-9)
substring of four chunks of 1-3 consecutive digits separated with dots (.)
last occurrence of a delimiter; note contains "%s" that is replaced by sprintf and is not a valid regex on its own (user supplies the delimiter)
substring with "pp." or "p.", optionally followed by a space, followed by 1 or more digits, optionally followed by a dash, optionally followed by 1 or more digits, optionally followed by a semicolon, optionally followed by a space, optionally followed by 1 or more digits; intended for extraction/removal purposes
substring of 1 or more digits, optionally followed by a dash, optionally followed by 1 or more digits, optionally followed by a semicolon, optionally followed by a space, optionally followed by 1 or more digits; intended for validation purposes
punctuation characters ([:punct:]) with the ability to negate; note contains "%s" that is replaced by sprintf and is not a valid regex on its own
a regex that is useful for splitting strings into runs of repeated characters (e.g., "wwxyyyzz" becomes "ww", "x", "yyy", "zz"); regex pattern retrieved from Robert Redd: http://stackoverflow.com/a/29383435/1000343
regex string that splits on a delimiter and retains the delimiter
chunks digits > 4 into groups of 3 from right to left allowing for easy insertion of thousands separator; regex pattern retrieved from StackOverflow's stema: http://stackoverflow.com/a/10612685/1000343
substring of valid hours (1-12) followed by a colon (:) followed by valid minutes (0-59), followed by an optional space and the character chunk am or pm
substring starting with "v" or "version" optionally followed by a space and then period separated digits for <major>.<minor>.<release>.<build>; the build sequence is optional and the "version"/"v" IS NOT contained in the substring
substring starting with "v" or "version" optionally followed by a space and then period separated digits for <major>.<minor>.<release>.<build>; the build sequence is optional and the "version"/"v" IS contained in the substring
substring of white space after a comma
A true word boundary that only includes alphabetic characters; based on www.rexegg.com's suggestion taken from discussion of true word boundaries; note contains "%s" that is replaced by sprintf and is not a valid regex on its own
A true left word boundary that only includes alphabetic characters; based on www.rexegg.com's suggestion taken from discussion of true word boundaries
A true right word boundary that only includes alphabetic characters; based on www.rexegg.com's suggestion taken from discussion of true word boundaries
substring of the video id from a YouTube video; taken from Jacob Overgaard's submission found https://regex101.com/r/kU7bP8/1
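Several entries above are templates containing "%s" that must be completed with sprintf before use. A minimal base R sketch of that step follows; the two templates are hypothetical stand-ins written for this illustration, and the patterns actually stored in the data set may differ.

```r
# Hypothetical templates written for this illustration; the patterns stored
# in regex_supplement may differ.
after_template <- "(?<=\\b%s\\s)\\w+"   # single word after a user-supplied word
last_template  <- "%s(?!.*%s)"          # last occurrence of a delimiter

# sprintf() substitutes the user-supplied value for each "%s".
after_the  <- sprintf(after_template, "the")
last_comma <- sprintf(last_template, ",", ",")

x <- "I saw the dog chase the cat."

# perl = TRUE enables the lookbehind and lookahead used above.
words_after_the <- regmatches(x, gregexpr(after_the, x, perl = TRUE))[[1]]
words_after_the
# [1] "dog" "cat"

replaced <- sub(last_comma, ";", "a,b,c", perl = TRUE)
replaced
# [1] "a,b;c"
```

A user-supplied value that contains regex metacharacters (for example a period) would need to be escaped before being passed to sprintf.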
Regexes from this data set can be added to the pattern argument of any rm_XXX
function via an at sign (@) followed by a regex name from this data set (e.g.,
pattern = "@after_the") provided the regular expression does not contain
non-regex such as the sprintf character string %s.
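One way to see which entries qualify for direct use with the at sign is to test each pattern for a literal "%s". The sketch below runs on a small stand-in list so that it works without the package; the element names other than after_the and all of the pattern strings are assumptions made for illustration. The same check could presumably be applied to the real list once it is loaded.

```r
# Stand-in for the data set: a named list of pattern strings. The patterns
# (and the names template_after and hex_color) are illustrative assumptions.
patterns <- list(
  after_the      = "(?<=\\bthe\\s)\\w+",
  template_after = "(?<=\\b%s\\s)\\w+",
  hex_color      = "#(?:[0-9a-fA-F]{3}){1,2}\\b"
)

# TRUE for entries that still contain the sprintf placeholder "%s".
needs_sprintf <- vapply(
  patterns,
  function(p) grepl("%s", p, fixed = TRUE),
  logical(1)
)

usable_directly <- names(patterns)[!needs_sprintf]
needs_filling   <- names(patterns)[needs_sprintf]

usable_directly
# [1] "after_the" "hex_color"
needs_filling
# [1] "template_after"
```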
Use qdapRegex:::examine_regex(regex_supplement) to interactively explore the
regular expressions in regex_supplement. This will provide a browser + console
based breakdown of each regex in the dictionary.