rebus.base v0.0-3



Core Functionality for the 'rebus' Package

Build regular expressions piece by piece using human readable code. This package contains core functionality, and is primarily intended to be used by package developers.


Project Status: Active - The project has reached a stable, usable state and is being actively developed.

rebus.base: Regular Expression Builder, Um, Something (Base Functionality)

This package contains the core functionality for the rebus package. It is primarily intended for other R package developers. For interactive use, try rebus instead.

Build regular expressions in a human readable way

Regular expressions are a very powerful tool, but the syntax is terse enough to be difficult to read. This makes bugs easy to introduce and hard to find. This package contains functions to make building regular expressions easier.


To install the stable version, type:


To install the development version, you first need the devtools package.


Then you can install the rebus.base package using
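The three installation steps above can be sketched as follows, assuming the standard CRAN and devtools workflows. The GitHub repository name richierocks/rebus.base is an assumption inferred from the packager name in the package metadata, not something stated in the text.

```r
# Stable version from CRAN
install.packages("rebus.base")

# Development version: install devtools first
install.packages("devtools")

# Then install from GitHub.
# NOTE: the repository path below is an assumption, not confirmed by this page.
devtools::install_github("richierocks/rebus.base")
```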


Package contents

Character classes

Character classes come in both constant and functional forms, the latter allowing matching of repeated values. For example, ALPHA represents an alphabetic character (equivalent to [:alpha:]), whereas alpha(3, 6) matches between 3 and 6 alphabetic characters (equivalent to [[:alpha:]]{3,6}).

All the POSIX classes described on the ?regex page are supported, along with generic shorthand classes and some character ranges. For example, DIGIT is the R-specific form of a digit (equivalent to [:digit:]), DGT represents the shorthand class (equivalent to \\d), and ASCII_DIGIT uses a number range (equivalent to [0-9]). See rebus.unicode for Unicode General Categories and Unicode Properties, which are preferred over POSIX classes for stringi and stringr.

Custom character classes can be created using char_class. For example, to match lower case letters and punctuation, you can use char_class(LOWER %R% PUNCT) (equivalent to [[:lower:][:punct:]]). Here, %R% is a regular expression concatenation operator.
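As a minimal sketch of the constant, functional, and custom forms described above, the following uses only names mentioned in the text and checks the resulting patterns with base R's grepl. The variable name lower_or_punct is illustrative only.

```r
library(rebus.base)

# Constant form: the alphabetic class on its own
ALPHA

# Functional form: between 3 and 6 alphabetic characters
three_to_six <- alpha(3, 6)

# Three spellings of "a digit": POSIX class, shorthand class, explicit range
DIGIT
DGT
ASCII_DIGIT

# Custom class: lower case letters and punctuation
lower_or_punct <- char_class(LOWER %R% PUNCT)

# Try the patterns on some strings
grepl(three_to_six, c("ab", "abcd"))
grepl(lower_or_punct, c("a", "!", "A", "7"))
```

Because the results are ordinary character vectors with an extra class, they can be passed to any function that accepts a regular expression.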

Special characters

Pre-escaped constants are available for special characters. For example, DOT is \\. and BACKSLASH is \\\\.

Note the difference between the special character CARET (\\^) and the anchor START (^).

escape_special is a functional form for creating those constants.
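The difference between an escaped special character and an anchor is easiest to see by matching. This is a small sketch that assumes escape_special accepts a single string and escapes the regex metacharacters in it; the test strings are made up for illustration.

```r
library(rebus.base)

# Pre-escaped constants
DOT
BACKSLASH

# DOT matches a literal dot only
dot_hits <- grepl(DOT, c("a.b", "ab"))

# START is the anchor, CARET is a literal caret:
# this pattern means "a caret at the start of the string"
caret_first <- grepl(START %R% CARET, c("^x", "x^"))

# escape_special() escapes special characters in an arbitrary string
escaped <- escape_special("a.b")
escaped_hits <- grepl(escaped, c("a.b", "axb"))
```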


Repetition

You can manually apply repetition using repeated. For example, repeated(ALPHA, 2, 8) or repeated(alpha(), 2, 8) (both equivalent to [[:alpha:]]{2,8}). This makes most sense for custom character classes.
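A short sketch of the repetition call from the paragraph above, checked against two made-up strings:

```r
library(rebus.base)

# Between 2 and 8 alphabetic characters in a row
two_to_eight <- repeated(alpha(), 2, 8)

# "a" has only one letter, so it cannot match; "abc" can
hits <- grepl(two_to_eight, c("a", "abc"))
```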

Grouping and capturing

group creates a non-capturing group within the regular expression. capture does the same, but captures the group's match so that it can be extracted or referred to later. For example, group("groupies") (equivalent to (?:groupies)) and capture("groupies") (equivalent to (groupies)).

engroup takes a capture argument and calls capture when it is TRUE and group when it is FALSE.
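To illustrate the practical difference, here is a sketch in which only the capturing form makes its match available to a replacement. The input string and the angle-bracket replacement are made up for illustration, and perl = TRUE is used so that the (?:...) syntax is handled by PCRE.

```r
library(rebus.base)

grouped  <- group("groupies")    # non-capturing
captured <- capture("groupies")  # capturing

# engroup() picks one or the other based on its capture argument
same_as_capture <- engroup("groupies", capture = TRUE)
same_as_group   <- engroup("groupies", capture = FALSE)

# The captured text can be reused in a replacement via \\1
wrapped <- sub(captured, "<\\1>", "band groupies", perl = TRUE)

# The non-capturing group still matches as part of the pattern
found <- grepl(grouped, "band groupies", perl = TRUE)
```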


Alternation

You can match one string or another using or. For example, or("dog", "cat", "hippopotamus") (equivalent to (?:dog|cat|hippopotamus)).

or1 does the same as or, but takes a single character vector as an input. For example, or1(c("dog", "cat", "hippopotamus")) is the same as the previous example.

The %|% operator does the same for the special case of two inputs, without grouping. For example, "dog" %|% "cat" (equivalent to dog|cat).
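The three alternation forms above can be compared side by side. This sketch uses the animal names from the text and a couple of made-up test strings; perl = TRUE is used because the grouped forms rely on (?:...).

```r
library(rebus.base)

animals <- c("dog", "cat", "hippopotamus")

# Separate arguments versus a single character vector
from_args   <- or("dog", "cat", "hippopotamus")
from_vector <- or1(animals)

# Two inputs, no grouping
dog_or_cat <- "dog" %|% "cat"

# Use the pattern like any other regular expression
hits <- grepl(from_vector, c("my cat", "my fish"), perl = TRUE)
```

or1 is the more convenient form when the alternatives are already held in a vector, for example a column of keywords.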


Lookaround

Zero-length assertions match characters then give up the match. lookahead and negative_lookahead match forwards, and lookbehind and negative_lookbehind match backwards. Note that the last two aren't supported by R's default engine, only its Perl-compatible (PCRE) engine and stringi/stringr's ICU engine. For example, "q" %R% lookahead("u") matches "q" followed by "u", but only includes "q" in the match.
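The "q followed by u" example can be tried with base R as a sketch. It passes perl = TRUE so that the PCRE engine handles the lookaround syntax; the test words are made up.

```r
library(rebus.base)

words <- c("quick", "qatar")

# "q" only when followed by "u"; the "u" is not part of the match
q_before_u <- "q" %R% lookahead("u")
detected <- grepl(q_before_u, words, perl = TRUE)

# Extract what was actually matched (only matching elements are returned)
matched <- regmatches(words, regexpr(q_before_u, words, perl = TRUE))

# Lookbehind: "u" only when preceded by "q"
u_after_q <- lookbehind("q") %R% "u"
detected_behind <- grepl(u_after_q, words, perl = TRUE)
```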


Backreferences

REF1 to REF9 contain references to captured groups of the form \\i, for reuse with replacement functions and with R's default and Perl-compatible (PCRE) engines. ICU_REF1 to ICU_REF9 contain references of the form $i, for use with the ICU engine.
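A sketch of a backreference used both inside a pattern and as a replacement, with base R and perl = TRUE. The doubled-letter pattern and the test words are illustrative, not taken from the package documentation.

```r
library(rebus.base)

# Capture one letter, then require the same letter again
doubled <- capture(alpha()) %R% REF1

has_double <- grepl(doubled, c("letter", "word"), perl = TRUE)

# In a replacement, REF1 stands for the captured letter,
# so the doubled letter collapses to a single one
collapsed <- sub(doubled, REF1, "letter", perl = TRUE)

# The ICU-style reference is meant for stringi/stringr replacements
ICU_REF1
```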

Mode modifiers

case_insensitive makes the match ignore case.

free_spacing allows whitespace between tokens.

single_line makes the dot match line breaks (without it, the dot stops at a line break). multi_line makes the caret and dollar match just after/before line breaks, as well as at the start and end of the whole string.

duplicate_group_names allows groups to have the same names.

no_backslash_escaping turns off backslash escaping.

modify_mode allows multiple mode-modifiers to be set at once.
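The effect of the modifiers is easiest to see by comparing a modified pattern with the plain one. This sketch uses base R with perl = TRUE; the two-line test string is made up, and the modify_mode call assumes that its second argument takes single-letter mode flags such as "i" and "s".

```r
library(rebus.base)

two_lines <- "a\nb"

# Case-insensitive matching
ci <- grepl(case_insensitive("hello"), c("HELLO world", "goodbye"), perl = TRUE)

# Dot and line breaks: plain versus single_line
dot_plain  <- grepl("a.b", two_lines, perl = TRUE)
dot_single <- grepl(single_line("a.b"), two_lines, perl = TRUE)

# Caret and line breaks: plain versus multi_line
caret_plain <- grepl("^b", two_lines, perl = TRUE)
caret_multi <- grepl(multi_line("^b"), two_lines, perl = TRUE)

# Several modes at once (assumption: modes are given as single-letter flags)
both <- modify_mode("A.B", c("i", "s"))
both_hit <- grepl(both, two_lines, perl = TRUE)
```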


Miscellaneous

exactly makes a pattern match the whole string, by wrapping it in start and end anchors.

literal treats its contents as literal characters rather than special regular expression characters.
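Both helpers can be sketched against base R. The test strings are made up, and perl = TRUE is used for the literal example on the assumption that literal relies on quoting syntax that the PCRE engine understands.

```r
library(rebus.base)

# exactly(): the whole string must be "cat"
whole <- grepl(exactly("cat"), c("cat", "concatenate"))

# Without literal(), "+" is a quantifier, so "1+1" matches "11"
as_regex <- grepl("1+1", c("1+1", "11"), perl = TRUE)

# With literal(), "+" is just a plus sign
as_literal <- grepl(literal("1+1"), c("1+1", "11"), perl = TRUE)
```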

Functions in rebus.base

Name               Description
Anchors            The start or end of a string.
Backreferences     Backreferences
as.regex           Convert or test for regex objects
capture            Capture a token, or not
SpecialCharacters  Special characters
WordBoundaries     Word boundaries
CharacterClasses   Class Constants
ClassGroups        Character classes
Concatenation      Combine strings together
ReplacementCase    Force the case of replacement values
lookahead          Lookaround
modify_mode        Apply mode modifiers
regex              Create a regex
repeated           Repeat values
format.regex       Print or format regex objects
literal            Treat part of a regular expression literally
or                 Alternation
recursive          Make the regular expression recursive.
char_class         A range or char_class of characters
escape_special     Escape special characters


Date 2017-04-25
License Unlimited
LazyData true
RoxygenNote 6.0.1
Collate 'alternation.R' 'regex-methods.R' 'backreferences.R' 'capture.R' 'internal.R' 'grouping-and-repetition.R' 'constants.R' 'class-groups.R' 'concatenation.R' 'compound-constants.R' 'escape_special.R' 'lookaround.R' 'misc.R' 'mode-modifiers.R'
NeedsCompilation no
Packaged 2017-04-25 15:22:09 UTC; richierocks
Repository CRAN
Date/Publication 2017-04-25 21:45:26 UTC
depends R (>= 3.1.0)
imports stats
suggests stringi, testthat
