rebus v0.1-3

Build Regular Expressions in a Human Readable Way

Build regular expressions piece by piece using human readable code. This package is designed for interactive use. For package development, use the rebus.* dependencies.

Readme

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

rebus: Regular Expression Builder, Um, Something

Build regular expressions in a human readable way

Regular expressions are a very powerful tool, but the syntax is terse enough to be difficult to read. This makes bugs easy to introduce and hard to find. This package contains functions to make building regular expressions easier.

Package contents

The package contains constants for character classes (R-specific ones like ALNUM and GRAPH, generic ones like WORD, and compound ones like ISO_DATE), special characters (DOT, BACKSLASH), and anchors (START, END).

There are functions for the basic regex building blocks: creating character classes (char_class), repetition (repeated), grouping (group) and capturing (capture).

Each of the class constants has a corresponding function that groups the class and allows repetition. For example, alnum(3, 5) matches three to five alphanumeric characters.

There are operators for concatenation (%R% or %c%) and alternation (%|%).

Examples

Match a hex colour, like "#99af01"

This reads: "Match a hash, followed by six hexadecimal digits."

"#" %R% hex_digit(6)    

To match only a hex colour and nothing else, you can add anchors to the start and end of the expression.

START %R% "#" %R% hex_digit(6) %R% END
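
To see what the anchors change, here is a minimal sketch in base R. The two patterns are written by hand and are only assumed to be equivalent to what the rebus expressions above produce; the sample strings are made up for illustration.

```r
# Hand-written patterns, assumed to mirror the two rebus expressions above.
unanchored <- "#[[:xdigit:]]{6}"
anchored   <- "^#[[:xdigit:]]{6}$"

# Illustrative inputs: an exact colour, a colour inside other text,
# a colour that is one digit short, and a colour with extra digits.
colours <- c("#99af01", "colour: #99af01", "#99af0", "#99af01ff")

# The unanchored pattern matches anywhere inside the string.
hex_unanchored <- grepl(unanchored, colours)

# The anchored pattern only accepts strings that are exactly a hex colour.
hex_anchored <- grepl(anchored, colours)

print(data.frame(colours, hex_unanchored, hex_anchored))
```

Only the first string passes the anchored check, while the unanchored pattern also accepts the second and fourth strings because they contain a hex colour somewhere inside them.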

Simple email address matching.

This reads: "Match one or more letters, numbers, dots, underscores, percents, pluses or hyphens. Then match an 'at' symbol. Then match one or more letters, numbers, dots, or hyphens. Then match a dot. Then match two to four letters."

one_or_more(char_class(ASCII_ALNUM %R% "._%+-")) %R%
  "@" %R%
  one_or_more(char_class(ASCII_ALNUM %R% ".-")) %R%
  DOT %R%
  ascii_alpha(2, 4)
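
As a rough check of that description, the sketch below applies a hand-written pattern in base R. The pattern is an assumption about what the rebus expression above expands to, and the candidate addresses are invented for illustration.

```r
# Hand-written pattern, assumed to match what the rebus expression above builds.
email_pattern <- "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}"

candidates <- c(
  "jane.doe+tag@example.co.uk",  # plausible address
  "no-at-sign.example.com",      # missing the 'at' symbol
  "user@localhost"               # no dot after the 'at' symbol
)

# TRUE where the string contains something shaped like an email address.
email_hits <- grepl(email_pattern, candidates)
print(setNames(email_hits, candidates))
```

Because the pattern has no anchors, it reports whether a string contains an address-like substring, not whether the whole string is one.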

IP address matching.

First we need an expression to match numbers between 0 and 255. Both of the following syntaxes read: "Match two then five then a number between zero and five. Or match two then a number between zero and four then a digit. Or match an optional zero or one followed by an optional digit followed by a compulsory digit. Make this a single token, but don't capture it."

# Using the %|% operator
ip_element <- group(
  "25" %R% char_range(0, 5) %|%
  "2" %R% char_range(0, 4) %R% ascii_digit() %|%
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# The same again, this time using the or function
ip_element <- or(
  "25" %R% char_range(0, 5),
  "2" %R% char_range(0, 4) %R% ascii_digit(),
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# It's easier to write using number_range, though it isn't quite as optimal 
# as handcrafted regexes.
number_range(0, 255, allow_leading_zeroes = TRUE)

Now an IP address consists of four of these numbers separated by dots. This reads: "Match a word boundary. Then create a token from an ip_element followed by a dot, and repeat it three times. Then match another ip_element followed by a word boundary."

BOUNDARY %R% 
  repeated(group(ip_element %R% DOT), 3) %R% 
  ip_element %R%
  BOUNDARY
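
The sketch below tries the same idea in base R with hand-written patterns. Both strings are assumptions about what the rebus expressions above expand to (non-capturing groups and `\b` word boundaries), so `perl = TRUE` is used; the inputs are made up for illustration.

```r
# Hand-written equivalent of ip_element: 250-255, 200-249, or 0-199
# with optional leading zeroes, wrapped in a non-capturing group.
ip_element <- "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Word boundary, three "element plus dot" tokens, a final element, word boundary.
ip_address <- paste0("\\b(?:", ip_element, "\\.){3}", ip_element, "\\b")

inputs <- c(
  "192.168.0.1",           # valid address
  "host 127.0.0.1 is up",  # valid address inside other text
  "10.0.0.256",            # last element is out of range
  "1.2.3"                  # only three elements
)

# perl = TRUE enables the (?:...) and \b syntax used above.
ip_hits <- grepl(ip_address, inputs, perl = TRUE)
print(setNames(ip_hits, inputs))
```

The trailing word boundary is what rejects "10.0.0.256": the element could match "25", but there is no boundary between "5" and "6".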

See also

The stringr and stringi packages provide tools for matching regular expressions and nicely complement this package.

The rex and Regularity packages are very similar to this package.

regular-expressions.info has good advice on using regular expressions in R. In particular, see the R language page and the examples page.

debuggex.com is a visual regex debugging and testing site.

TODO

More high-level regexes for complex data types (phone numbers, post codes, car licenses, whatever).

Functions in rebus

Name                    Description
CharacterClasses        Class Constants
ClassGroups             Character classes
SpecialCharacters       Special characters
Unicode                 Unicode classes
Anchors                 The start or end of a string
Backreferences          Backreferences
IsoClasses              ISO 8601 date-time classes
ReplacementCase         Force the case of replacement values
Concatenation           Combine strings together
DateTime                Date-time regexes
char_class              A range or char_class of characters
escape_special          Escape special characters
get_weekdays            Get the days of the week or months of the year
UnicodeProperty         Unicode Properties
WordBoundaries          Word boundaries
regex                   Create a regex
repeated                Repeat values
exactly                 Make a regex exact
format.regex            Print or format regex objects
rebus                   rebus: Regular Expression Builder, Um, Something
recursive               Make the regular expression recursive
as.regex                Convert or test for regex objects
capture                 Capture a token, or not
number_range            Generate a regular expression for a number range
or                      Alternation
roman                   Roman numerals
whole_word              Match a whole word
UnicodeGeneralCategory  Unicode General Categories
UnicodeOperators        Unicode Operators
lookahead               Lookaround
modify_mode             Apply mode modifiers
literal                 Treat part of a regular expression literally

Details

Type Package
Date 2017-04-25
License Unlimited
LazyLoad yes
LazyData yes
Acknowledgments Development of this package was partially funded by the Proteomics Core at Weill Cornell Medical College in Qatar. The Core is supported by 'Biomedical Research Program' funds, a program funded by Qatar Foundation.
RoxygenNote 6.0.1
Collate 'export-base.R' 'export-datetimes.R' 'export-numbers.R' 'export-unicode.R' 'imports.R' 'regex-package.R'
NeedsCompilation no
Packaged 2017-04-25 16:46:25 UTC; richierocks
Repository CRAN
Date/Publication 2017-04-25 21:42:46 UTC