qdapRegex (version 0.7.5)

rm_date: Remove/Replace/Extract Dates

Description

Remove/replace/extract dates from a string in the form of (1) XX/XX/XXXX, XX/XX/XX, XX-XX-XXXX, XX-XX-XX, XX.XX.XXXX, or XX.XX.XX OR (2) March XX, XXXX or Mar XX, XXXX OR (3) both forms.

Usage

rm_date(
  text.var,
  trim = !extract,
  clean = TRUE,
  pattern = "@rm_date",
  replacement = "",
  extract = FALSE,
  dictionary = getOption("regex.library"),
  ...
)

ex_date( text.var, trim = !extract, clean = TRUE, pattern = "@rm_date", replacement = "", extract = TRUE, dictionary = getOption("regex.library"), ... )

Value

Returns a character string with dates removed.

Arguments

text.var

The text variable.

trim

logical. If TRUE removes leading and trailing white spaces.

clean

trim logical. If TRUE extra white spaces and escaped character will be removed.

pattern

A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector (see Details for additional information). Default, @rm_date uses the rm_date regex from the regular expression dictionary from the dictionary argument.

replacement

Replacement for matched pattern.

extract

logical. If TRUE the dates are extracted into a list of vectors.

dictionary

A dictionary of canned regular expressions to search within if pattern begins with "@rm_".

...

Other arguments passed to gsub.

Details

The default regular expression used by rm_date finds numeric representations not word/abbreviations. This means that "June 13, 2002" is not matched. This behavior can be altered (to include month names/abbreviations) by using a secondary regular expression from the regex_usa data (or other dictionary) via (pattern = "@rm_date2", pattern = "@rm_date3", or pattern = "@rm_date4"). See Examples for example usage.

See Also

gsub, stri_extract_all_regex

Other rm_ functions: rm_abbreviation(), rm_between(), rm_bracket(), rm_caps_phrase(), rm_caps(), rm_citation_tex(), rm_citation(), rm_city_state_zip(), rm_city_state(), rm_default(), rm_dollar(), rm_email(), rm_emoticon(), rm_endmark(), rm_hash(), rm_nchar_words(), rm_non_ascii(), rm_non_words(), rm_number(), rm_percent(), rm_phone(), rm_postal_code(), rm_repeated_characters(), rm_repeated_phrases(), rm_repeated_words(), rm_tag(), rm_time(), rm_title_name(), rm_url(), rm_white(), rm_zip()

Examples

Run this code
## Numeric Date Representation
x <- paste0("Format dates as 04/12/2014, 04-12-2014, 04.12.2014. or",
    " 04/12/14 but leaves mismatched: 12.12/2014")
rm_date(x)
ex_date(x)

## Word/Abbreviation Date Representation
x2 <- paste0("Format dates as Sept 09, 2002 or October 22, 1887",
  "but not 04-12-2014 and may match good 00, 9999")
rm_date(x2, pattern="@rm_date2")
ex_date(x2, pattern="@rm_date2")


## Year-Month-Day Representation
x3 <- sprintf("R uses time in this format %s.", Sys.time())
rm_date(x3, pattern="@rm_date3")

## Grab all types
ex_date(c(x, x2, x3), pattern="@rm_date4")

Run the code above in your browser using DataLab