qdapRegex (version 0.7.8)

rm_zip: Remove/Replace/Extract Zip Codes

Description

Remove/replace/extract zip codes from a string.

Usage

rm_zip(
  text.var,
  trim = !extract,
  clean = TRUE,
  pattern = "@rm_zip",
  replacement = "",
  extract = FALSE,
  dictionary = getOption("regex.library"),
  ...
)

ex_zip( text.var, trim = !extract, clean = TRUE, pattern = "@rm_zip", replacement = "", extract = TRUE, dictionary = getOption("regex.library"), ... )

Value

Returns a character string with U.S. 5 and 5+4 zip codes removed.

Arguments

text.var

The text variable.

trim

logical. If TRUE removes leading and trailing white spaces.

clean

trim logical. If TRUE extra white spaces and escaped character will be removed.

pattern

A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Default, @rm_zip uses the rm_zip regex from the regular expression dictionary from the dictionary argument.

replacement

Replacement for matched pattern.

extract

logical. If TRUE the zip codes are extracted into a list of vectors.

dictionary

A dictionary of canned regular expressions to search within if pattern begins with "@rm_".

...

Other arguments passed to gsub.

Author

stackoverflow's hwnd and Tyler Rinker <tyler.rinker@gmail.com>.

References

The time regular expression was taken from: https://stackoverflow.com/a/25223890/1000343

See Also

gsub, stri_extract_all_regex

Other rm_ functions: rm_abbreviation(), rm_between(), rm_bracket(), rm_caps_phrase(), rm_caps(), rm_citation_tex(), rm_citation(), rm_city_state_zip(), rm_city_state(), rm_date(), rm_default(), rm_dollar(), rm_email(), rm_emoticon(), rm_endmark(), rm_hash(), rm_nchar_words(), rm_non_ascii(), rm_non_words(), rm_number(), rm_percent(), rm_phone(), rm_postal_code(), rm_repeated_characters(), rm_repeated_phrases(), rm_repeated_words(), rm_tag(), rm_time(), rm_title_name(), rm_url(), rm_white()

Examples

Run this code
x <- c("Mr. Bean bought 2 tickets 2-613-213-4567",
  "43 Butter Rd, Brossard QC K0A 3P0 - 613 213 4567", 
  "Rat Race, XX, 12345",
  "Ignore phone numbers(613)2134567",
  "Grab zips with dashes 12345-6789 or no space before12345-6789",  
  "Grab zips with spaces 12345 6789 or no space before12345 6789",
  "I like 1234567 dogs"
)

rm_zip(x)
ex_zip(x)

## ======================= ##
## BUILD YOUR OWN FUNCTION ##
## ======================= ##

## example from: https://stackoverflow.com/a/26092576/1000343
zips <- data.frame(id = seq(1, 6), 
    address = c("Company, 18540 Main Ave., City, ST 12345", 
    "Company 18540 Main Ave. City ST 12345-0000", 
    "Company 18540 Main Ave. City State 12345", 
    "Company, 18540 Main Ave., City, ST 12345 USA", 
    "Company, One Main Ave Suite 18540m, City, ST 12345",
    "company 12345678")
)

## Function to grab even if a character follows the zip

# paste together a more flexible regular expression    
pat <- pastex(
    "@rm_zip", 
    "(?\\d)\\d{5}(?!\\d)",
    "(?<!\\d)\\d{5}-\\d{4}(?!\\d)"
)
# Create your own function that extract is set to TRUE
ex_zip2 <- rm_(pattern=pat, extract=TRUE)
ex_zip2(zips$address)

## Function to extract just 5 digit zips

ex_zip3 <- rm_(pattern="(?<!\\d)\\d{5}(?!\\d)", extract=TRUE)
ex_zip3(zips$address)

Run the code above in your browser using DataLab