qdapRegex (version 0.7.5)

rm_non_words: Remove/Replace/Extract Non-Words

Description

rm_non_words - Remove/replace/extract non-words (Anything that's not a letter or apostrophe; also removes multiple white spaces) from a string.

Usage

rm_non_words(
  text.var,
  trim = !extract,
  clean = TRUE,
  pattern = "@rm_non_words",
  replacement = " ",
  extract = FALSE,
  dictionary = getOption("regex.library"),
  ...
)

ex_non_words( text.var, trim = !extract, clean = TRUE, pattern = "[^A-Za-z' ]+", replacement = " ", extract = TRUE, dictionary = getOption("regex.library"), ... )

Value

Returns a character string with non-words removed.

Arguments

text.var

The text variable.

trim

logical. If TRUE removes leading and trailing white spaces.

clean

trim logical. If TRUE extra white spaces and escaped character will be removed.

pattern

A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Default, @rm_non_words uses the rm_non_words regex from the regular expression dictionary from the dictionary argument.

replacement

Replacement for matched pattern (Note: default is " ", whereas most qdapRegex functions replace with "").

extract

logical. If TRUE the non-words are extracted into a list of vectors.

dictionary

A dictionary of canned regular expressions to search within if pattern begins with "@rm_".

...

Other arguments passed to gsub.

See Also

gsub, stri_extract_all_regex

Other rm_ functions: rm_abbreviation(), rm_between(), rm_bracket(), rm_caps_phrase(), rm_caps(), rm_citation_tex(), rm_citation(), rm_city_state_zip(), rm_city_state(), rm_date(), rm_default(), rm_dollar(), rm_email(), rm_emoticon(), rm_endmark(), rm_hash(), rm_nchar_words(), rm_non_ascii(), rm_number(), rm_percent(), rm_phone(), rm_postal_code(), rm_repeated_characters(), rm_repeated_phrases(), rm_repeated_words(), rm_tag(), rm_time(), rm_title_name(), rm_url(), rm_white(), rm_zip()

Examples

Run this code
x <- c(
    "I like 56 dogs!",
    "It's seventy-two feet from the px290.",
    NA,
    "What",
    "that1is2a3way4to5go6.",
    "What do you*% want?  For real%; I think you'll see.",
    "Oh some code to remove"
)

rm_non_words(x)
ex_non_words(x)

Run the code above in your browser using DataLab