textclean (version 0.9.2)

mgsub: Multiple gsub

Description

mgsub - A wrapper for gsub that takes a vector of search terms and a vector or single value of replacements.

mgsub_fixed - An alias for mgsub.

mgsub_regex - A wrapper for mgsub with fixed = FALSE.

mgsub_regex_safe - A wrapper for mgsub.
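
The idea of a "multiple gsub" can be sketched in base R. This is a minimal illustration under stated assumptions, not textclean's implementation: the helper name mgsub_sketch is hypothetical, and it only covers looping over pattern/replacement pairs and recycling a length-one replacement.

```r
# Hypothetical helper for illustration only; not the textclean implementation.
mgsub_sketch <- function(x, pattern, replacement, fixed = TRUE, ...) {
    # Recycle a single replacement so every pattern has a partner
    if (length(replacement) == 1L) {
        replacement <- rep(replacement, length(pattern))
    }
    stopifnot(length(pattern) == length(replacement))

    # Apply each substitution in turn to the running result
    for (i in seq_along(pattern)) {
        x <- gsub(pattern[i], replacement[i], x, fixed = fixed, ...)
    }
    x
}

# One replacement per pattern
mgsub_sketch(c("it's fine", "I'm here"), c("it's", "I'm"), c("it is", "I am"))
## "it is fine" "I am here"

# A single replacement shared by all patterns
mgsub_sketch("a-b-c", c("a", "c"), "X")
## "X-b-X"
```

Because each substitution runs on the output of the previous one, the order of the patterns can change the result.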

Usage

mgsub(x, pattern, replacement, leadspace = FALSE, trailspace = FALSE,
  fixed = TRUE, trim = FALSE, order.pattern = fixed, safe = FALSE, ...)

mgsub_fixed(x, pattern, replacement, leadspace = FALSE, trailspace = FALSE,
  fixed = TRUE, trim = FALSE, order.pattern = fixed, safe = FALSE, ...)

mgsub_regex(x, pattern, replacement, leadspace = FALSE, trailspace = FALSE,
  fixed = FALSE, trim = FALSE, order.pattern = fixed, ...)

mgsub_regex_safe(x, pattern, replacement, ...)

Arguments

x

A character vector.

pattern

A character vector of search terms to be matched in the given character vector.

replacement

A character vector, either equal in length to pattern or of length one, giving the replacement for each matched pattern.

leadspace

logical. If TRUE inserts a leading space in the replacements.

trailspace

logical. If TRUE inserts a trailing space in the replacements.

fixed

logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.

trim

logical. If TRUE leading and trailing white spaces are removed and multiple white spaces are reduced to a single white space.

order.pattern

logical. If TRUE and fixed = TRUE, the patterns are sorted by number of characters, longest first, so that a shorter pattern does not replace part of a longer pattern that contains it (e.g., pattern = c("the", "then") is reordered to search for "then" first).

safe

logical. If TRUE then the mgsub package is used as the backend and performs safe substitutions. The trade-off is that this mode will slow the replacements down considerably.

...

Additional arguments passed to gsub. In mgsub_regex_safe these are additional arguments passed to mgsub.
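
The effect described for order.pattern can be reproduced with plain gsub. The following is a base-R sketch, not the package's code; the helper replace_in_order is a hypothetical name introduced only for this illustration.

```r
# Illustration only: sequential fixed replacement with base gsub.
replace_in_order <- function(x, pattern, replacement) {
    for (i in seq_along(pattern)) {
        x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
    }
    x
}

pattern <- c("the", "then")
replacement <- c("THE", "THEN")
x <- "then the end"

# Given order: "the" is replaced first, so "then" no longer matches
replace_in_order(x, pattern, replacement)
## "THEn THE end"

# Longest pattern first, as described for order.pattern = TRUE
ord <- order(nchar(pattern), decreasing = TRUE)
replace_in_order(x, pattern[ord], replacement[ord])
## "THEN THE end"
```

Reordering pattern and replacement with the same index keeps each pair together.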

Value

mgsub - Returns a vector with the pattern replaced.

See Also

replace_tokens, gsub

Examples

# NOT RUN {
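# Literal (fixed) matching: expand contractions
mgsub(DATA$state, c("it's", "I'm"), c("it is", "I am"))
# Regular-expression matching: replace each punctuation character
mgsub(DATA$state, "[[:punct:]]", "PUNC", fixed = FALSE)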
# }
# NOT RUN {
library(textclean)
hunthou <- replace_number(seq_len(1e5)) 

textclean::mgsub(
    "'twenty thousand three hundred five' into 20305", 
    hunthou, 
    seq_len(1e5)
)
## "'20305' into 20305"

## Larger example from: https://stackoverflow.com/q/18332463/1000343
## A slower approach
fivehunthou <- replace_number(seq_len(5e5)) 

testvect <- c("fifty seven", "four hundred fifty seven", 
    "six thousand four hundred fifty seven", 
    "forty six thousand four hundred fifty seven", 
    "forty six thousand four hundred fifty seven", 
    "three hundred forty six thousand four hundred fifty seven"
)

textclean::mgsub(testvect, fivehunthou, seq_len(5e5))

## Safe substitution: Uses the mgsub package as the backend
dubious_string <- "Dopazamine is a fake chemical"
pattern <- c("dopazamin","do.*ne")
replacement <- c("freakout","metazamine")

mgsub(dubious_string, pattern, replacement, ignore.case = TRUE, fixed = FALSE)
mgsub(dubious_string, pattern, replacement, safe = TRUE, fixed = FALSE)
# }
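
Why sequential substitution can be unsafe is easiest to see when one replacement is itself a later pattern. The sketch below uses only base R and is an illustration under stated assumptions: it is not how the mgsub package is implemented, and the one-pass variant assumes the patterns contain no regular-expression metacharacters.

```r
x <- "hey, ho"
pattern <- c("hey", "ho")
replacement <- c("ho", "hey")

# Sequential replacement: the output of step 1 is matched again in step 2
step1 <- gsub(pattern[1], replacement[1], x, fixed = TRUE)      # "ho, ho"
step2 <- gsub(pattern[2], replacement[2], step1, fixed = TRUE)  # "hey, hey"
step2

# One-pass alternative: locate every match in the original string first,
# then map each matched substring to its own replacement
lookup <- setNames(replacement, pattern)
m <- gregexpr(paste(pattern, collapse = "|"), x)
swapped <- x
regmatches(swapped, m) <- lapply(
    regmatches(x, m),
    function(hit) unname(lookup[hit])
)
swapped
## "ho, hey"
```

The one-pass version never rescans text it has already replaced, which is the property a safe substitution is meant to provide.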
