textclean (version 0.9.3)

fgsub: Replace a Regex with a Functional Operation on the Regex Match

Description

This is a stripped-down version of gsubfn from the gsubfn package. It finds the regex matches, passes them to a function, and replaces each original match with the function's result at the same location. The stringi package is used for matching and extracting the regex matches. For more powerful or flexible needs, see the gsubfn package.
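The find, transform, and replace mechanism described above can be sketched in base R alone. `fgsub_sketch()` below is a hypothetical illustration, not textclean's implementation: it uses base R's `gregexpr()` and `regmatches()` with `perl = TRUE` instead of stringi (so regex dialect details may differ slightly from ICU), and it assumes `fun` receives one matched string and returns one string.

```r
# Hypothetical base-R sketch of the fgsub() idea; not textclean's code.
# Assumes `fun` takes a single match and returns a single string.
fgsub_sketch <- function(x, pattern, fun) {
  ok <- !is.na(x)                          # leave NA elements untouched
  y <- x[ok]
  m <- gregexpr(pattern, y, perl = TRUE)   # locate every match in each string
  # Apply `fun` to each match individually and splice the results back
  # into the original positions; strings without a match are unchanged.
  regmatches(y, m) <- lapply(
    regmatches(y, m),
    function(hits) vapply(hits, fun, character(1), USE.NAMES = FALSE)
  )
  x[ok] <- y
  x
}
```

For example, `fgsub_sketch(c(NA, "00:04", "06:14"), "^0+", function(x) gsub("0", " ", x))` returns `c(NA, "  :04", " 6:14")`.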

Usage

fgsub(x, pattern, fun, ...)

Arguments

x

A character vector.

pattern

Character string to be matched in the given character vector.

fun

A function to operate on the extracted matches.

...

Ignored.
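Because `fun` operates on the extracted matches, it can be checked on a few sample matches before it is handed to `fgsub()`. The sketch below assumes, as the functions in the examples do, that `fun` receives a single matched string and returns a single string; `reverse_wrap()` is a hypothetical name for the logic used in the first example.

```r
# Hypothetical `fun`: assumes it receives ONE matched string and
# returns ONE replacement string, like the functions in the examples.
reverse_wrap <- function(x) {
  chars <- strsplit(x, "")[[1]]                     # split the match into characters
  paste0("<<", paste(rev(chars), collapse = ""), ">>")  # reverse and wrap
}

# Try the function on sample matches before passing it to fgsub()
vapply(c("fdggg", "dfhhh", "ddd"), reverse_wrap, character(1), USE.NAMES = FALSE)
```

This should give `"<<gggdf>>" "<<hhhfd>>" "<<ddd>>"`.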

Value

Returns a character vector in which the matches of pattern are replaced by the output of fun.

See Also

gsubfn

Examples

# NOT RUN {
## In this example the regex looks for words that contain a lower case letter
## followed by the same letter at least 2 more times.  It then extracts these
## words, splits them apart into letters, reverses their order, pastes them
## back together, wraps them with double angle brackets, and then puts them
## back at the original locations.
fgsub(
    x = c(NA, 'df dft sdf', 'sd fdggg sd dfhhh d', 'ddd'),
    pattern = "\\b\\w*([a-z])(\\1{2,})\\w*\\b",
    fun = function(x) {
        paste0('<<', paste(rev(strsplit(x, '')[[1]]), collapse =''), '>>')
    }    
)

## In this example we extract numbers, strip out non-digits, coerce them to
## numeric, halve them, round up to the next integer, add the commas back,
## and put them back at the original locations.
fgsub(
    x = c(NA, 'I want 32 grapes', 'he wants 4 ice creams', 
        'they want 1,234,567 dollars'
    ),
    pattern = "[\\d,]+",
    fun = function(x) {
        prettyNum(
            ceiling(as.numeric(gsub('[^0-9]', '', x))/2), 
            big.mark = ','
        )
    }    
)

## In this example we extract leading zeros and convert them to an equal
## number of spaces.
fgsub(
    x = c(NA, "00:04", "00:08", "00:01", "06:14", "00:02", "00:04"),
    pattern = '^0+',
    fun = function(x) {gsub('0', ' ', x)}
)
# }
