
This suite of functions was written to implement many of the features of the UNIX sed program entirely within S (function sedit). The substring.location function returns the first and last position numbers that a sub-string occupies in a larger string. The substring2<- function does the opposite of the builtin function substring: it assigns to a substring rather than extracting one. It is named substring2 because for S-Plus there is a built-in function substring, but it does not handle multiple replacements in a single string.

replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", if the "ANYTHING" passes an optional user-specified test function. Here, the "yyyy" string is searched for from right to left to handle balancing parentheses, etc. numeric.string and all.digits are two examples of test functions, to check, respectively, if each of a vector of strings is a legal numeric or if it contains only the digits 0-9. For the case where old="*$" or "^*", or for replace.substring.wild with the same values of old or with front=TRUE or back=TRUE, sedit (if wild.literal=FALSE) and replace.substring.wild will edit the largest substring satisfying test.

substring2 is just a copy of substring so that substring2<- will work.
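The right-to-left search for the closing "yyyy" piece can be illustrated with a small base-R sketch. The helper replace_wild_sketch below is hypothetical and is not the package's implementation; it assumes the pattern has literal text on both sides of a single asterisk and ignores the "^", "$", front and back options.

```r
# Hypothetical helper, NOT the package implementation: a base-R sketch of
# "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb" where the closing piece
# (yyyy) is taken at its rightmost occurrence.
replace_wild_sketch <- function(text, old, new, test = NULL) {
  op <- strsplit(old, "*", fixed = TRUE)[[1]]  # c("xxxx", "yyyy")
  np <- strsplit(new, "*", fixed = TRUE)[[1]]  # c("aaaa", "bbbb")
  stopifnot(length(op) == 2, length(np) == 2)  # assumption: text on both sides of "*"

  # Leftmost occurrence of the opening piece
  start <- as.integer(regexpr(op[1], text, fixed = TRUE))
  if (start < 0) return(text)
  wild_from <- start + nchar(op[1])

  # All start positions of the closing piece; keep the rightmost one
  k <- nchar(op[2])
  cand <- seq_len(max(nchar(text) - k + 1L, 0L))
  hits <- cand[substring(text, cand, cand + k - 1L) == op[2]]
  if (length(hits) == 0) return(text)
  end <- max(hits)
  if (end < wild_from) return(text)

  # The wildcard text must pass the optional test function
  wild <- substring(text, wild_from, end - 1L)
  if (!is.null(test) && !isTRUE(test(wild))) return(text)

  paste0(substring(text, 1, start - 1L), np[1], wild, np[2],
         substring(text, end + k))
}

replace_wild_sketch("y = f(g(x)) + 1", "f(*)", "h(*)")
# "y = h(g(x)) + 1"  (the last ")" closes the match, so the parentheses stay balanced)
replace_wild_sketch("a(b)c", "(*)", "[*]", test = function(w) w == "x")
# "a(b)c"  (the wildcard text "b" fails the test, so nothing is changed)
```

Taking the rightmost closing piece is what keeps nested delimiters intact in the first call; a left-to-right search would have stopped at the first ")".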
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last)
substring2(text, first, last) <- value
sedit returns a vector of character strings the same length as text. substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches. replace.substring.wild returns a single character string. numeric.string and all.digits return a single logical value.
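To make the shape of the substring.location result concrete, here is a base-R sketch that builds a list with first and last components. The name locate_sketch is hypothetical, and two behaviors are assumptions of the sketch rather than documented facts: restrict is applied to the starting positions, and empty vectors are returned when nothing matches.

```r
# Hypothetical helper, NOT the package implementation: returns the start and
# end positions of every occurrence of `string` in the single string `text`.
locate_sketch <- function(text, string, restrict = NULL) {
  k <- nchar(string)
  # Every position where a substring of length k could start
  starts <- seq_len(max(nchar(text) - k + 1L, 0L))
  first <- starts[substring(text, starts, starts + k - 1L) == string]
  # Assumption: restrict = c(lo, hi) limits the starting positions
  if (!is.null(restrict)) {
    first <- first[first >= restrict[1] & first <= restrict[2]]
  }
  list(first = first, last = first + k - 1L)
}

locate_sketch("abcdefgabc", "ab")
# $first: 1 8   $last: 2 9
locate_sketch("abcdefgabc", "ab", restrict = c(3, 999))
# $first: 8     $last: 9
```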
text: a vector of character strings for sedit, substring2, substring2<-, or a single character string for substring.location, replace.substring.wild.

from: a vector of character strings to translate from, for sedit. Each element may contain a single asterisk wild card, meaning allow any sequence of characters (subject to the test function, if any) in place of the "*". An element of from may begin with "^" to force the match to begin at the beginning of text, and an element of from can end with "$" to force the match to end at the end of text.

to: a vector of character strings to translate to, for sedit. If a corresponding element in from had an "*", the element in to may also have an "*". Only single asterisks are allowed. If to is not the same length as from, the rep function is used to make it the same length.

string: a single character string, for substring.location, numeric.string, all.digits.

first: a vector of integers specifying the first position to replace for substring2<-. first may also be a vector of character strings that are passed to sedit to use as patterns for replacing substrings with setto. See one of the last examples below.

last: a vector of integers specifying the ending positions of the character substrings to be replaced. The default is to go to the end of the string. When first is character, last must be omitted.

setto: a character string or vector of character strings used as replacements, in substring2<-.

old: a character string to translate from for replace.substring.wild. May be "*$" or "^*" or any string containing a single "*" but not beginning with "^" or ending with "$".

new: a character string to translate to for replace.substring.wild.

test: a function of a vector of character strings returning a logical vector whose elements are TRUE or FALSE according to whether that string element qualifies as the wild card string for sedit, replace.substring.wild.
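A test function only needs to map a character vector to a logical vector of the same length. As a rough illustration, the user-defined function below (only_digits is a hypothetical name, written in base R and not part of the package) accepts strings made up solely of the digits 0-9; a function of this shape could be supplied as the test argument.

```r
# Hypothetical user-defined test function: TRUE for elements that consist
# only of the digits 0-9, FALSE otherwise (including the empty string).
only_digits <- function(x) grepl("^[0-9]+$", x)

only_digits(c("23", "2x3", ""))
# TRUE FALSE FALSE
```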
wild.literal: set to TRUE to not treat asterisks as wild cards and to not look for "^" or "$" in old.

restrict: a vector of two integers for substring.location which specifies a range to which the search for matches should be restricted.

front: specifying front = TRUE and old = "*" is the same as specifying old = "^*".

back: specifying back = TRUE and old = "*" is the same as specifying old = "*$".
a character vector
substring2<- modifies its first argument.
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
fh@fharrell.com
library(Hmisc)  # sedit and the related functions are provided by the Hmisc package
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x
substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))
replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')
qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
'lost*million', test=numeric.string)
x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))
sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)
replace.substring.wild("abcdefabcdef", "d*f", "xy")
x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x