stri_sub extracts particular substrings at the given code point-based index ranges. Its replacement version, stri_sub<-, substitutes (in place) parts of a string with the given replacement strings. stri_sub_replace is a variant friendly to magrittr's pipe operator that returns a modified copy of the input vector instead.
For extracting/replacing multiple substrings from/within each string, see stri_sub_all.
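The three entry points differ mainly in whether the input is modified. Below is a minimal sketch contrasting extraction, the pipe-friendly copy, and the in-place replacement. It assumes the stringi package is installed. The strings and variable names (first5, y) are arbitrary illustration data.

```r
library('stringi')

x <- c('hello world', 'stringi')

# Extraction: code points 1 to 5 of each string
first5 <- stri_sub(x, 1, 5)

# stri_sub_replace() returns a modified copy; x itself is left untouched here
y <- stri_sub_replace(x, 1, 5, replacement = 'HELLO')
print(x)       # still the original strings

# The replacement function stri_sub<- modifies (reassigns) x itself
stri_sub(x, 1, 5) <- 'HELLO'

print(first5)  # "hello" "strin"
print(y)       # "HELLO world" "HELLOgi"
print(x)       # now equal to y
```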
stri_sub(str, from = 1L, to = -1L, length)
stri_sub(str, from=1L, to=-1L, length, omit_na=FALSE) <- value
stri_sub_replace(..., replacement, value = replacement)
str: a character vector
from: an integer vector giving the start indexes, or a two-column matrix of type cbind(from, to)
to: an integer vector giving the end indexes; mutually exclusive with length and with from being a matrix
length: an integer vector giving the substring lengths; mutually exclusive with to and with from being a matrix
omit_na: a single logical value; indicates whether missing values in any of the indexes or in value leave the corresponding input string unchanged [replacement function only]
value: a character vector defining the replacement strings [replacement function only]
...: arguments to be passed to stri_sub<-
replacement: alias of value [wherever applicable]
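The effect of omit_na is easiest to see with a missing replacement string. The following sketch assumes stringi is installed. The vectors x and repl are illustrative only. The optional omit_na argument is forwarded through `...` to stri_sub<-.

```r
library('stringi')

x    <- c('abc', 'def', 'ghi')
repl <- c('X', NA, 'Z')  # the second replacement string is missing

# Default omit_na = FALSE: a missing replacement yields NA in the output
res_default <- stri_sub_replace(x, 1, 1, replacement = repl)

# omit_na = TRUE: the corresponding input string is returned unchanged
res_omit <- stri_sub_replace(x, 1, 1, replacement = repl, omit_na = TRUE)

print(res_default)  # "Xbc" NA "Zhi"
print(res_omit)     # "Xbc" "def" "Zhi"
```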
stri_sub and stri_sub_replace return a character vector. stri_sub<- changes the str object in-place.
Vectorized over str, [value], from and (to or length). Parameters to and length are mutually exclusive.
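Vectorization works as in most stringi functions: the shorter arguments are recycled to the length of the longest one. The sketch below assumes stringi is installed and uses made-up strings. Note that in the replacement case a single input string is recycled, so the result becomes longer than the original object.

```r
library('stringi')

# One string, several start positions: the string is recycled
windows <- stri_sub('abcdef', from = 1:3, length = 2)

# Several strings, one start position: from is recycled
tails <- stri_sub(c('abc', 'xyz'), from = 2)

# Replacement is vectorized too; the single string 'abc' is recycled
# to match the three index/value pairs, so s ends up with length 3
s <- 'abc'
stri_sub(s, from = 1:3, length = 1) <- c('X', 'Y', 'Z')

print(windows)  # "ab" "bc" "cd"
print(tails)    # "bc" "yz"
print(s)        # "Xbc" "aYc" "abZ"
```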
Indexes are 1-based, i.e., the start of a string is at index 1. For negative indexes in from or to, counting starts at the end of the string. For instance, index -1 denotes the last code point in the string. Non-positive length gives an empty string.
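A short sketch of negative indexing and zero length, assuming stringi is installed. The second string is 'Zażółć', written with \u escapes so that the example does not depend on the source file's encoding. It shows that positions count code points, not bytes.

```r
library('stringi')

x <- 'abcdef'
last3  <- stri_sub(x, -3)             # third-last code point to the end
middle <- stri_sub(x, -3, -2)         # third-last up to second-last
empty  <- stri_sub(x, 2, length = 0)  # zero length gives ""

# Indexes count code points: the 3rd to 5th code points of 'Zażółć'
pl <- 'Za\u017c\u00f3\u0142\u0107'
pl_mid <- stri_sub(pl, -4, -2)

print(c(last3, middle, empty))  # "def" "de" ""
```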
Argument from gives the start of a substring to extract. Argument to defines the last index of a substring, inclusive. Alternatively, its length may be provided.
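The two ways of specifying the end of a substring are interchangeable, as this small sketch shows (assuming stringi is installed; the string is arbitrary). When neither to nor length is given, to keeps its default of -1, so extraction runs to the end of the string.

```r
library('stringi')

x <- 'stringi'
by_to     <- stri_sub(x, from = 2, to = 4)      # code points 2..4, inclusive
by_length <- stri_sub(x, from = 2, length = 3)  # 3 code points starting at 2
to_end    <- stri_sub(x, from = 4)              # to = -1 by default

print(c(by_to, by_length, to_end))  # "tri" "tri" "ingi"
```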
If from is a two-column matrix, then these two columns are used as from and to, respectively, and anything passed explicitly as to or length is ignored. Such index matrices are generated by stri_locate_first and stri_locate_last. If extraction based on stri_locate_all is needed, see stri_sub_all.
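A sketch of the matrix form of from, assuming stringi is installed; the input strings are invented for illustration. stri_locate_first_regex yields one (start, end) row per string, with NA for strings without a match. stri_locate_all_regex yields a list of such matrices, which is the input stri_sub_all expects.

```r
library('stringi')

x <- c('order 66', 'no digits', 'year 2024 and 1999')

# Two-column matrix of (start, end) positions of the first match
m <- stri_locate_first_regex(x, '[0-9]+')
first_num <- stri_sub(x, m)  # a string without a match yields NA

# All matches per string: a list of matrices, handled by stri_sub_all()
all_nums <- stri_sub_all(x, stri_locate_all_regex(x, '[0-9]+'))

print(first_num)      # "66" NA "2024"
print(all_nums[[3]])  # "2024" "1999"
```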
In stri_sub, out-of-bound indexes are silently corrected. If from > to, then an empty string is returned.
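The following sketch illustrates the silent correction of out-of-bound indexes and the empty result for a reversed range. It assumes stringi is installed. The three-letter string is arbitrary.

```r
library('stringi')

x <- 'abc'
past_end <- stri_sub(x, 2, 10)   # to beyond the end is clamped to the last code point
before   <- stri_sub(x, -10, 2)  # from before the start is clamped to 1
reversed <- stri_sub(x, 3, 2)    # from > to gives an empty string

print(c(past_end, before, reversed))  # "bc" "ab" ""
```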
In stri_sub<-, some configurations of indexes may work as substring 'injection' at the front, back, or in the middle of a string. If both to and length are nevertheless provided, length has priority over to.
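One such configuration is an empty index range. The sketch below assumes stringi is installed. It also assumes, based on our reading of the implementation rather than an explicit statement here, that length = 0 makes stri_sub<- insert value just before code point from without deleting anything. Only front and middle injection are shown.

```r
library('stringi')

x <- 'abcde'

# Empty range at the start: value is prepended
front <- x
stri_sub(front, 1, length = 0) <- 'X'

# Empty range before code point 3: value is inserted in the middle
middle <- x
stri_sub(middle, 3, length = 0) <- 'X'

print(c(front, middle))  # "Xabcde" "abXcde"
```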
Note that for some Unicode strings, the extracted substrings might not be well-formed, especially if the input strings are not NFC-normalized (see stri_trans_nfc) or include byte order marks, bidirectional text marks, and so on. Handle with care.
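A small illustration of why normalization matters, assuming stringi is installed. The letter 'é' can be stored as 'e' followed by U+0301 COMBINING ACUTE ACCENT, which is two code points. Extracting the first code point then cuts the accent off. After NFC normalization, the same letter is the single code point U+00E9.

```r
library('stringi')

decomposed <- 'e\u0301'                   # 'e' + combining acute accent
n_decomp   <- stri_length(decomposed)     # 2 code points
first_raw  <- stri_sub(decomposed, 1, 1)  # only the base letter survives

composed  <- stri_trans_nfc(decomposed)   # NFC: a single code point
first_nfc <- stri_sub(composed, 1, 1)     # the whole accented letter

print(c(first_raw, first_nfc))
```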
See the official online manual of stringi at https://stringi.gagolewski.com/
Other indexing functions: stri_locate_all_boundaries(), stri_locate_all(), stri_sub_all()
library('stringi')
s <- 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.'
stri_sub(s, from=1:3*6, to=21)
stri_sub(s, from=c(1,7,13), length=5)
stri_sub(s, from=1, length=1:3)
stri_sub(s, -17, -7)
stri_sub(s, -5, length=4)
(stri_sub(s, 1, 5) <- 'stringi')
(stri_sub(s, -6, length=5) <- '.')
(stri_sub(s, 1, 1:3) <- 1:2)
x <- c('12 3456 789', 'abc', '', NA, '667')
stri_sub(x, stri_locate_first_regex(x, '[0-9]+')) # see stri_extract_first
stri_sub(x, stri_locate_last_regex(x, '[0-9]+')) # see stri_extract_last
stri_sub_replace(x, stri_locate_first_regex(x, '[0-9]+'),
omit_na=TRUE, replacement='***') # see stri_replace_first
stri_sub_replace(x, stri_locate_last_regex(x, '[0-9]+'),
omit_na=TRUE, replacement='***') # see stri_replace_last
stri_sub(x, stri_locate_first_regex(x, '[0-9]+'), omit_na=TRUE) <- '***'
print(x)
library('magrittr')  # provides the %>% pipe operator
x %>% stri_sub_replace(1, 5, replacement='new_substring')