str_sub

Extract and replace substrings from a character vector.

str_sub will recycle all arguments to be the same length as the longest argument. If any argument has length 0, the output will be a zero-length character vector.
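The recycling rule above can be illustrated with a short sketch. The variable names `parts` and `empty` are only for illustration, and the sketch assumes stringr is installed.

```r
library(stringr)

# A single string is recycled against two start/end pairs.
parts <- str_sub("Hadley Wickham", c(1, 8), c(6, 14))
parts
# [1] "Hadley"  "Wickham"

# A zero-length argument gives a zero-length character vector.
empty <- str_sub("Hadley Wickham", integer(0))
length(empty)
# [1] 0
```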

Usage
str_sub(string, start = 1L, end = -1L)

str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value

Arguments
string

input character vector.

start, end

Two integer vectors. start gives the position of the first character (defaults to the first character), and end gives the position of the last (defaults to the last character). Alternatively, pass a two-column matrix to start.

Negative values count backwards from the last character.

omit_na

Single logical value. If TRUE, missing values in any of the arguments provided will result in an unchanged input.

value

replacement string
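As a rough sketch of how omit_na interacts with vectorised arguments, the example below applies the replacement form to a two-element vector in which one end position is missing. The names `x` and `y` and the replacement value "z" are illustrative only.

```r
library(stringr)

x <- y <- c("AAA", "BBB")

# Default (omit_na = FALSE): the element paired with an NA end becomes NA.
str_sub(x, 1, c(1, NA)) <- "z"
x
# [1] "zAA" NA

# omit_na = TRUE: that element is left unchanged instead.
str_sub(y, 1, c(1, NA), omit_na = TRUE) <- "z"
y
# [1] "zAA" "BBB"
```

The missing value is handled per element, so the first element is still replaced in both cases.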

Details

Substrings are inclusive: they include the characters at both the start and end positions. str_sub(string, 1, -1) will return the complete string, from the first character to the last.
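A minimal sketch of the inclusive behaviour, using a made-up input string `s`:

```r
library(stringr)

s <- "abcdef"

# Both endpoints are included: positions 2, 3 and 4.
middle <- str_sub(s, 2, 4)
middle
# [1] "bcd"

# start equal to end selects exactly one character.
single <- str_sub(s, 3, 3)
single
# [1] "c"

# 1 to -1 spans the first character to the last.
whole <- str_sub(s, 1, -1)
whole
# [1] "abcdef"
```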

Value

A character vector of substrings from start to end (inclusive). It has the length of the longest input argument.

See Also

The underlying implementation in stringi::stri_sub()

Aliases
  • str_sub
  • str_sub<-
Examples
# NOT RUN {
hw <- "Hadley Wickham"

str_sub(hw, 1, 6)
str_sub(hw, end = 6)
str_sub(hw, 8, 14)
str_sub(hw, 8)
str_sub(hw, c(1, 8), c(6, 14))

# Negative indices
str_sub(hw, -1)
str_sub(hw, -7)
str_sub(hw, end = -7)

# Alternatively, you can pass in a two-column matrix, as in the
# output from str_locate_all
pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
str_sub(hw, pos[, 1], pos[, 2])

# Vectorisation
str_sub(hw, seq_len(str_length(hw)))
str_sub(hw, end = seq_len(str_length(hw)))

# Replacement form
x <- "BBCDEF"
str_sub(x, 1, 1) <- "A"; x
str_sub(x, -1, -1) <- "K"; x
str_sub(x, -2, -2) <- "GHIJ"; x
str_sub(x, 2, -2) <- ""; x

# If you want to keep the original if some argument is NA,
# use omit_na = TRUE
x1 <- x2 <- x3 <- x4 <- "AAA"
str_sub(x1, 1, NA) <- "B"
str_sub(x2, 1, 2) <- NA
str_sub(x3, 1, NA, omit_na = TRUE) <- "B"
str_sub(x4, 1, 2, omit_na = TRUE) <- NA
x1; x2; x3; x4
# }
Documentation reproduced from package stringr, version 1.3.1, License: GPL-2 | file LICENSE

Community examples

antoine.fabri@gmail.com at Jun 13, 2018 stringr v1.3.1

Comparison with `base::substr`, using the examples from the documentation with slight alterations.

```r
hw <- "Hadley Wickham"
```

## Same basic use

```r
identical(str_sub(hw, 1, 6), substr(hw, 1, 6))
# [1] TRUE
identical(str_sub(hw, 8, 14), substr(hw, 8, 14))
# [1] TRUE
```

## `substr` doesn't have default values

```r
str_sub(hw, end = 6)
# [1] "Hadley"
substr(hw, stop = 6)
# Error in substr(hw, stop = 6) :
#   argument "start" is missing, with no default
identical(str_sub(hw, end = 6), substr(hw, 1, 6))
# [1] TRUE
str_sub(hw, 8)
# [1] "Wickham"
substr(hw, start = 8)
# Error in substr(hw, start = 8) :
#   argument "stop" is missing, with no default
identical(str_sub(hw, 8), substr(hw, 8, 14))
# [1] TRUE
```

## Different ways of dealing with negative indices

For `substr`, a negative value for `start` is equivalent to setting it to `1`, and a negative value for `stop` is equivalent to setting it to `0`. For `str_sub`, a negative value counts from the end, with the last position being `-1`.

```r
str_sub(hw, -1)
# [1] "m"
substr(hw, -1, 14)
# [1] "Hadley Wickham"
identical(str_sub(hw, -1), substr(hw, 14 + 1 - 1, 14))
# [1] TRUE
str_sub(hw, end = -7)
# [1] "Hadley W"
substr(hw, 1, -7)
# [1] ""
identical(str_sub(hw, end = -7), substr(hw, 1, 14 + 1 - 7))
# [1] TRUE
```

## Vectorisation

For `substr`, vectorisation over `start` and `stop` is not supported by default: with a single input string, only their first elements are used.

```r
str_sub(hw, c(1, 8), c(6, 14))
# [1] "Hadley" "Wickham"
substr(hw, c(1, 8), c(6, 14))
# [1] "Hadley"
identical(str_sub(hw, c(1, 8), c(6, 14)),
          Vectorize(substr, USE.NAMES = FALSE)(hw, c(1, 8), c(6, 14)))
# TRUE
str_sub(hw, seq_len(str_length(hw)))
identical(str_sub(hw, seq_len(str_length(hw))),
          Vectorize(substr, USE.NAMES = FALSE)(hw, seq_len(str_length(hw)), 14))
# TRUE
identical(str_sub(hw, end = seq_len(str_length(hw))),
          Vectorize(substr, USE.NAMES = FALSE)(hw, 1, seq_len(str_length(hw))))
# TRUE
```

`substr` doesn't support passing a two-column matrix as the second argument:

```r
pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
str_sub(hw, pos[, 1], pos[, 2])
identical(str_sub(hw, pos),
          Vectorize(substr, USE.NAMES = FALSE)(hw, pos[, 1], pos[, 2]))
# TRUE
```

## Basic replacement form is the same

```r
x <- x2 <- "BBCDEF"
str_sub(x, 1, 1) <- "A"
substr(x2, 1, 1) <- "A"
identical(x, x2)
# [1] TRUE
```

But here again, there are no default arguments, and negative indices don't mean the same thing.

## Replacing by empty string not supported by `substr<-`

```r
str_sub(x, 1, 3) <- ""; x
# [1] "DEF"
substr(x2, 1, 3) <- ""; x2
# [1] "ABCDEF"
```

## Dealing with NAs

`substr<-` returns an error when assigning `NA`. `str_sub<-` has an `omit_na` parameter to ignore problematic assignments.

```r
x1 <- x2 <- x3 <- x4 <- x1b <- x2b <- "AAA"
str_sub(x1, 1, NA) <- "B"; x1
substr(x1b, 1, NA) <- "B"; x1b
identical(x1, x1b)
# [1] TRUE
str_sub(x2, 1, 2) <- NA; x2
# [1] NA
substr(x2b, 1, 2) <- NA; x2b
# Error in `substr<-`(`*tmp*`, 1, 2, value = NA) : invalid value
str_sub(x3, 1, NA, omit_na = TRUE) <- "B"; x3
# [1] "AAA"
str_sub(x4, 1, 2, omit_na = TRUE) <- NA; x4
# [1] "AAA"
```