knitr::opts_chunk$set(echo = TRUE, comment = "#>")
First let's load the library:
The n^th^ number in a string
I often want to get the first, last or n^th^ number in a string.
pop <- "A population of 1000 comprised of 488 dogs and 512 cats." first_number(pop) nth_number(pop, 2) last_number(pop)
All the numbers in a string
All the non-numbers in a string
Split strings by numbers
Could that be interpreted as numeric?
Sometimes we don't want to know is something is numeric, we want to know if it could be considered to be numeric (or could be coerced to numeric). For this, there's
is.numeric(23) is.numeric("23") can_be_numeric(23) can_be_numeric("23") can_be_numeric("23a")
Get the n^th^ element of a string
str_elem("abc", 2) str_elem("abcdefz", -1) # last element
Trim anything (not just whitespace)
str_trim just trims whitespace. What if you want to trim something else? Now you can
Count the number of matches of a pattern in a string
count_matches(pop, " ") # count the spaces in pop count_matches("Bob and Joe went to see Bob's mother.", "Bob")
Turn duplicates of a pattern into singles
Suppose we want to remove double spacing:
double__spaced <- "Hello world, pretend it's Saturday :-)" count_matches(double__spaced, " ") # count the spaces single_spaced <- singleize(double__spaced, pattern = " ") single_spaced count_matches(single_spaced, " ") # half the spaces are gone
The bit of a string after the n^th^ appearance of a pattern
Suppose we have sentences telling us about a couple of boxes:
box_infos <- c("Box 1 has weight 23kg and volume 0.3 cubic metres.", "Box 2 has weight 20kg and volume 0.33 cubic metres.")
We can get (for example) the weights of the boxes by taking the first number that appears after the word "weight".
library(magrittr) str_after_nth(box_infos, "weight", 1) # the bit of the string after 1st "weight" str_after_nth(box_infos, "weight", 1) %>% nth_number(1) # 1st number after 1st "weight"
We'd like to put all of the box information into a nice data frame. Here's how.
tibble::tibble(box = nth_number(box_infos, 1), weight = str_after_nth(box_infos, "weight", 1) %>% nth_number(1, decimals = TRUE), volume = str_after_nth(box_infos, "volume", 1) %>% nth_number(1, decimals = TRUE) )
Split camel case
Sometimes people use camel case (CamelCase) to avoid using spaces. What if we want to put the spaces back in?
camel_names <- c("JoeBloggs", "JaneyMac") str_split_camel_case(camel_names)