strex (version 2.0.0)

str_extract_numbers: Extract numbers from a string.

Description

Extract the numbers from a string. Decimals, negative numbers, scientific notation and thousands separators can optionally be allowed.

Usage

str_extract_numbers(
  string,
  decimals = FALSE,
  leading_decimals = decimals,
  negs = FALSE,
  sci = FALSE,
  big_mark = "",
  leave_as_string = FALSE,
  commas = FALSE
)

Value

For str_extract_numbers and str_extract_non_numerics, a list of numeric or character vectors, with one list element for each element of string. For str_nth_number and str_nth_non_numeric, a numeric or character vector the same length as string.

Arguments

string

A character vector.

decimals

Do you want to include the possibility of decimal numbers (TRUE) or not (FALSE, the default)?

leading_decimals

Do you want to allow a leading decimal point to be the start of a number (e.g. .5)? Defaults to the value of decimals.

negs

Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples).

sci

Make the search aware of scientific notation, e.g. 2e3 is read as 2000.

big_mark

A string. Allow the character(s) in this string to be used as thousands separators. These characters are removed from between digits before the numbers are converted to numeric. You may specify several at once by pasting them together, e.g. big_mark = ",_" allows both commas and underscores. Internally, big_mark is placed inside a [] regex character class, so e.g. "a-z" (a range) behaves differently from "az-" (three literal characters). The most common separators (commas, spaces, underscores) should work fine.

leave_as_string

Do you want to return the number as a string (TRUE) or as numeric (FALSE, the default)?

commas

Deprecated. Use big_mark instead.
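The base-R sketch below illustrates how options such as leading_decimals and big_mark can be expressed as regular expressions. The two patterns and the helper strip_big_mark() are hypothetical illustrations, not strex's internal implementation.

```r
# Hypothetical patterns (not strex internals) for decimal numbers.
no_leading   <- "\\d+(\\.\\d+)?"  # a bare ".12" is matched only as "12"
with_leading <- "\\d*\\.?\\d+"    # also accepts a leading point: ".12"

x <- "abc.12def34.5"
regmatches(x, gregexpr(no_leading, x))[[1]]    # "12"  "34.5"
regmatches(x, gregexpr(with_leading, x))[[1]]  # ".12" "34.5"

# Hypothetical helper: drop separator characters that sit between two
# digits, placing big_mark inside a [] character class.
strip_big_mark <- function(x, big_mark = ",") {
  pattern <- paste0("(?<=\\d)[", big_mark, "](?=\\d)")
  gsub(pattern, "", x, perl = TRUE)
}
strip_big_mark("1,100,230.5")                 # "1100230.5"
strip_big_mark("1_000,000", big_mark = ",_")  # "1000000"

# Inside [], a "-" between two characters denotes a range;
# a trailing "-" is a literal hyphen.
grepl("[a-z]", c("m", "-"))  # TRUE FALSE
grepl("[az-]", c("m", "-"))  # FALSE TRUE
```

In this sketch, a big_mark containing "]", "\" or a leading "^" would change the meaning of the character class, which is why sticking to common separators is the safe choice.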

Details

If any part of a string contains an ambiguous number, the value returned for that string is NA and a warning is issued. For example, 1.2.3 is ambiguous when decimals = TRUE (it could be read as 1.2 and .3, or as 1 and 2.3), but not otherwise.
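A minimal base-R sketch of this rule, assuming decimals are enabled: any run of the form digit, point, digits, point, digit is treated as ambiguous. The function extract_decimals_sketch() is hypothetical and only mimics the documented NA-plus-warning behaviour; it is not how strex detects ambiguity.

```r
# Hypothetical sketch (not strex internals): extract decimal numbers, but
# return NA with a warning when a string contains a run like "1.2.3".
extract_decimals_sketch <- function(string) {
  lapply(string, function(s) {
    if (grepl("\\d\\.\\d+\\.\\d", s)) {
      warning("ambiguous number in '", s, "'; returning NA")
      return(NA_real_)
    }
    as.numeric(regmatches(s, gregexpr("\\d+(\\.\\d+)?", s))[[1]])
  })
}

res <- suppressWarnings(extract_decimals_sketch(c("22", "1.2.3", "a1.5b2")))
# res is list(22, NA, c(1.5, 2)): one element per input string
```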

With scientific notation, the exponent is assumed not to be a decimal number, e.g. 2e2.4 is unacceptable. Thousands separators, however, are acceptable in the exponent.
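To make these two rules concrete, here is a hypothetical regular expression (an illustration, not the pattern strex uses) in which the mantissa may carry "," separators and a decimal part, while the exponent is an integer that may itself contain "," separators.

```r
# Hypothetical pattern (not strex internals).
sci_pattern <- "\\d+(,\\d+)*(\\.\\d+)?e\\d+(,\\d+)*"

x <- "abc1,100e3def2e1,0"
m <- regmatches(x, gregexpr(sci_pattern, x))[[1]]
m                             # "1,100e3" "2e1,0"
as.numeric(gsub(",", "", m))  # 1100000 and 2e10

# A decimal exponent is not matched as a whole: only "2e2" is captured
# and the ".4" is left over.
regmatches("2e2.4", gregexpr(sci_pattern, "2e2.4"))[[1]]  # "2e2"
```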

Numbers outside the double-precision floating-point range (i.e. with absolute value greater than 1.797693e+308) are read as Inf (or -Inf if they begin with a minus sign). This matches the behaviour of base::as.numeric().
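The base-R behaviour referred to here can be checked directly; the limit is available as .Machine$double.xmax.

```r
.Machine$double.xmax  # largest finite double, about 1.797693e+308
as.numeric("1e308")   # still finite
as.numeric("1e309")   # Inf
as.numeric("-1e309")  # -Inf
```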

See Also

Other numeric extractors: str_nth_number_after_mth(), str_nth_number_before_mth(), str_nth_number()

Examples

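strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
# Default: non-negative integers only
str_extract_numbers(strings)
# Allow decimal numbers
str_extract_numbers(strings, decimals = TRUE)
# Also allow a leading decimal point, e.g. ".345"
str_extract_numbers(strings, decimals = TRUE, leading_decimals = TRUE)
# Treat "," as a thousands separator
str_extract_numbers(strings, big_mark = ",")
# Decimals, leading decimals and scientific notation
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = TRUE,
  sci = TRUE
)
# As above, plus thousands separators and negative numbers
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = TRUE,
  sci = TRUE, big_mark = ",", negs = TRUE
)
# Return the extracted numbers as strings rather than numeric
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = FALSE,
  sci = FALSE, big_mark = ",", leave_as_string = TRUE
)
# "1.2.3" is ambiguous when decimals = TRUE: NA for that string, with a warning
str_extract_numbers(c("22", "1.2.3"), decimals = TRUE)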