utils_num_str: Utilities for handling numbers and strings

Description

all_lower_case(): Translate all non-numeric strings of a data frame to lower case (e.g., "Env" to "env").
all_upper_case(): Translate all non-numeric strings of a data frame to upper case (e.g., "Env" to "ENV").
all_title_case(): Translate all non-numeric strings of a data frame to title case (e.g., "ENV" to "Env").
extract_number(): Extract the number(s) of a string.
extract_string(): Extract all strings, ignoring case.
find_text_in_num(): Find text characters in a numeric sequence and return the row index.
has_text_in_num(): Inspect columns for text in numeric sequences and issue a warning if text is found.
remove_space(): Remove all blank spaces from a string.
remove_strings(): Remove all strings from a variable.
replace_number(): Replace numbers with a replacement.
replace_string(): Replace all strings with a replacement, ignoring case.
round_cols(): Round a selected column or a whole data frame to significant figures.
tidy_strings(): Tidy up character strings, non-numeric columns, or any selected columns in a data frame by putting all words in upper case, replacing any space, tabulation, or punctuation character with '_', and putting '_' between lower- and upper-case letters. Suppose that str = c("Env1", "env 1", "env.1") (which by definition should represent a unique level in plant breeding trials, e.g., environment 1) is passed to tidy_strings(str): the result will be c("ENV_1", "ENV_1", "ENV_1"). See the Examples section for more examples.
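The tidying rules described for tidy_strings() can be approximated in base R. The sketch below is only an illustration, not metan's implementation: tidy_sketch() is a hypothetical helper, and the step that separates a letter from a following digit is an assumption inferred from the "Env1" to "ENV_1" example above.

```r
# Hypothetical helper (NOT part of metan): a base-R approximation of the
# tidying rules described for tidy_strings().
tidy_sketch <- function(x, sep = "_") {
  # put `sep` between a lower-case letter and a following upper-case letter
  x <- gsub("([a-z])([A-Z])", paste0("\\1", sep, "\\2"), x)
  # assumption: also put `sep` between a letter and a following digit
  x <- gsub("([[:alpha:]])([[:digit:]])", paste0("\\1", sep, "\\2"), x)
  # collapse runs of spaces, tabs, and punctuation into a single `sep`
  x <- gsub("[[:space:][:punct:]]+", sep, x)
  # put everything in upper case
  toupper(x)
}

str <- c("Env1", "env 1", "env.1")
tidy_sketch(str)
```

Under these assumptions, the three spellings collapse to the single level "ENV_1", which matches the result stated in the description.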

Usage

all_upper_case(.data, ...)
all_lower_case(.data, ...)
all_title_case(.data, ...)
extract_number(
  .data,
  var,
  new_var = new_var,
  drop = FALSE,
  pull = FALSE,
  .before = NULL,
  .after = NULL
)
extract_string(
  .data,
  var,
  new_var = new_var,
  drop = FALSE,
  pull = FALSE,
  .before = NULL,
  .after = NULL
)
find_text_in_num(.data, ...)
has_text_in_num(.data)
remove_space(.data, ...)
remove_strings(.data, ...)
replace_number(
  .data,
  var,
  new_var = new_var,
  pattern = NULL,
  replacement = "",
  drop = FALSE,
  pull = FALSE,
  .before = NULL,
  .after = NULL
)
replace_string(
  .data,
  var,
  new_var = new_var,
  pattern = NULL,
  replacement = "",
  ignore_case = FALSE,
  drop = FALSE,
  pull = FALSE,
  .before = NULL,
  .after = NULL
)
round_cols(.data, ..., digits = 2)
tidy_strings(.data, ..., sep = "_")

Arguments

.data

A data frame

...

The argument depends on the function used.

For round_cols(), ... are the variables to round. If no variable is supplied, all the numeric variables in .data are used.
For all_lower_case(), all_upper_case(), all_title_case(), remove_strings(), and tidy_strings(), ... are the variables to which the function is applied. If no variable is supplied, the function is applied to all non-numeric variables in .data.

var

The variable to extract or replace numbers or strings.

new_var

The name of the new variable containing the numbers or strings extracted or replaced. Defaults to new_var.

drop

Logical argument. If TRUE, keeps the new variable new_var and drops the existing ones. Defaults to FALSE.

pull

Logical argument. If TRUE, returns the last column (on the assumption that's the column you've created most recently), as a vector.

.before, .after

For replace_string(), replace_number(), extract_string(), and extract_number(), the one-based column index or column name where the new columns are added.

pattern

A string to be matched. Regular expression syntax is also allowed.

replacement

A string for replacement.

ignore_case

If FALSE (default), the pattern matching is case sensitive and if TRUE, case is ignored during matching.
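The roles of pattern, replacement, and ignore_case can be illustrated with base R's gsub(), which takes a pattern, a replacement, and an ignore.case flag. This is only an analogy using a made-up vector gen; it does not claim to show how replace_string() is implemented.

```r
# Base-R analogy; `gen` is an illustrative vector, not a metan data set
gen <- c("G1", "g2", "G10")

# Case-sensitive matching: "g2" is left untouched
case_sensitive <- gsub("G", "GENOTYPE_", gen)

# Case ignored during matching: "g2" is replaced as well
case_ignored <- gsub("G", "GENOTYPE_", gen, ignore.case = TRUE)

# A regular expression as the pattern: drop every run of digits
no_digits <- gsub("[0-9]+", "", gen)

case_sensitive
case_ignored
no_digits
```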

digits

The number of significant figures.

sep

A character string to separate the terms. Defaults to "_".

Examples

# NOT RUN {
library(metan)

################ Rounding numbers ###############
# All numeric columns
round_cols(data_ge2, digits = 1)

# Round specific columns
round_cols(data_ge2, EP, digits = 1)

########### Extract or replace numbers ##########
# Extract numbers
extract_number(data_ge, GEN)
extract_number(data_ge,
               var = GEN,
               drop = TRUE,
               new_var = g_number)

# Replace numbers
replace_number(data_ge, GEN)
replace_number(data_ge,
               var = GEN,
               pattern = "1",
               replacement = "_one",
               pull = TRUE)

########## Extract, replace or remove strings ##########
# Extract strings
extract_string(data_ge, GEN)
extract_string(data_ge,
               var = GEN,
               drop = TRUE,
               new_var = g_name)

# Replace strings
replace_string(data_ge, GEN)
replace_string(data_ge,
               var = GEN,
               new_var = GENOTYPE,
               pattern = "G",
               replacement = "GENOTYPE_")

# Remove strings
remove_strings(data_ge)
remove_strings(data_ge, ENV)


############ Find text in numeric sequences ###########
mixed_text <- data.frame(data_ge)
mixed_text[2, 4] <- "2..503"
mixed_text[3, 4] <- "3.2o75"
find_text_in_num(mixed_text, GY)

############# upper, lower and title cases ############
gen_text <- c("GEN 1", "Gen 1", "gen 1")
all_lower_case(gen_text)
all_upper_case(gen_text)
all_title_case(gen_text)

# A whole data frame
all_lower_case(data_ge)


############### Tidy up messy text string ##############
messy_env <- c("ENV 1", "Env   1", "Env1", "env1", "Env.1", "Env_1")
tidy_strings(messy_env)

messy_gen <- c("GEN1", "gen 2", "Gen.3", "gen-4", "Gen_5", "GEN_6")
tidy_strings(messy_gen)

messy_int <- c("EnvGen", "Env_Gen", "env gen", "Env Gen", "ENV.GEN", "ENV_GEN")
tidy_strings(messy_int)

library(tibble)
# Or a whole data frame
df <- tibble(Env = messy_env,
             gen = messy_gen,
             Env_GEN = interaction(Env, gen),
             y = rnorm(6, 300, 10))
df
tidy_strings(df)
# }
