
eCerto (version 0.8.5)

assert_col: Assert a specific column (type and position) in a data frame.

Description

assert_col checks a data frame for a specific column (its name, position and type) and ensures that the returned data frame contains this column. If possible, the current values are converted to the specified type.
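Locating the column by name is the first of these checks. The following base R sketch shows one way such a lookup could work. It assumes that fuzzy matching means a case-insensitive comparison that ignores leading and trailing blanks, and `find_col` is a hypothetical helper, not an eCerto function.

```r
# Hypothetical helper, not eCerto's implementation: return the index of the
# first column matching `name`, or NA if there is none
find_col <- function(df, name, fuzzy_name = TRUE) {
  if (fuzzy_name) {
    # assumption: "fuzzy" = case-insensitive, ignoring leading/trailing blanks
    normalize <- function(s) tolower(trimws(s))
    idx <- which(normalize(names(df)) == normalize(name))
  } else {
    # exact, case-sensitive match
    idx <- which(names(df) == name)
  }
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

x <- data.frame(analyte = c("A", "B"), tmp = 0L, unit = c("x", "y"))
find_col(x, " Analyte")                     # 1
find_col(x, "Analyte", fuzzy_name = FALSE)  # NA
```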

Usage

assert_col(
  df,
  name,
  pos = NULL,
  type = c("character", "integer", "numeric", "factor", "logical", "Date"),
  fuzzy_name = TRUE,
  default_value = NULL
)

Value

A data frame with a column of the specified name and type at the specified position. An error message is attached to the result as an attribute in case of unexpected events.
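Placing an existing column at a requested position amounts to reordering the column names. A minimal sketch in base R, assuming unique column names and a position within 1..ncol(df); `move_col` is a hypothetical helper and not necessarily how eCerto does it.

```r
# Hypothetical helper, not part of eCerto: move the existing column `name` to
# position `pos`; pos = NULL keeps the current order
move_col <- function(df, name, pos = NULL) {
  if (is.null(pos)) return(df)
  stopifnot(name %in% names(df), pos >= 1, pos <= ncol(df))
  others <- setdiff(names(df), name)  # assumes unique column names
  # insert `name` after the first (pos - 1) remaining columns
  df[, append(others, name, after = pos - 1), drop = FALSE]
}

x <- data.frame(analyte = c("A", "B"), tmp = 0L, unit = c("x", "y"))
names(move_col(x, "analyte", pos = 3))  # "tmp" "unit" "analyte"
```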

Arguments

df

Input data frame.

name

Name of the column to ensure (and to search for).

pos

Position of this column. Use NULL to keep the column at the position where it was found in df.

type

Desired data type of this column.

fuzzy_name

Allow fuzzy matching of the column name (additional blanks are tolerated and the search is case-insensitive).

default_value

Default value used if the column needs to be created or cannot be converted to the specified type. Keep NULL to use the predefined default values.
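The examples below distinguish between a conversion that merely yields NAs (with a warning) and one that fails with an error, and only the latter triggers the default value. A base R sketch of that rule, assuming one base converter per supported type; `convert_or_default` is a hypothetical helper, not eCerto's internal code.

```r
# Hypothetical sketch, not eCerto's implementation: convert x to `type` and
# fall back to `default_value` only if the conversion raises an error
convert_or_default <- function(x, type = c("character", "integer", "numeric",
                                           "factor", "logical", "Date"),
                               default_value) {
  type <- match.arg(type)
  converter <- switch(type,
    character = as.character,
    integer   = as.integer,
    numeric   = as.numeric,
    factor    = as.factor,
    logical   = as.logical,
    Date      = as.Date
  )
  tryCatch(
    # warnings such as "NAs introduced by coercion" are silenced, so a
    # conversion that only yields NAs is kept as is
    suppressWarnings(converter(x)),
    # an error (e.g. as.Date on "x") replaces every value with the default
    error = function(e) rep(default_value, length.out = length(x))
  )
}

convert_or_default(c("x", "y"), "numeric", default_value = 10)  # NA NA
convert_or_default(c("x", "y"), "Date", default_value = as.Date("2000-01-01"))
```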

Details

tbd.

Examples

x <- data.frame(
  "analyte" = c("A", "B"),
  "tmp" = rep(0L, 2),
  "unit" = c("x", "y")
)
str(x)
ac <- eCerto::assert_col
str(ac(df = x, name = "analyte", pos = 1, type = "factor"))
str(ac(df = x, name = "Analyte", pos = 3, type = "character"))
str(ac(df = x, name = " Analyte", pos = 2, type = "factor"))
str(ac(df = x, name = "Analyte", pos = 2, type = "factor", fuzzy_name = FALSE))
str(ac(df = x, name = "test", type = "factor", default_value = "test"))
# this leads to NAs in column unit: converting "x" and "y" to numeric only
# produces a warning, not an error, hence the default value is not used
str(ac(df = x, name = "unit", type = "numeric", default_value = 10))
# this fills column unit with the predefined default value because the
# attempt to convert "x" and "y" to Date raises an error
str(ac(df = x, name = "unit", type = "Date"))
str(ac(df = data.frame("test" = "2022-03-31"), name = "test", type = "Date"))

# show type and class of internal default values
x <- data.frame(
  "character" = "", "integer" = 0L, "numeric" = 0, "factor" = factor(NA),
  "logical" = NA, "date" = Sys.Date(), NA
)
sapply(x, typeof)
sapply(x, class)
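The data frame above holds one value per supported type. Assuming these are the predefined defaults used when default_value is NULL (as the comment in the example states), a lookup could be sketched as follows; `default_for` is a hypothetical name, not an eCerto function.

```r
# Hypothetical lookup, not an eCerto function: the per-type defaults shown in
# the example data frame above
default_for <- function(type) {
  switch(type,
    character = "",
    integer   = 0L,
    numeric   = 0,
    factor    = factor(NA),
    logical   = NA,
    Date      = Sys.Date(),
    stop("unsupported type: ", type)  # reached only if no type matches
  )
}

vapply(c("character", "integer", "numeric", "factor", "logical", "Date"),
       function(t) class(default_for(t)), character(1))
```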
