Use a data dictionary data.frame to apply the following tidying steps to your data.frame:
Remove superfluous columns
Rename columns
Ensure/coerce correct data type for each column
Assign factor levels, including renaming and grouping
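To make the four steps concrete, here is a minimal base R sketch of what they amount to when done by hand. It does not call apply_data_dictionary(); the toy data and all column names are invented for illustration.

```r
# Toy data; all column names are made up for illustration.
raw <- data.frame(
  pat_id  = c("1", "2", "3"),
  sex_raw = c("m", "f", "x"),
  comment = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# 1. Remove superfluous columns
tidy <- raw[, c("pat_id", "sex_raw"), drop = FALSE]

# 2. Rename columns
names(tidy) <- c("patient_id", "sex")

# 3. Coerce each column to its intended data type
tidy$patient_id <- as.integer(tidy$patient_id)

# 4. Rename and group values, then store the result as a factor
recoded <- c(m = "male", f = "female")[tidy$sex]
recoded[is.na(recoded)] <- "other"   # group every unmatched value
tidy$sex <- factor(unname(recoded))
```

A data dictionary replaces this hand-written code with one row of metadata per column, which keeps the cleaning rules in a single reviewable table.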
apply_data_dictionary(
data,
data_dictionary,
na_action_default = "keep_NA",
print_coerced_NA = TRUE
)
Returns a clean data.frame.
data : data.frame to be cleaned
data_dictionary : data.frame with the following columns:
old_column_name : character with the old column name
new_data_type : character denoting the tidy data type. Supported types are:
character
integer
float
factor
date
new_column_name : tidy column name. Can be left blank to keep the old column name
coding (factor and date columns only):
factor columns: character denoting old value (key) and new value (value) in a standardised fashion:
key-value pairs are separated from each other by a comma (",")
key and value of the same pair are separated by an equal sign ("=")
quotations around individual keys and values are recommended for clarity, but do not affect functionality.
all values will be coerced to type character, except "NA", which is parsed as a missing value (NA)
using "default" as a key will assign the specified value to all current values that do not match any of the specified keys, excluding NA
using "NA" as a key will assign the specified value to all current NA values
example coding: "'key1' = 'val1', 'key2' = 'val2', 'default' = 'Other', 'NA' = NA"
if no coding is specified for a column, the coding remains unchanged
date columns: character giving the date format (see the format argument of as.Date), e.g. "%Y-%m-%d"
Optional other columns (do not affect behaviour)
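A data dictionary following the column layout above could be assembled as below. The entries are invented, and parse_coding() is a hypothetical helper written only for this sketch, not a function of the package; it shows one way a coding string of the documented shape might be read, assuming keys and values contain no commas or equal signs.

```r
# Coding string for a factor column, in the documented key = value format.
sex_coding <- "'m' = 'male', 'f' = 'female', 'default' = 'other', 'NA' = NA"

# Hypothetical data dictionary: one row per column of the raw data.
data_dictionary <- data.frame(
  old_column_name = c("pat_id", "sex_raw", "dob"),
  new_data_type   = c("integer", "factor", "date"),
  new_column_name = c("patient_id", "sex", ""),   # "" keeps the old name
  coding          = c("", sex_coding, "%Y-%m-%d"),
  stringsAsFactors = FALSE
)

# Illustrative parser (an assumption, not the package's implementation).
parse_coding <- function(coding) {
  # Split into pairs on ",", then each pair into key and value on "="
  pairs <- strsplit(strsplit(coding, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  # Strip surrounding whitespace and optional quotation marks
  clean <- function(x) gsub("^[[:space:]'\"]+|[[:space:]'\"]+$", "", x)
  keys   <- vapply(pairs, function(p) clean(p[1]), character(1))
  values <- vapply(pairs, function(p) clean(p[2]), character(1))
  values[values == "NA"] <- NA_character_   # "NA" is parsed as a missing value
  stats::setNames(values, keys)
}

mapping <- parse_coding(data_dictionary$coding[2])

# For date columns, the coding entry is passed on as a format string.
parsed_date <- as.Date("2001-02-03", format = data_dictionary$coding[3])
```

Here mapping is a named character vector whose names are the old values (plus the special keys "default" and "NA") and whose elements are the new values.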
na_action_default : character specifying what to do with NA values. Defaults to 'keep_NA'. Options are:
'keep_NA' NA values remain NA values
'assign_default' NA values are assigned the value specified as 'default'. Requires a 'default' value to be specified.
Can be overwritten for individual columns by specifying a value for key 'NA'.
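The difference between the two options, and the per-column override through the 'NA' key, can be sketched in base R as follows. recode_with_na_action() is a hypothetical helper for this illustration only; passing the default as a separate argument is a simplification and not how the package is known to work internally.

```r
# Hypothetical helper, written only to illustrate the documented semantics.
recode_with_na_action <- function(x, mapping, default, na_action = "keep_NA") {
  # Look up each old value; NA and unmatched values come back as NA
  out <- unname(mapping[as.character(x)])
  # Non-missing values without a matching key receive the default
  unmatched <- !is.na(x) & !(x %in% names(mapping))
  out[unmatched] <- default
  if ("NA" %in% names(mapping)) {
    # An explicit 'NA' key takes precedence over na_action
    out[is.na(x)] <- mapping[["NA"]]
  } else if (na_action == "assign_default") {
    out[is.na(x)] <- default
  }
  out
}

x <- c("m", "f", NA, "x")
mapping <- c(m = "male", f = "female")

kept <- recode_with_na_action(x, mapping, default = "other")
# "male" "female" NA "other"

filled <- recode_with_na_action(x, mapping, default = "other",
                                na_action = "assign_default")
# "male" "female" "other" "other"

overridden <- recode_with_na_action(x, c(mapping, "NA" = "unknown"),
                                    default = "other")
# "male" "female" "unknown" "other"
```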
print_coerced_NA : logical indicating whether to print a message specifying the location of NAs that apply_data_dictionary() introduces into data.
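For orientation, NAs introduced by type coercion can be located in base R roughly as shown below. This is a sketch of the general idea; the wording and format of the message that the function actually prints are not documented here, so the message text is an assumption.

```r
# Toy column that should become integer; two entries cannot be converted.
x <- c("12", "7", "n/a", NA, "3.x")

# Coerce quietly; unconvertible strings become NA
coerced <- suppressWarnings(as.integer(x))

# Positions that were not NA before coercion but are NA afterwards
introduced <- which(is.na(coerced) & !is.na(x))

if (length(introduced) > 0) {
  message("NAs introduced by coercion in rows: ",
          paste(introduced, collapse = ", "))
}
```

Comparing against the original vector distinguishes values that were already missing (row 4 here) from values lost during coercion (rows 3 and 5).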
J. Peter Marquardt