Transforms a transactional table (several rows per id) into an id-aggregated table, with customizable aggregation methods for numeric and categorical columns.
preprocess(
df,
samplesize = NA,
numeric_operation_list = c("mean"),
categories = NULL,
target = NA,
target_agg = "mean",
verbose = TRUE
)
df: data.frame, the data to preprocess.
samplesize: numeric, the fraction of ids used to create a sub-sample of the input df.
numeric_operation_list: list, the aggregation functions to apply to numeric columns.
categories: list, the categorical columns to aggregate.
target: character, the column to use as the response variable for supervised learning.
target_agg: character, the aggregation function used to aggregate the target column.
verbose: logical, whether information about the preprocessing should be reported.
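The samplesize argument samples ids rather than rows, so every transaction of a selected id is kept. The base-R sketch below illustrates that idea only. The helper subsample_ids and the data tx are hypothetical, and it is an assumption that preprocess samples this way internally.

```r
# Hypothetical helper (not part of the package): keep all rows belonging to a
# random fraction of the unique ids, as samplesize is described to do.
subsample_ids <- function(df, id_col, fraction) {
  ids <- unique(df[[id_col]])
  n_keep <- max(1, round(fraction * length(ids)))
  # index with sample.int so a single id is never expanded to 1:id by sample()
  keep <- ids[sample.int(length(ids), n_keep)]
  df[df[[id_col]] %in% keep, , drop = FALSE]
}

# Hypothetical transactional data: 4 ids, several rows for some of them
tx <- data.frame(
  id     = c(1, 1, 2, 3, 3, 3, 4),
  amount = c(10, 20, 5, 7, 8, 9, 100)
)

set.seed(42)
sub <- subsample_ids(tx, "id", fraction = 0.5)
length(unique(sub$id))  # 2 of the 4 ids, each with all of its rows
```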
A data frame of id attributes (e.g. customer attributes if the id represents customer IDs), with a single row per unique id.
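To make the shape of the result concrete, here is a minimal base-R sketch of an id aggregation. It is not the package's implementation: the data tx, the list num_ops, and the choice to aggregate a categorical column as per-level counts are all assumptions made for illustration. Each numeric column gets one output column per aggregation function (mirroring numeric_operation_list), and the pieces are merged into one row per id.

```r
# Hypothetical transactional input: several rows per id
tx <- data.frame(
  id      = c(1, 1, 2, 3, 3),
  amount  = c(10, 20, 5, 7, 9),
  channel = c("web", "store", "web", "web", "web")
)

# Numeric columns: one summary column per (column, function) pair
num_ops  <- list(mean = mean, max = max)
num_cols <- "amount"
num_parts <- lapply(names(num_ops), function(op) {
  agg <- aggregate(tx[num_cols], by = list(id = tx$id), FUN = num_ops[[op]])
  names(agg)[-1] <- paste(num_cols, op, sep = "_")
  agg
})

# Categorical column: count of each level per id (assumed aggregation method)
cat_counts <- as.data.frame.matrix(table(tx$id, tx$channel))
names(cat_counts) <- paste0("channel_", names(cat_counts))
cat_counts$id <- as.numeric(rownames(cat_counts))

# Merge all pieces on id: one row per unique id
out <- Reduce(function(a, b) merge(a, b, by = "id"),
              c(num_parts, list(cat_counts)))
out
```

For id 1 this yields amount_mean 15, amount_max 20, channel_store 1 and channel_web 1. Other categorical encodings (e.g. the most frequent level) would be equally valid choices.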