This function automatically transforms a data frame with categorical columns into numerical columns using one-hot encoding.
ohse(df, redundant = FALSE, drops = TRUE, ignore = NA,
dates = FALSE, holidays = FALSE, country = "Colombia",
currency_pair = NA, trim = 0, limit = 10, variance = 0.9,
other_label = "OTHER", sep = "_", summary = TRUE)
df: Dataframe.
redundant: Boolean. Should redundant columns be kept? That is, if a column has only two different values, should both new columns be kept?
drops: Boolean. Automatically drop some useless features?
ignore: Vector or character. Which columns should be ignored?
dates: Boolean. Should the function create more features out of the date/time columns?
holidays: Boolean. Include holidays as new columns?
country: Character or vector. For which countries should the holidays be included?
currency_pair: Character. Which currency exchange history do you wish to get? For example, USD/COP, EUR/USD...
trim: Integer. Trim names until the nth character.
limit: Integer. Limit one-hot encoding to the n most frequent values of each column.
variance: Numeric. Drop columns with more than n variance, where variance is the share of unique values among the observations. Range: 0-1. For example, if a variable contains 91 unique values out of 100 observations, the column will be dropped if this value is set to 0.9.
other_label: Character. Which text should replace the filtered values?
sep: Character. Separator string.
summary: Boolean. Print a summary of the operations?
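
To make the `limit`, `other_label`, `sep`, `redundant` and `variance` arguments more concrete, here is a minimal base R sketch of the behaviour they describe. It is not the package's implementation: `one_hot_sketch()` and `unique_ratio()` are hypothetical helpers written for illustration, and which of the two columns is dropped when `redundant = FALSE` is an assumption.

```r
# Hypothetical helper (NOT the package's code): one-hot encode a single
# column, keeping only the `limit` most frequent values.
one_hot_sketch <- function(x, name, limit = 10, other_label = "OTHER",
                           sep = "_", redundant = FALSE) {
  x <- as.character(x)
  # Rank values by frequency and keep the top `limit`
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(limit, length(counts)))]
  # Every other value is replaced with `other_label`
  x[!x %in% keep] <- other_label
  values <- unique(x)
  # One 0/1 column per remaining value, named <column><sep><value>
  cols <- lapply(values, function(v) as.integer(x == v))
  names(cols) <- paste(name, values, sep = sep)
  out <- data.frame(cols, check.names = FALSE)
  # With only two values, the second column mirrors the first.
  # Assumption: the sketch keeps the first one when redundant = FALSE.
  if (!redundant && length(values) == 2) {
    out <- out[, 1, drop = FALSE]
  }
  out
}

# Hypothetical helper: share of unique values, as in the `variance` example
unique_ratio <- function(x) length(unique(x)) / length(x)

color <- c("red", "red", "red", "blue", "blue", "green", "teal")
enc <- one_hot_sketch(color, "color", limit = 2)
names(enc)
# "color_red" "color_blue" "color_OTHER"

# 91 unique values out of 100 observations gives 0.91, above a 0.9 threshold
ids <- c(1:91, rep(1, 9))
unique_ratio(ids) > 0.9
# TRUE
```

With `limit = 2`, the two least frequent colors are grouped under `other_label`, so three columns are produced instead of four.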
Other Data Wrangling: balance_data, calibrate, categ_reducer, cleanText, date_feats, dateformat, formatNum, formatTime, holidays, impute, left, normalize, numericalonly, one_hot_encoding_commas, rbind_full, removenacols, removenarows, replaceall, right, textFeats, textTokenizer, vector2text, year_month, year_week
Other Feature Engineering: date_feats, holidays