This function lets the user automatically transform a dataframe with categorical columns into a fully numerical one using the one-hot encoding technique.
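To make the idea concrete, here is a minimal base-R sketch of one-hot encoding. It is not lares' implementation: the helper name `one_hot_sketch()`, the choice of which dummy column survives when `redundant = FALSE`, and the assumption that the input has no missing values are all illustrative.

```r
# Minimal one-hot encoding sketch in base R (NOT the lares implementation).
# `one_hot_sketch` is a hypothetical helper; assumes no missing values.
one_hot_sketch <- function(df, redundant = FALSE, sep = "_") {
  out <- list()
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) || is.logical(x)) {
      # Numerical (and logical) columns are kept as numbers
      out[[col]] <- as.numeric(x)
      next
    }
    # One new 0/1 column per distinct value, named "<column><sep><value>"
    levels_x <- sort(unique(as.character(x)))
    # Assumption: with exactly two values and redundant = FALSE, keep only
    # the alphabetically last dummy, since the other one is its complement
    if (!redundant && length(levels_x) == 2) levels_x <- levels_x[2]
    for (lv in levels_x) {
      out[[paste(col, lv, sep = sep)]] <- as.integer(as.character(x) == lv)
    }
  }
  as.data.frame(out, check.names = FALSE)
}

df <- data.frame(
  sex = c("male", "female", "female"),
  port = c("S", "C", "Q"),
  age = c(22, 38, 26)
)
one_hot_sketch(df)                    # sex_male, port_C, port_Q, port_S, age
one_hot_sketch(df, redundant = TRUE)  # also keeps sex_female
```

Dropping one of the two dummies for a binary variable loses no information, because one column is always `1` minus the other; that is the trade-off the `redundant` argument controls.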
ohse(
df,
redundant = FALSE,
drops = TRUE,
ignore = NA,
dates = FALSE,
holidays = FALSE,
country = "Colombia",
currency_pair = NA,
trim = 0,
limit = 10,
variance = 0.9,
other_label = "OTHER",
sep = "_",
quiet = FALSE,
...
)
df: Dataframe.
redundant: Boolean. Should redundant columns be kept? That is, if a column has only two distinct values, should both new columns be kept?
drops: Boolean. Automatically drop some useless features?
ignore: Vector or character. Which column(s) should be ignored?
dates: Boolean. Should the function create additional features from the date/time columns?
holidays: Boolean. Include holidays as new columns?
country: Character or vector. For which countries should holidays be included?
currency_pair: Character. Which currency exchange history should be fetched? E.g. USD/COP, EUR/USD...
trim: Integer. Trim names to the first n characters.
limit: Integer. Limit one-hot encoding to the n most frequent values of each column. Set to NA to ignore this argument.
variance: Numeric. Drop columns whose ratio of unique values to observations is greater than n. Range: 0-1. For example, if a variable contains 91 unique values out of 100 observations, the column will be dropped when variance is set to 0.9.
other_label: Character. Text used to replace the filtered values.
sep: Character. Separator string.
quiet: Boolean. Suppress all messages and summaries?
...: Additional parameters.
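The `limit` and `other_label` arguments work together: values outside the n most frequent ones of a column are presumably replaced by `other_label` before encoding, so at most `limit + 1` dummy columns are created per variable. The sketch below shows that lumping step with a hypothetical helper `lump_top_n()`, which is not exported by lares; ties and missing values are handled naively here and may differ from `ohse()`.

```r
# Hypothetical helper (not part of lares): keep the `limit` most frequent
# values of a vector and replace every other value with `other_label`.
lump_top_n <- function(x, limit = 10, other_label = "OTHER") {
  x <- as.character(x)
  # NA disables the limit, mirroring the documented behaviour of `limit`
  if (is.na(limit)) return(x)
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(limit, length(counts)))]
  # Naive: missing values are not in `keep`, so they also become other_label
  ifelse(x %in% keep, x, other_label)
}

x <- c("a", "b", "a", "c", "a", "b", "d")
lump_top_n(x, limit = 2)   # "c" and "d" are replaced by "OTHER"
lump_top_n(x, limit = NA)  # unchanged
```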
Returns a data.frame in which every feature is either numerical by nature or has been transformed with one-hot encoding.
Other Data Wrangling: balance_data(), categ_reducer(), cleanText(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), removenacols(), removenarows(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()
Other Feature Engineering: date_feats(), holidays()
Other One Hot Encoding: date_feats(), holidays(), ohe_commas()
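library(lares)
library(dplyr) # provides the %>% pipe
data(dft) # sample dataset included in lares
dft <- dft[, c(2, 3, 5, 9, 11)]
# One-hot encode, keeping only the 3 most frequent values of each column
ohse(dft, limit = 3) %>% head(3)
# Keep both new columns for variables with only two distinct values
ohse(dft, limit = 3, redundant = TRUE) %>% head(3)
# Getting rid of columns with no (or too much) variance
dft$no_variance1 <- 0
dft$no_variance2 <- c("A", rep("B", nrow(dft) - 1))
dft$no_variance3 <- as.character(rnorm(nrow(dft)))
dft$no_variance4 <- c(rep("A", 20), round(rnorm(nrow(dft) - 20), 4))
ohse(dft, limit = 3) %>% head(3)
# variance = 1 disables the too-much-variance filter: no column can have
# more unique values than observations
ohse(dft, limit = 3, variance = 1) %>% head(3)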
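The variance filter exercised in the examples can be approximated by comparing each column's number of unique values to the number of rows. This is a sketch under assumptions, not lares' code: the helper `drop_by_variance()` and its `drop_constant` flag are hypothetical, and treating constant columns as useless is only our guess at part of what `drops = TRUE` does.

```r
# Hypothetical helper (not part of lares): drop columns whose share of
# unique values (unique count / number of rows) is greater than `variance`.
drop_by_variance <- function(df, variance = 0.9, drop_constant = TRUE) {
  n_unique <- vapply(df, function(col) length(unique(col)), integer(1))
  ratio <- n_unique / nrow(df)
  # e.g. 91 unique values out of 100 rows gives 0.91 > 0.9, so it is dropped
  keep <- ratio <= variance
  # Assumption: single-valued (no variance) columns are also removed
  if (drop_constant) keep <- keep & n_unique > 1
  df[, keep, drop = FALSE]
}

demo <- data.frame(
  id = as.character(1:100),     # 100 unique values out of 100 rows
  group = rep(c("A", "B"), 50), # 2 unique values
  constant = rep(0, 100)        # 1 unique value
)
names(drop_by_variance(demo))
names(drop_by_variance(demo, variance = 1, drop_constant = FALSE))
```

Because the ratio can never exceed 1, `variance = 1` keeps every column with respect to this filter, which matches the last call in the examples.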