This function reduces the number of distinct categorical values in a vector by replacing the filtered-out values with a single label. It is tidyverse-friendly, so it can be used in pipelines.
categ_reducer(df, ..., nmin = 0, pmin = 0, pcummax = 100, top = NA, other_label = "other")
df: Categorical Vector
...: Variables. Which variable do you wish to reduce?
nmin: Integer. Minimum number of times a value must be repeated
pmin: Numerical. Minimum percentage of times a value must be repeated
pcummax: Numerical. Top cumulative percentage of most repeated values
top: Integer. Keep the n most frequently repeated values
other_label: Character. With which text do you wish to replace the filtered values?
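To make the thresholds above concrete, here is a minimal base-R sketch of frequency-based reduction on a single vector. It is not the package's implementation: the helper name reduce_categories_sketch is hypothetical, and the way the filters combine (all conditions must hold, with top applied last) is an assumption made for illustration.

```r
# Hypothetical helper, NOT the package's code: a sketch of how count,
# percentage, cumulative-percentage and top-n filters could be combined.
reduce_categories_sketch <- function(x, nmin = 0, pmin = 0, pcummax = 100,
                                     top = NA, other_label = "other") {
  x <- as.character(x)
  # Frequency of each distinct value, most frequent first (NA is not counted)
  counts <- sort(table(x), decreasing = TRUE)
  n <- as.numeric(counts)
  p <- 100 * n / sum(n)            # percentage of each value
  pcum <- 100 * cumsum(n) / sum(n) # cumulative percentage, most frequent first
  # Assumption: a value is kept only if it passes every threshold
  keep <- names(counts)[n >= nmin & p >= pmin & pcum <= pcummax]
  # Assumption: top is applied after the other filters
  if (!is.na(top)) keep <- head(keep, top)
  # Replace everything else with the label; NA values are left untouched
  x[!is.na(x) & !(x %in% keep)] <- other_label
  x
}

colours <- c("red", "red", "red", "red", "blue", "blue", "blue",
             "green", "green", "pink")
reduced <- reduce_categories_sketch(colours, top = 2)
print(table(reduced))
```

With top = 2, only the two most frequent values ("red" and "blue") survive, and "green" and "pink" are merged under "other". Setting nmin = 2 instead would keep "green" as well, since it appears twice.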
Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), dateformat(), formatNum(), formatTime(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), rbind_full(), removenacols(), removenarows(), replaceall(), right(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()
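# NOT RUN {
library(lares) # assumed source of categ_reducer(), freqs() and the dft dataset
data(dft) # Titanic dataset
# Keep the 2 most frequent values of Embarked
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
# Keep Ticket values repeated at least 7 times; relabel the rest
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
# }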