This function lets the user reduce the number of distinct categorical values in a vector. It is tidyverse-friendly, so it can be used in pipelines.
categ_reducer(
df,
var,
nmin = 0,
pmin = 0,
pcummax = 100,
top = NA,
pvalue_max = 1,
cor_var = "tag",
limit = 20,
other_label = "other",
...
)
Value: data.frame df in which var has been transformed.
Arguments:
df: Data frame containing the categorical vector to reduce.
var: Variable. Which variable do you wish to reduce?
nmin: Integer. Minimum number of times a value must be repeated.
pmin: Numeric. Minimum percentage of times a value must be repeated.
pcummax: Numeric. Top cumulative percentage of most repeated values.
top: Integer. Keep the n most frequently repeated values.
pvalue_max: Numeric (0-1]. Maximum p-value for categories.
cor_var: Character. If pvalue_max < 1, you must define the name of the column (numerical or binary) that the categories will be compared with.
limit: Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore this argument.
other_label: Character. Text that replaces the filtered values.
...: Additional parameters.
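To make the top and nmin arguments more concrete, here is a minimal base R sketch of the idea behind them. The helpers reduce_top() and reduce_nmin() are hypothetical and are not part of lares. They only illustrate the kind of relabelling described above, and they may differ from categ_reducer() in how ties, NA values, or the exact threshold comparison are handled.

```r
# Hypothetical helpers (NOT part of lares): a sketch of the idea behind
# the `top` and `nmin` arguments, written in base R only.

# Keep the `top` most frequent values; relabel everything else.
reduce_top <- function(x, top, other_label = "other") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(top, length(counts)))]
  # Values not in `keep` (including NA in this sketch) become other_label
  ifelse(x %in% keep, x, other_label)
}

# Keep values that appear at least `nmin` times; relabel everything else.
# Assumption: the threshold is inclusive (>=); the real function may differ.
reduce_nmin <- function(x, nmin, other_label = "other") {
  x <- as.character(x)
  counts <- table(x)
  keep <- names(counts)[counts >= nmin]
  ifelse(x %in% keep, x, other_label)
}

embarked <- c("S", "S", "S", "C", "C", "Q")
top2 <- reduce_top(embarked, top = 2)
min3 <- reduce_nmin(embarked, nmin = 3)
print(table(top2))
print(table(min3))
```

In this toy vector, "Q" is relabelled under top = 2, while both "C" and "Q" are relabelled under nmin = 3, because only "S" appears three times.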
Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), ohe_commas(), ohse(), removenacols(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month()
library(lares) # provides categ_reducer(), freqs() and the dft dataset
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)