
lares (version 4.8.4)

categ_reducer: Reduce categorical values

Description

This function lets the user reduce the categorical values of a variable in a data frame, grouping the less relevant values under a single label. It is tidyverse-friendly, so it can be used in pipelines.
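To illustrate what reducing categorical values means, and why taking a data frame as the first argument and returning it makes a function pipeline-friendly, here is a minimal base R sketch. `lump_top()` is a hypothetical helper, not part of lares, and it only mimics the behaviour of the `top` argument. It uses the native pipe `|>`, which assumes R 4.1 or later.

```r
# Hypothetical helper, NOT part of lares: keeps the `top` most frequent
# values of column `col` and replaces every other value with `other_label`.
lump_top <- function(df, col, top = 2, other_label = "other") {
  x <- as.character(df[[col]])
  counts <- sort(table(x), decreasing = TRUE)  # most frequent values first
  keep <- names(counts)[seq_len(min(top, length(counts)))]
  df[[col]] <- ifelse(x %in% keep, x, other_label)
  df  # returning the data frame lets the call sit inside a pipeline
}

# Toy data made up for illustration, loosely shaped like Titanic's Embarked
ports <- data.frame(Embarked = c("S", "S", "S", "C", "C", "Q", "S"))

reduced <- ports |> lump_top("Embarked", top = 2)  # native pipe, R >= 4.1
table(reduced$Embarked)
```

Here "Q", the least frequent port, becomes "other", while "S" and "C" are kept.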

Usage

categ_reducer(
  df,
  ...,
  nmin = 0,
  pmin = 0,
  pcummax = 100,
  top = NA,
  other_label = "other"
)

Arguments

df

Data frame. Contains the categorical variable to reduce.

...

Variables. Which variable do you wish to reduce?

nmin

Integer. Minimum number of times a value must appear to be kept

pmin

Numerical. Minimum percentage of observations a value must represent to be kept

pcummax

Numerical. Maximum cumulative percentage covered by the most frequent values that are kept

top

Integer. Keep the n most frequently repeated values

other_label

Character. Text used to replace the filtered values
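The four filtering arguments can be read as thresholds over the frequency table of the variable. The base R sketch below shows one plausible way to combine them on a plain vector. `reduce_categ()` is hypothetical and is not the lares implementation; in particular, whether the comparisons are strict and how the value that crosses `pcummax` is treated are assumptions made here.

```r
# Hypothetical sketch, NOT lares's implementation: decide which values of a
# categorical vector pass the nmin, pmin, pcummax and top filters, and
# replace the rest with other_label.
reduce_categ <- function(x, nmin = 0, pmin = 0, pcummax = 100, top = NA,
                         other_label = "other") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)  # most frequent values first
  n <- as.numeric(counts)
  total <- sum(n)
  pct <- 100 * n / total                       # share of each value, in %
  # Assumption: a value is kept while the share covered by the more frequent
  # values before it is still below pcummax (integer sums avoid rounding).
  prior_pct <- 100 * (cumsum(n) - n) / total
  keep <- n >= nmin & pct >= pmin & prior_pct < pcummax
  if (!is.na(top)) keep <- keep & seq_along(n) <= top
  ifelse(x %in% names(counts)[keep], x, other_label)
}

x <- c(rep("A", 5), rep("B", 3), "C", "D")
reduce_categ(x, nmin = 2)      # "C" and "D" become "other"
reduce_categ(x, pcummax = 50)  # "A" alone already covers 50%
```

With these toy values, `nmin = 2` and `pmin = 20` both keep "A" and "B", while `top = 1` and `pcummax = 50` keep only "A".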

See Also

Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), dateformat(), formatNum(), formatTime(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), rbind_full(), removenacols(), removenarows(), replaceall(), right(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()

Examples

library(lares)
library(dplyr) # provides the %>% pipe used below
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
