categ_reducer: Reduce categorical values

Description

This function reduces the number of distinct categorical values in a variable. It is tidyverse-friendly and can be used in pipelines.

Usage

categ_reducer(
  df,
  var,
  nmin = 0,
  pmin = 0,
  pcummax = 100,
  top = NA,
  pvalue_max = 1,
  cor_var = "tag",
  limit = 20,
  other_label = "other",
  ...
)

Value

The data.frame df, with var transformed.

Arguments

df: Data.frame. Data containing the categorical variable to reduce
var: Variable. Which variable do you wish to reduce?
nmin: Integer. Minimum number of times a value must be repeated
pmin: Numerical. Minimum percentage of times a value must be repeated
pcummax: Numerical. Top cumulative percentage of most repeated values
top: Integer. Keep the n most frequently repeated values
pvalue_max: Numeric, in (0, 1]. Maximum p-value allowed for categories
cor_var: Character. If pvalue_max < 1, you must specify the name of the column (numerical or binary) to compare against.
limit: Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.
other_label: Character. Text used to replace the filtered values.
...: Additional parameters
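
As a rough illustration of how nmin, top and other_label might interact, here is a minimal base R sketch. It is not the package's implementation: reduce_levels is a hypothetical helper that works on a plain character vector, and it assumes that values repeated at least nmin times are kept and that top is applied after that filter.

```r
# Hypothetical helper for illustration only; not part of any package.
# Assumption: a value is kept when it appears at least `nmin` times and,
# if `top` is set, it is among the `top` most frequent remaining values.
reduce_levels <- function(x, nmin = 0, top = NA, other_label = "other") {
  # Frequency of each distinct value, most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  # Values that meet the minimum count
  keep <- names(counts)[as.vector(counts) >= nmin]
  # Optionally keep only the n most frequent of those
  if (!is.na(top)) keep <- head(keep, top)
  # Everything else is replaced with the fallback label
  ifelse(x %in% keep, x, other_label)
}

x <- c("S", "S", "S", "C", "C", "Q")
reduce_levels(x, top = 2)
reduce_levels(x, nmin = 3, other_label = "rare")
```

In this sketch the first call relabels only "Q", while the second relabels both "C" and "Q" because neither reaches three occurrences.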

Examples

library(lares) # provides categ_reducer(), freqs() and the dft dataset
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)
