DataExplorer (version 0.2.4)

CollapseCategory: Collapse categories for discrete features

Description

Sometimes discrete features have sparse categories. This function will collapse the sparse categories for a discrete feature based on a given threshold.

Usage

CollapseCategory(data, feature, threshold, update = FALSE)

Arguments

data
input data, in either data.frame or data.table format.
feature
name of the discrete feature to be collapsed.
threshold
the proportion of bottom categories to be collapsed, e.g., if set to 0.2, the least frequent categories whose cumulative frequency falls in the bottom 20% will be collapsed.
update
logical, indicating if the data should be modified. Setting to TRUE will modify the input data without returning anything. The default is FALSE.

Value

  • if update is set to FALSE, returns a data.table object containing categories with cumulative frequency less than the input threshold.
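
To make the cumulative-frequency rule concrete, here is a minimal base R sketch of the idea. It is an illustration only, not DataExplorer's internal code: the vector `x`, the variable names, and the `"OTHER"` placeholder label are all assumptions made for this example.

```r
# Illustrative sketch only; not the package's implementation.
# `x` is a hypothetical discrete feature; `threshold` is a proportion.
x <- c(rep("a", 52), rep("b", 30), rep("c", 10), rep("d", 5), rep("e", 3))
threshold <- 0.2

# Count each category and sort from least to most frequent
freq <- sort(table(x))

# Cumulative share of rows, accumulated from the rarest category upward
cum_pct <- cumsum(as.vector(freq)) / length(x)
names(cum_pct) <- names(freq)

# Categories whose cumulative share is below the threshold count as sparse
sparse <- names(cum_pct)[cum_pct < threshold]

# Replace sparse categories with one placeholder level (label is assumed)
collapsed <- ifelse(x %in% sparse, "OTHER", x)
table(collapsed)
```

In this sketch the three rarest categories together cover 18% of the rows, so they fall below the 0.2 threshold and are merged, while the two most frequent categories are left untouched.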

Details

If a continuous feature is passed to the argument feature, it will be coerced to character class.
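
As a rough illustration of what that coercion implies, the base R sketch below converts a hypothetical numeric vector to character, so that each distinct value is treated as its own category. The vector `x` is made up for this example, and the snippet does not reproduce the package's internals.

```r
# Hypothetical continuous feature (values invented for illustration)
x <- c(1.5, 2, 2, 3.25, 2)

# Coerce to character; each distinct value now acts as a category label
x_chr <- as.character(x)
class(x_chr)

# Frequency of each resulting category
table(x_chr)
```

With truly continuous data nearly every value is unique, so most categories would be sparse; rounding or binning the feature first may give a more useful result.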

Examples

# load packages
library(data.table)
library(DataExplorer)

# generate data
data <- data.table("a" = as.factor(round(rnorm(500, 10, 5))))

# view cumulative frequency without collapsing categories
CollapseCategory(data, "a", 0.2)

# collapse bottom 20% categories based on cumulative frequency
CollapseCategory(data, "a", 0.2, update = TRUE)
BarDiscrete(data)