
DataExplorer (version 0.2.6)

CollapseCategory: Collapse categories for discrete features

Description

Discrete features sometimes have sparse categories, i.e., categories that occur only rarely. This function collapses the sparse categories of a discrete feature based on a given threshold.
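To make the idea concrete, here is a minimal base-R sketch of collapsing sparse categories. It is not DataExplorer's implementation. The helper `collapse_sparse()` is hypothetical, and several details are assumptions: categories are ranked from rarest to most common, a category counts as sparse while its running share stays strictly below `threshold`, and sparse categories are pooled under the label `"OTHER"`.

```r
# Hypothetical helper illustrating threshold-based collapsing; not DataExplorer's code.
collapse_sparse <- function(x, threshold, other = "OTHER") {
  x <- as.character(x)                        # treat the feature as discrete
  freq <- sort(table(x))                      # category counts, rarest first
  cum_pct <- cumsum(freq) / sum(freq)         # running share from the bottom
  sparse <- names(cum_pct)[cum_pct < threshold]
  x[x %in% sparse] <- other                   # pool the sparse categories
  x
}

# 100 observations: E (3) and D (5) together make up the bottom 8%
x <- rep(c("A", "B", "C", "D", "E"), times = c(50, 28, 14, 5, 3))
res <- collapse_sparse(x, 0.2)
table(res)   # expected counts: A = 50, B = 28, C = 14, OTHER = 8
```

In this example, C is not collapsed. Adding it would raise the running share from 0.08 to 0.22, which is above the 0.2 threshold.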

Usage

CollapseCategory(data, feature, threshold, update = FALSE)

Arguments

data
input data, in either data.frame or data.table format.
feature
name of the discrete feature to be collapsed.
threshold
the proportion (between 0 and 1) of categories to be collapsed, counted from the bottom by cumulative frequency, e.g., if set to 0.2 (20%), the categories that make up the bottom 20% of cumulative frequency will be collapsed.
update
logical, indicating whether the input data should be modified. If TRUE, the input data is modified in place and nothing is returned. The default is FALSE.
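The in-place behaviour described for `update` relies on the fact that a data.table can be changed by reference. The following sketch uses data.table's `set()` inside a hypothetical helper, `relabel_in_place()`, to show how a function can modify its argument and return nothing. It illustrates the mechanism only and is not the package's source. By contrast, modifying a plain data.frame inside a function in R changes only a local copy.

```r
library(data.table)

# Hypothetical helper: overwrite a column by reference and return nothing,
# mirroring the documented behaviour of update = TRUE.
relabel_in_place <- function(dt, col, from, to = "OTHER") {
  values <- as.character(dt[[col]])
  values[values %in% from] <- to
  set(dt, j = col, value = values)   # modifies dt itself; no copy is made
  invisible(NULL)
}

dt <- data.table(a = c("x", "y", "z", "z"))
out <- relabel_in_place(dt, "a", from = c("x", "y"))
dt$a   # the caller's table has changed: "OTHER" "OTHER" "z" "z"
```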

Value

If update is FALSE, returns a data.table object containing the categories whose cumulative frequency is less than the input threshold.
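One reading of this description is as follows: count each category, rank the categories from rarest to most common, accumulate their shares, and keep the rows whose running share is below the threshold. The data.table sketch below builds such a table with a hypothetical helper, `cum_freq_table()`. The column names `cnt` and `cum_pct` and the rarest-first ordering are assumptions, so the package's actual output may differ.

```r
library(data.table)

# Hypothetical reconstruction of a cumulative-frequency table.
cum_freq_table <- function(dt, feature) {
  counts <- dt[, .(cnt = .N), by = feature]    # frequency of each category
  setorder(counts, cnt)                        # rarest categories first
  counts[, cum_pct := cumsum(cnt) / sum(cnt)]  # running share of observations
  counts[]
}

dt <- data.table(a = rep(c("A", "B", "C", "D", "E"), times = c(50, 28, 14, 5, 3)))
tab <- cum_freq_table(dt, "a")
tab[cum_pct < 0.2]   # rows below a 0.2 threshold: E (0.03) and D (0.08)
```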

Details

If a continuous feature is passed to the argument feature, it will be coerced to character class.
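Coercing to character means that every distinct numeric value becomes its own category. As a result, a variable with many unique values produces mostly singleton categories, which is why rounding before collapsing (as the example below does) keeps the number of categories manageable. The base-R illustration that follows uses the standard `as.character()`. Whether DataExplorer uses exactly this call internally is an assumption.

```r
# A numeric feature converted to character: each distinct value is a category.
x <- c(1.5, 2, 2, 3.25)
cats <- as.character(x)   # "1.5" "2" "2" "3.25"
table(cats)               # three categories: "1.5", "2", "3.25"
```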

Examples

# load packages
library(data.table)
library(DataExplorer)

# generate data
data <- data.table("a" = as.factor(round(rnorm(500, 10, 5))))

# view cumulative frequency without collapsing categories
CollapseCategory(data, "a", 0.2)

# collapse bottom 20% categories based on cumulative frequency
CollapseCategory(data, "a", 0.2, update = TRUE)
# plot bar charts of the discrete feature after collapsing
BarDiscrete(data)
