amp.dm (version 0.2.0)

num_lump: Perform lumping of numerical values

Description

This function is mainly a wrapper for forcats::fct_lump, applied to numeric variables. In addition, the uniques argument can be used to determine small categories on a different level, for instance per individual.

Usage

num_lump(x, lumpcat = 99, uniques = NULL, prop = NULL, min = NULL, ...)

Value

vector with the lumping applied

Arguments

x

numeric vector with the items that should be lumped

lumpcat

the category in which the lumped levels should be added (see details)

uniques

vector that defines unique records, so that lumping is based on non-duplicate values

prop

numeric with the threshold proportion for lumping

min

numeric with the minimum number of times a level should appear to avoid being lumped

...

additional arguments passed to forcats::fct_lump_min and/or forcats::fct_lump_prop

Author

Richard Hooijmaijers

Details

The argument lumpcat is the level in which lumped values should appear and can be one of the following:

  • numeric with the category number to set the levels to

  • character specifying "largest" to select the largest category (selected before lumping)

  • named vector to set the 'algorithm', for instance c('5'='3', '4'='6') to set category 5 to 3 and category 4 to 6 when these categories need lumping
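The three forms of lumpcat listed above can be illustrated with a minimal sketch that calls forcats directly. This is not the package's implementation: the variable names (res_numeric, res_largest, res_named, map) and the threshold of 6 are assumptions chosen for illustration, and num_lump itself may handle edge cases differently.

```r
library(forcats)

x <- c(rep(1, 8), rep(2, 13), rep(3, 4), rep(4, 5))

# Flag the values whose category occurs fewer than 6 times
f <- fct_lump_min(factor(x), min = 6)
lumped <- as.character(f) == "Other"

# 1. Numeric lumpcat: send every lumped value to one fixed category
res_numeric <- ifelse(lumped, 99, x)

# 2. "largest": send lumped values to the most frequent category,
#    determined before any lumping takes place
largest <- as.numeric(names(which.max(table(x))))
res_largest <- ifelse(lumped, largest, x)

# 3. Named vector: recode each lumped category individually
#    (names are the original categories, values the targets)
map <- c("3" = "2", "4" = "1")
res_named <- x
res_named[lumped] <- as.numeric(map[as.character(x[lumped])])

table(res_numeric)
table(res_largest)
table(res_named)
```

In this sketch a lumped category that is missing from map would become NA, so a named vector should cover every category that can end up below the threshold.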

Examples

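library(amp.dm)
# Four categories with 8, 13, 4 and 5 records; use a 15% proportion threshold with 99 as the lump category
dfrm <- data.frame(id = 1:30, cat = c(rep(1, 8), rep(2, 13), rep(3, 4), rep(4, 5)))
num_lump(x = dfrm$cat, lumpcat = 99, prop = 0.15)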
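The idea behind the uniques argument, counting categories on non-duplicate records such as individuals rather than on rows, can be sketched in base R. This is one plausible reading of the argument and not the package's code: the data, the names (first_row, n_subj, small, res) and the threshold of 2 subjects are all hypothetical.

```r
# Hypothetical long-format data: 3 rows per subject, one category per subject
id  <- rep(1:10, each = 3)
cat <- rep(c(1, 1, 1, 1, 1, 2, 2, 2, 3, 4), each = 3)

# Count each category once per subject rather than once per row
first_row <- !duplicated(id)
n_subj <- table(cat[first_row])

# Categories observed in fewer than 2 subjects are treated as small
small <- as.numeric(names(n_subj)[n_subj < 2])

# Apply the lumping to all rows, including the duplicated ones
res <- ifelse(cat %in% small, 99, cat)
table(res)
```

Counted per row, categories 3 and 4 each appear three times and would pass a minimum of 2; counted per subject, each appears only once and is lumped. That difference is the reason to define the unique records separately.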