forcats (version 0.4.0)

fct_lump: Lump together least/most common factor levels into "other"

Description

Lump together least/most common factor levels into "other"

Usage

fct_lump(f, n, prop, w = NULL, other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max"))

fct_lump_min(f, min, w = NULL, other_level = "Other")

Arguments

f

A factor (or character vector).

n, prop

If both n and prop are missing, fct_lump() lumps together the least frequent levels into "other", while ensuring that "other" is still the smallest level. This default is particularly useful in conjunction with fct_inorder().
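The default rule can be sketched in base R. This is a minimal illustration that assumes the rule is applied to a named vector of level counts. The helper default_keep() is hypothetical and not part of forcats, and the package's internal code may differ. The sketch walks the levels from most to least common and stops at the first level whose count exceeds the combined count of all less common levels. Everything after that point is lumped, so "other" ends up smaller than every kept level.

```r
# Hypothetical helper (not a forcats function): decide which levels
# survive the default lumping rule, given a named vector of counts.
default_keep <- function(counts) {
  ord <- order(counts, decreasing = TRUE)
  sorted <- counts[ord]
  left <- sum(sorted)
  cutoff <- length(sorted) + 1          # default: lump nothing
  for (i in seq_along(sorted)) {
    left <- left - sorted[[i]]          # combined count of the less common levels
    if (sorted[[i]] > left) {           # "other" would be smaller than this level
      cutoff <- i + 1
      break
    }
  }
  keep <- logical(length(counts))
  keep[ord] <- seq_along(sorted) < cutoff   # undo the sort
  names(keep) <- names(counts)
  keep
}

# Level counts from the first example in the Examples section
counts <- c(A = 40, B = 10, C = 5, D = 27, E = 1, F = 1, G = 1, H = 1, I = 1)
keep <- default_keep(counts)
names(counts)[keep]     # "A" "D"
sum(counts[!keep])      # 20, smaller than the smallest kept level (27)
```

Here B (10) cannot be kept: the remaining levels would total exactly 10, and "other" would no longer be strictly the smallest level.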

Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values with the default ties.method = "min"; other choices of ties.method may keep fewer.
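The rank-based rule for n can be sketched with base R's rank(). The sketch includes the way ties can keep more (or, with some tie methods, fewer) levels than abs(n). It assumes counts is a named vector of level frequencies. keep_top_n() is a hypothetical helper, not the forcats implementation.

```r
# Hypothetical helper (not a forcats function): keep the abs(n) most
# common levels (n > 0) or the abs(n) least common levels (n < 0),
# breaking ties with base R's rank().
keep_top_n <- function(counts, n, ties.method = "min") {
  if (n >= 0) {
    r <- rank(-counts, ties.method = ties.method)  # rank 1 = most common
  } else {
    r <- rank(counts, ties.method = ties.method)   # rank 1 = least common
  }
  r <= abs(n)
}

counts <- c(a = 10, b = 7, c = 7, d = 3, e = 1)
names(counts)[keep_top_n(counts, 2)]                          # "a" "b" "c" (tie kept)
names(counts)[keep_top_n(counts, 2, ties.method = "first")]   # "a" "b"
names(counts)[keep_top_n(counts, 2, ties.method = "max")]     # "a"
names(counts)[keep_top_n(counts, -2)]                         # "d" "e"
```

With "max", both tied levels receive rank 3, so neither falls within abs(n) = 2.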

Positive prop preserves values that appear at least prop of the time. Negative prop preserves values that appear at most -prop of the time.
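The prop rule compares each level's share of the total against the threshold. The base R sketch below follows the documented "at least" and "at most" wording. How forcats treats a proportion exactly equal to prop is not confirmed here and may differ. keep_prop() is a hypothetical helper.

```r
# Hypothetical helper (not a forcats function): keep levels by their
# proportion of the total, following the wording of the prop argument.
keep_prop <- function(counts, prop) {
  p <- counts / sum(counts)
  if (prop >= 0) p >= prop else p <= -prop
}

counts <- c(a = 50, b = 30, c = 15, d = 5)
names(counts)[keep_prop(counts, 0.1)]    # "a" "b" "c"
names(counts)[keep_prop(counts, -0.2)]   # "c" "d"
```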

w

An optional numeric vector, the same length as f, giving a weight for each value (not each level) in f. Level frequencies are then the sums of these weights rather than plain counts.
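To see what weighting changes, compare plain counts with weighted counts computed by summing the weights within each level. This base R sketch assumes that is how the weights enter the frequency calculation. The n and prop rules described above would then operate on these sums instead of raw counts.

```r
f <- factor(c("a", "a", "b", "c", "c", "c"))
w <- c(2, 2, 1, 1, 1, 1)   # one weight per value, not per level

unweighted <- table(f)               # a = 2, b = 1, c = 3
weighted   <- tapply(w, f, sum)      # a = 4, b = 1, c = 3
levels(f)[which.max(unweighted)]     # "c"
levels(f)[which.max(weighted)]       # "a"
```

With these weights, "a" overtakes "c" as the most common level, so a weighted lump can keep different levels than an unweighted one.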

other_level

Name of the level used for "other" values. It is always placed at the end of the levels.
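The effect of other_level on level order can be mimicked in base R: relabel the lumped values, then rebuild the factor with the "other" level appended last. The set of lumped levels is hard-coded here for illustration. This is a sketch of the documented outcome, not how forcats implements it internally.

```r
f <- factor(c("b", "a", "c", "a", "d", "b", "a"))
lumped <- c("c", "d")      # hypothetical: levels selected for lumping
other_level <- "Other"

vals <- as.character(f)
vals[vals %in% lumped] <- other_level
g <- factor(vals, levels = c(setdiff(levels(f), lumped), other_level))

levels(g)       # "a" "b" "Other"
table(g)        # a 3, b 2, Other 2
```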

ties.method

A character string specifying how ties are treated. See rank() for details.

min

Preserves values that appear at least min times.
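The min threshold is a plain count comparison. The base R sketch below decides which levels would be kept with min = 10, assuming counts are taken with table(). Per the documented "at least" wording, a level that appears exactly min times is kept.

```r
f <- factor(rep(c("a", "b", "c", "d"), times = c(12, 10, 3, 1)))
counts <- table(f)
keep <- as.vector(counts >= 10)   # "b" appears exactly 10 times and is kept

names(counts)[keep]     # "a" "b"
sum(counts[!keep])      # 4 values would go to "Other"
```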

See Also

fct_other() to convert specified levels to "other".

Examples

library(forcats)
library(magrittr)  # provides %>%
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
x %>% fct_lump() %>% table()
x %>% fct_lump() %>% fct_inorder() %>% table()

x <- factor(letters[rpois(100, 5)])
x
table(x)
table(fct_lump(x))

# Use positive values to collapse the rarest
fct_lump(x, n = 3)
fct_lump(x, prop = 0.1)

# Use negative values to collapse the most common
fct_lump(x, n = -3)
fct_lump(x, prop = -0.1)

# Use weighted frequencies
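# rpois() can return 0 and letters[0] selects nothing, so x may have
# fewer than 100 values; size w to match length(x)
w <- c(rep(2, 50), rep(1, length(x) - 50))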
fct_lump(x, n = 5, w = w)

# Use ties.method to control how tied levels are collapsed
fct_lump(x, n = 6)
fct_lump(x, n = 6, ties.method = "max")

x <- factor(letters[rpois(100, 5)])
fct_lump_min(x, min = 10)
