nuggets (version 2.0.1)

remove_almost_constant: Remove almost constant columns from a data frame

Description

Test all columns specified by .what and remove those that are almost constant. A column is considered almost constant if the proportion of its most frequent value is greater than or equal to the threshold specified by .threshold. See is_almost_constant() for further details.
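The criterion above can be illustrated in base R. This is a minimal sketch, not the nuggets implementation: the helpers `top_share()` and `drop_near_constant()` and the `toy` data frame are hypothetical names introduced here. The sketch counts NA as a regular value (as with `.na_rm = FALSE`) and assumes a data frame with at least one row.

```r
# Hypothetical base R sketch; NOT the nuggets implementation.

# Proportion of the most frequent value in a vector.
# NA is counted as a regular value here (comparable to .na_rm = FALSE).
top_share <- function(x) {
  max(table(x, useNA = "ifany")) / length(x)
}

# Keep only the columns whose most frequent value falls below the threshold.
drop_near_constant <- function(df, threshold = 1) {
  shares <- vapply(df, top_share, numeric(1))
  df[, shares < threshold, drop = FALSE]
}

toy <- data.frame(id = 1:4, flag = c("y", "y", "y", "n"), unit = "kg")

names(drop_near_constant(toy, threshold = 1))     # "id" "flag"
names(drop_near_constant(toy, threshold = 0.75))  # "id"
```

Because the comparison is "greater than or equal to", the `flag` column (most frequent value in 3 of 4 rows) is dropped at a threshold of exactly 0.75.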

Usage

remove_almost_constant(
  .data,
  .what = everything(),
  ...,
  .threshold = 1,
  .na_rm = FALSE,
  .verbose = FALSE
)

Value

A data frame in which every selected column that meets the definition of almost constant has been removed.

Arguments

.data

A data frame.

.what

A tidyselect expression (see tidyselect syntax) specifying the columns to process.

...

Additional tidyselect expressions selecting more columns.

.threshold

Numeric scalar in the interval [0, 1] giving the minimum required proportion of the most frequent value for a column to be considered almost constant.

.na_rm

Logical; if TRUE, NA values are removed before computing proportions. If FALSE, NA is treated as a regular value. See is_almost_constant() for details.

.verbose

Logical; if TRUE, print a message listing the removed columns.
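The interaction between .threshold and .na_rm can be worked through by hand on a single vector. The following base R sketch only mirrors the description given for these arguments; it does not call nuggets, and the variable names are illustrative. How is_almost_constant() treats edge cases such as a column consisting solely of NA is not covered here.

```r
# Same values as column `d` in the Examples section
x <- c(rep(TRUE, 4), rep(FALSE, 4), NA, NA)

# NA kept as a regular value: TRUE, FALSE and NA are counted over 10 entries
share_with_na <- max(table(x, useNA = "ifany")) / length(x)

# NA removed first: TRUE and FALSE are counted over the 8 remaining entries
x_obs <- x[!is.na(x)]
share_without_na <- max(table(x_obs)) / length(x_obs)

share_with_na     # 0.4
share_without_na  # 0.5

threshold <- 0.5
share_with_na >= threshold     # FALSE: below the threshold
share_without_na >= threshold  # TRUE: meets the threshold
```

Under this reading, the same column can fall on either side of a threshold of 0.5 depending only on whether NA values are removed first.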

Author

Michal Burda

See Also

is_almost_constant(), remove_ill_conditions()

Examples

library(nuggets)

d <- data.frame(a1 = 1:10,
                a2 = c(1:9, NA),
                b1 = "b",
                b2 = NA,
                c1 = rep(c(TRUE, FALSE), 5),
                c2 = rep(c(TRUE, NA), 5),
                d  = c(rep(TRUE, 4), rep(FALSE, 4), NA, NA))

# Remove columns that are constant (threshold = 1)
remove_almost_constant(d, .threshold = 1.0, .na_rm = FALSE)
remove_almost_constant(d, .threshold = 1.0, .na_rm = TRUE)

# Remove columns whose most frequent value makes up at least 50% of the values
remove_almost_constant(d, .threshold = 0.5, .na_rm = FALSE)
remove_almost_constant(d, .threshold = 0.5, .na_rm = TRUE)

# Restrict check to a subset of columns
remove_almost_constant(d, a1:b2, .threshold = 0.5, .na_rm = TRUE)
