Learn R Programming

collinear (version 2.0.0)

identify_predictors_zero_variance: Identify Zero and Near-Zero Variance Predictors

Description

Variables with a variance of zero or near-zero are highly problematic for multicollinearity analysis and modelling in general. This function identifies these variables with a level of sensitivity defined by the 'decimals' argument. Smaller number of decimals increase the number of variables detected as near zero variance. Recommended values will depend on the range of the numeric variables in 'df'.

Usage

identify_predictors_zero_variance(df = NULL, predictors = NULL, decimals = 4)

Value

character vector: names of zero and near-zero variance columns.

Arguments

df

(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.

predictors

(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL

decimals

(required, integer) Number of decimal places for the zero variance test. Smaller numbers will increase the number of variables detected as near-zero variance. Recommended values will depend on the range of the numeric variables in 'df'. Default: 4

Author

Blas M. Benito, PhD

See Also

Other data_types: identify_predictors(), identify_predictors_categorical(), identify_predictors_numeric(), identify_predictors_type(), identify_response_type()

Examples

Run this code

data(
  vi,
  vi_predictors
)

#create zero variance predictors
vi$zv_1 <- 1
vi$zv_2 <- runif(n = nrow(vi), min = 0, max = 0.0001)


#add to vi predictors
vi_predictors <- c(
  vi_predictors,
  "zv_1",
  "zv_2"
)

#identify zero variance predictors
zero.variance.predictors <- identify_predictors_zero_variance(
  df = vi,
  predictors = vi_predictors
)

zero.variance.predictors

Run the code above in your browser using DataLab