
correlation_threshold returns a list of variables such that no two variables have a correlation greater than a specified threshold.
correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")
variables: character vector specifying observation variables.
sample: tbl containing sample used to estimate parameters.
cutoff: threshold in [0, 1] on the pairwise correlation; a variable whose correlation with another variable exceeds it is selected.
method: optional character string specifying method for calculating correlation. This must be one of the strings "pearson" (default), "kendall", or "spearman".
Returns a character vector specifying observation variables to be excluded.
correlation_threshold is a wrapper for caret::findCorrelation.
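To make the wrapper relationship concrete, here is a minimal sketch of how such a function could be built on caret::findCorrelation. It is an illustration under assumptions, not the package's source: the name `correlation_threshold_sketch` is hypothetical, and the use of stats::cor to build the correlation matrix is assumed. It requires the caret package to be installed.

```r
# Hypothetical re-implementation for illustration only;
# `correlation_threshold_sketch` is not part of any package.
correlation_threshold_sketch <- function(variables, sample, cutoff = 0.9,
                                         method = "pearson") {
  # Correlation matrix of the requested columns only (assumed step).
  columns <- as.data.frame(sample)[, variables, drop = FALSE]
  correlations <- stats::cor(columns, method = method)

  # caret::findCorrelation returns the indices of the columns it
  # suggests removing to bring pairwise correlations under the cutoff.
  excluded_indexes <- caret::findCorrelation(correlations, cutoff = cutoff)

  # Map the indices back to variable names.
  variables[excluded_indexes]
}

set.seed(1)
x <- rnorm(30)
sample <- data.frame(
  x = x,
  y = rnorm(30) / 1000,
  z = x + rnorm(30) / 10
)

# `x` and `z` are nearly collinear, so one of them should be flagged.
excluded_variables <- correlation_threshold_sketch(c("x", "y", "z"), sample)
print(excluded_variables)
```

Which member of a correlated pair gets flagged is decided by caret::findCorrelation, so the sketch does not assume a particular one.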
# NOT RUN {
suppressMessages(suppressWarnings(library(magrittr)))
sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)
sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")
head(sample)
cor(sample)
# `x` and `z` are highly correlated; one of them will be removed
correlation_threshold(variables, sample)
# }
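Since the returned vector names the variables to exclude, a natural follow-up is to drop them and confirm that the remaining variables stay under the cutoff. The sketch below uses base R only; the value assigned to `excluded` is an assumed stand-in for the function's return value, not actual output.

```r
set.seed(42)
x <- rnorm(30)
sample <- data.frame(
  x = x,
  y = rnorm(30) / 1000,
  z = x + rnorm(30) / 10
)
variables <- c("x", "y", "z")
cutoff <- 0.9

# Assumed stand-in for the result of correlation_threshold(variables, sample).
excluded <- c("z")

# Keep everything that was not flagged for exclusion.
kept <- setdiff(variables, excluded)

# Largest absolute pairwise correlation among the kept variables,
# ignoring the diagonal (each variable's correlation with itself).
remaining <- abs(cor(sample[, kept, drop = FALSE]))
diag(remaining) <- 0
max_remaining <- max(remaining)

print(kept)
print(max_remaining < cutoff)
```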