cytominer (version 0.2.2)

correlation_threshold: Remove redundant variables.

Description

correlation_threshold returns the variables to exclude so that no two of the remaining variables have a correlation (in absolute value) greater than a specified threshold.

Usage

correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")

Arguments

variables

character vector specifying observation variables.

sample

tbl containing the sample used to estimate the correlations.

cutoff

threshold in [0, 1]. A pair of variables whose absolute correlation exceeds this value is treated as redundant, and one variable of the pair is excluded.

method

optional character string specifying the method used to calculate correlations. It must be one of "pearson" (default), "kendall", or "spearman".
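To see how cutoff and method interact, the sketch below lists the pairs of variables that would count as redundant. pairs_above_cutoff is a hypothetical helper written for this illustration, not part of cytominer; it assumes pairs are judged on absolute correlation, as caret::findCorrelation does.

```r
# Hypothetical helper (not part of cytominer): list the pairs of columns whose
# absolute correlation, computed with `method`, exceeds `cutoff`.
pairs_above_cutoff <- function(sample, cutoff = 0.9, method = "pearson") {
  cm <- cor(sample, method = method)
  # Upper triangle only: each pair is reported once and the diagonal
  # (self-correlation, always 1) is ignored
  hits <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[hits[, "row"]],
    var2 = colnames(cm)[hits[, "col"]],
    r = cm[hits],
    stringsAsFactors = FALSE
  )
}

x <- 1:10
sample <- data.frame(
  x = x,
  y = rep(c(1, -1), 5),  # weakly correlated with x
  e = exp(x)             # strictly increasing in x, but far from linear
)

pairs_above_cutoff(sample, cutoff = 0.9, method = "pearson")   # no pairs
pairs_above_cutoff(sample, cutoff = 0.9, method = "spearman")  # x and e
pairs_above_cutoff(sample, cutoff = 0.1, method = "pearson")   # lower cutoff, more pairs
```

With Pearson correlation, x and exp(x) stay below the default cutoff of 0.9, while the rank-based Spearman coefficient is 1 for any strictly increasing relationship, so the same pair is flagged. Lowering cutoff makes the filter more aggressive.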

Value

character vector specifying observation variables to be excluded.
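Because the return value lists the variables to exclude rather than the ones to keep, a typical next step is to take the complement. A minimal base-R sketch follows; excluded is a hypothetical placeholder for the function's output, not a real result.

```r
# Hypothetical placeholder for the value returned by correlation_threshold();
# which member of a correlated pair gets flagged is decided by
# caret::findCorrelation, so the choice of "z" here is arbitrary.
excluded <- "z"

sample <- data.frame(x = 1:10, y = rep(c(1, -1), 5))
sample$z <- sample$x + 0.1 * sample$y
variables <- c("x", "y", "z")

# The function returns the variables to DROP, so keep the complement
kept <- setdiff(variables, excluded)
reduced <- sample[, kept, drop = FALSE]
names(reduced)
```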

Details

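correlation_threshold is a wrapper for caret::findCorrelation: it computes the correlation matrix of variables in sample using method, and passes it to caret::findCorrelation together with cutoff.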
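The loop below gives a rough idea of the kind of filtering the wrapper delegates to caret. It is a simplified greedy approximation written for illustration only (greedy_correlation_filter is hypothetical), not caret::findCorrelation's exact algorithm, so on real data it may flag different variables than correlation_threshold does.

```r
# Simplified base-R stand-in, NOT caret::findCorrelation's exact algorithm:
# repeatedly take the most correlated pair above `cutoff` and drop the member
# with the larger mean absolute correlation to the other kept variables.
greedy_correlation_filter <- function(variables, sample, cutoff = 0.9,
                                      method = "pearson") {
  kept <- variables
  while (length(kept) >= 2) {
    # Absolute correlations among the variables still kept
    cm <- abs(cor(sample[, kept, drop = FALSE], method = method))
    diag(cm) <- 0  # ignore self-correlation
    if (max(cm) <= cutoff) break
    # Strongest remaining pair (first match if there are ties)
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    a <- pair[[1]]
    b <- pair[[2]]
    # Drop the member that is, on average, more correlated with the others
    drop <- if (mean(cm[a, ]) >= mean(cm[b, ])) a else b
    kept <- kept[-drop]
  }
  # Like correlation_threshold(), return the variables to exclude
  setdiff(variables, kept)
}

sample <- data.frame(x = 1:10, y = rep(c(1, -1), 5))
sample$z <- sample$x + 0.1 * sample$y  # z is nearly a copy of x

excluded <- greedy_correlation_filter(c("x", "y", "z"), sample)
excluded
```

Dropping the variable with the higher average correlation mirrors the intuition behind caret's approach: it removes the column that is most redundant with the rest, not just with its partner.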

Examples

suppressMessages(suppressWarnings(library(magrittr)))  # provides %<>%
set.seed(1)  # make the random sample reproducible
sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)

# z is x plus a little noise, so x and z are strongly correlated
sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")

head(sample)
cor(sample)

# `x` and `z` are highly correlated; one of them will be removed
cytominer::correlation_threshold(variables, sample)
