caret (version 5.10-13)

findCorrelation: Determine highly correlated variables

Description

This function searches through a correlation matrix and returns a vector of integers corresponding to columns to remove to reduce pair-wise correlations.

Usage

findCorrelation(x, cutoff = .90, verbose = FALSE)

Arguments

x
A correlation matrix
cutoff
A numeric value for the pairwise absolute correlation cutoff
verbose
A logical value; if TRUE, the details are printed

Value

  • A vector of indices denoting the columns to remove. If no correlations meet the criteria, numeric(0) is returned.
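
Because an empty result is possible, negative indexing with the returned vector needs a guard: in R, an empty negative index selects nothing rather than everything. The following is a minimal sketch in base R; drop_flagged is a hypothetical helper name and is not part of caret.

```r
# Hypothetical helper (not part of caret): drop the flagged columns,
# returning the data unchanged when no column was flagged.
drop_flagged <- function(data, remove_cols) {
  if (length(remove_cols) == 0) {
    return(data)
  }
  data[, -remove_cols, drop = FALSE]
}

dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)

ncol(dat[, -numeric(0), drop = FALSE])  # 0: the unguarded form drops every column
ncol(drop_flagged(dat, numeric(0)))     # 3: all columns are kept
names(drop_flagged(dat, 2))             # "a" "c"
```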

Details

The absolute values of pair-wise correlations are considered. If two variables have a high correlation, the function looks at the mean absolute correlation of each variable and removes the variable with the largest mean absolute correlation.
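
The rule described above can be sketched in base R as follows. This is a simplified illustration, not caret's implementation: sketch_find_correlation is a hypothetical name, the mean absolute correlations are computed once on the full matrix, and "high" is assumed to mean strictly greater than the cutoff.

```r
# Hypothetical, simplified illustration of the rule described above.
# This is NOT the code used by caret::findCorrelation.
sketch_find_correlation <- function(x, cutoff = 0.90) {
  abs_corr <- abs(x)
  diag(abs_corr) <- NA                          # ignore self-correlations
  mean_abs <- rowMeans(abs_corr, na.rm = TRUE)  # mean |r| per variable (computed once)
  removed <- integer(0)
  repeat {
    keep <- setdiff(seq_len(ncol(x)), removed)
    if (length(keep) < 2) break
    sub <- abs_corr[keep, keep, drop = FALSE]
    # Assumption: a pair qualifies only if |r| is strictly above the cutoff
    if (all(is.na(sub)) || max(sub, na.rm = TRUE) <= cutoff) break
    # Locate the most strongly correlated remaining pair
    pos <- which(sub == max(sub, na.rm = TRUE), arr.ind = TRUE)[1, ]
    pair <- keep[pos]
    # Of the two, remove the variable with the larger mean absolute correlation
    removed <- c(removed, pair[which.max(mean_abs[pair])])
  }
  sort(removed)
}

m <- diag(rep(1, 5))
m[2, 3] <- m[3, 2] <- .7
m[5, 3] <- m[3, 5] <- -.7
m[4, 1] <- m[1, 4] <- -.67

flagged <- sketch_find_correlation(m, cutoff = .65)
```

For this matrix the sketch flags variable 3, which is correlated with both 2 and 5, along with one member of the pair (1, 4). The indices it returns may differ from those returned by findCorrelation, particularly when mean absolute correlations are tied.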

There are several functions in the subselect package (leaps, genetic, anneal) that can also be used to accomplish the same goal.

See Also

leaps, genetic, anneal, findLinearCombos

Examples

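library(caret)
library(lattice)  # provides levelplot()

# Build a 5 x 5 correlation matrix with three correlated pairs
corrMatrix <- diag(rep(1, 5))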
corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67

corrDF <- expand.grid(row = 1:5, col = 1:5)
corrDF$correlation <- as.vector(corrMatrix)
levelplot(correlation ~ row + col, corrDF)

findCorrelation(corrMatrix, cutoff = .65, verbose = TRUE)

findCorrelation(corrMatrix, cutoff = .99, verbose = TRUE)

removeCols <- findCorrelation(corrMatrix, cutoff = .65, verbose = FALSE)
if (!isTRUE(all.equal(corrMatrix[-removeCols, -removeCols], diag(rep(1, 3))))) stop("test 1 failed")
if (!isTRUE(all.equal(findCorrelation(corrMatrix, .99, verbose = FALSE), numeric(0)))) stop("test 2 failed")
