findCorrelation

Determine highly correlated variables

This function searches through a correlation matrix and returns a vector of integers corresponding to columns to remove to reduce pair-wise correlations.

Keywords
manip
Usage
findCorrelation(x, cutoff = .90, verbose = FALSE)
Arguments
x
A correlation matrix
cutoff
A numeric value for the pairwise absolute correlation cutoff
verbose
A boolean for printing the details
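As a rough illustration of the arguments, the sketch below builds a correlation matrix with base R's cor() and lists the variable pairs whose absolute correlation exceeds a cutoff. The built-in mtcars data set and the names high and high_pairs are illustrative choices, not part of the package.

```r
# Build the input: a square, symmetric correlation matrix with unit diagonal.
x <- cor(mtcars)

# The cutoff is compared against absolute values, so strong negative
# correlations count as well.
cutoff <- 0.90
high <- abs(x) > cutoff & upper.tri(x)   # upper triangle: each pair once

# Row/column positions of the offending pairs, with variable names attached.
idx <- which(high, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(x)[idx[, "row"]],
  var2 = colnames(x)[idx[, "col"]],
  correlation = x[idx]
)
print(high_pairs)
```

Inspecting the pairs this way before calling findCorrelation can help in choosing a sensible cutoff for a given data set.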
Details

The absolute values of pair-wise correlations are considered. If two variables have a high correlation, the function looks at the mean absolute correlation of each variable and removes the variable with the larger mean absolute correlation.
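The heuristic described above can be sketched in base R roughly as follows. This is a simplified greedy version written for illustration only; it is not the caret source, and the function name sketch_find_correlation is hypothetical. It assumes x is a symmetric correlation matrix and computes each variable's mean absolute correlation over the variables still retained, which may differ in detail from the package's own rule.

```r
# Simplified greedy sketch (NOT the caret implementation): repeatedly take the
# most correlated remaining pair and drop the member with the larger mean
# absolute correlation.
sketch_find_correlation <- function(x, cutoff = 0.90) {
  a <- abs(x)
  diag(a) <- 0                       # ignore self-correlations
  keep <- seq_len(ncol(a))
  removed <- integer(0)
  while (length(keep) > 1) {
    sub <- a[keep, keep, drop = FALSE]
    if (max(sub) <= cutoff) break    # no remaining pair exceeds the cutoff
    # Position (within 'sub') of the first most-correlated pair.
    pair <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    # Mean absolute correlation of each member of the pair.
    means <- colMeans(sub[, pair, drop = FALSE])
    worst <- keep[pair[which.max(means)]]
    removed <- c(removed, worst)
    keep <- setdiff(keep, worst)
  }
  removed
}

# Same toy matrix as in the Examples section.
corrMatrix <- diag(rep(1, 5))
corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67

flagged <- sketch_find_correlation(corrMatrix, cutoff = .65)
print(flagged)
```

Variable 3 is involved in two of the three high correlations, so it has the largest mean absolute correlation and is flagged first; ties (as between variables 1 and 4 here) are broken arbitrarily in this sketch.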

Value

  • A vector of indices denoting the columns to remove. If no correlations meet the criteria, numeric(0) is returned.
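Because an empty vector is returned when nothing exceeds the cutoff, some care is needed when the result is used as a negative index: in R, a zero-length negative index selects no columns rather than all of them. A minimal sketch of a guard follows; the helper name drop_flagged and the toy data frame are hypothetical.

```r
# Hypothetical helper: drop the flagged columns, leaving the data untouched
# when nothing was flagged.
drop_flagged <- function(data, flagged) {
  if (length(flagged) == 0) return(data)
  data[, -flagged, drop = FALSE]
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

ncol(df[, -integer(0), drop = FALSE])  # zero-length index: no columns selected
ncol(drop_flagged(df, integer(0)))     # guard keeps every column
names(drop_flagged(df, c(2L, 3L)))     # columns b and c removed
```

In practice the second argument would be the vector returned by findCorrelation.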

Aliases
  • findCorrelation
Examples
library(caret)    # provides findCorrelation()
library(lattice)  # provides levelplot()

corrMatrix <- diag(rep(1, 5))
corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67

corrDF <- expand.grid(row = 1:5, col = 1:5)
corrDF$correlation <- as.vector(corrMatrix)
levelplot(correlation ~ row + col, corrDF)

findCorrelation(corrMatrix, cutoff = .65, verbose = TRUE)

findCorrelation(corrMatrix, cutoff = .99, verbose = TRUE)

removeCols <- findCorrelation(corrMatrix, cutoff = .65, verbose = FALSE)
if (!isTRUE(all.equal(corrMatrix[-removeCols, -removeCols], diag(rep(1, 3))))) stop("test 1 failed")
if (!isTRUE(all.equal(findCorrelation(corrMatrix, .99, verbose = FALSE), numeric(0)))) stop("test 2 failed")
Documentation reproduced from package caret, version 4.25, License: GPL-2

Community examples

soymari125 at Jul 3, 2018 caret v6.0-80

# *Another example*
# calculate correlation matrix
> correlationMatrix <- cor(data4[,3:18])
> dim(correlationMatrix)
[1] 16 16
> # summarize the correlation matrix
> # find attributes that are highly correlated (ideally >0.75)
> print(correlationMatrix)
             basal         kurt           cl            th         pk           ra
basal  1.000000000 -0.047181714  0.431029995  0.3729320332 -0.01265766  0.373203406
kurt  -0.047181714  1.000000000 -0.014940999  0.1216412366  0.11793816  0.120720327
cl     0.431029995 -0.014940999  1.000000000  0.9576454368  0.10334894  0.956572659
th     0.372932033  0.121641237  0.957645437  1.0000000000  0.07389917  0.998605988
pk    -0.012657655  0.117938158  0.103348945  0.0738991747  1.00000000  0.073040802
ra     0.373203406  0.120720327  0.956572659  0.9986059878  0.07304080  1.000000000
ne     0.377067319  0.089514020  0.921709799  0.9438973869  0.06957701  0.944794152
zc     0.151460899 -0.025275545  0.351339622  0.2540941745  0.72209923  0.246130636
sbi    0.021139002  0.151290286  0.108751493  0.1599702995  0.06275704  0.158537149
spi   -0.007213781 -0.081052683 -0.041535791 -0.0732224698  0.00877137 -0.073350844
spr   -0.003240699 -0.035526419 -0.040623858 -0.0628132859 -0.02007257 -0.063255435
sc     0.022524454  0.231214505  0.264454041  0.3883356437  0.25904458  0.385160694
smad  -0.003918369 -0.003893825  0.002837923 -0.0008341529  0.01591010 -0.000796614
ssd   -0.011171919 -0.019674459 -0.045167014 -0.0742436729  0.02843364 -0.074679163
scr    0.022524454  0.231214505  0.264454041  0.3883356437  0.25904458  0.385160694
sf     0.002199715  0.232919191  0.102390876  0.1857397940  0.10900790  0.184206491
               ne           zc          sbi          spi          spr           sc
basal  0.37706732  0.151460899  0.021139002 -0.007213781 -0.003240699  0.022524454
kurt   0.08951402 -0.025275545  0.151290286 -0.081052683 -0.035526419  0.231214505
cl     0.92170980  0.351339622  0.108751493 -0.041535791 -0.040623858  0.264454041
th     0.94389739  0.254094175  0.159970299 -0.073222470 -0.062813286  0.388335644
pk     0.06957701  0.722099233  0.062757041  0.008771370 -0.020072568  0.259044576
ra     0.94479415  0.246130636  0.158537149 -0.073350844 -0.063255435  0.385160694
ne     1.00000000  0.196797081  0.107280715 -0.069163325 -0.052599448  0.285765198
zc     0.19679708  1.000000000  0.080471076  0.011499973 -0.006075214  0.179013205
sbi    0.10728071  0.080471076  1.000000000 -0.192516195 -0.004037006  0.424772641
spi   -0.06916333  0.011499973 -0.192516195  1.000000000  0.305876755 -0.063102361
spr   -0.05259945 -0.006075214 -0.004037006  0.305876755  1.000000000 -0.085859834
sc     0.28576520  0.179013205  0.424772641 -0.063102361 -0.085859834  1.000000000
smad   0.00331882  0.001497509 -0.017840083  0.015031269 -0.010531983 -0.004939186
ssd   -0.08686502  0.060474407  0.233744239  0.293263846  0.230609514  0.014486054
scr    0.28576520  0.179013205  0.424772641 -0.063102361 -0.085859834  1.000000000
sf     0.12856858  0.077335109  0.858225089 -0.202560755 -0.086086306  0.532576231
               smad          ssd          scr           sf
basal -0.0039183695 -0.011171919  0.022524454  0.002199715
kurt  -0.0038938248 -0.019674459  0.231214505  0.232919191
cl     0.0028379226 -0.045167014  0.264454041  0.102390876
th    -0.0008341529 -0.074243673  0.388335644  0.185739794
pk     0.0159101028  0.028433644  0.259044576  0.109007897
ra    -0.0007966140 -0.074679163  0.385160694  0.184206491
ne     0.0033188199 -0.086865024  0.285765198  0.128568582
zc     0.0014975095  0.060474407  0.179013205  0.077335109
sbi   -0.0178400832  0.233744239  0.424772641  0.858225089
spi    0.0150312687  0.293263846 -0.063102361 -0.202560755
spr   -0.0105319834  0.230609514 -0.085859834 -0.086086306
sc    -0.0049391857  0.014486054  1.000000000  0.532576231
smad   1.0000000000 -0.007504293 -0.004939186 -0.013226721
ssd   -0.0075042926  1.000000000  0.014486054  0.167001635
scr   -0.0049391857  0.014486054  1.000000000  0.532576231
sf    -0.0132267209  0.167001635  0.532576231  1.000000000
> highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.75,names=TRUE)
> # print indexes of highly correlated attributes
> print(highlyCorrelated)
[1] "th" "ra" "cl" "sc" "sf"