Another idea is to make the distance correlation more robust by assigning small weights to observations which are far from the rest of the data.
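As a rough illustration of this idea, the following base R sketch computes a weighted sample distance correlation and compares equal weights with weights that shrink for points far from the bulk of the data. It is not the package's implementation: the function name wdcor_sketch and the weighting rule (one over one plus the squared robust distance from the coordinate-wise medians) are assumptions made for this example only.

```r
# Weighted sample distance correlation, base R only.
# wdcor_sketch is a hypothetical name, not part of any package.
wdcor_sketch <- function(x, y, w = NULL, ep = 1) {
  n <- length(x)
  stopifnot(length(y) == n)
  if (is.null(w)) w <- rep(1, n)
  w <- w / sum(w)                          # normalise weights to sum to one
  # pairwise distances raised to the exponent ep
  A <- abs(outer(x, x, "-"))^ep
  B <- abs(outer(y, y, "-"))^ep
  # weighted double centering of a symmetric distance matrix
  center <- function(D) {
    rm <- as.vector(D %*% w)               # weighted row means
    gm <- sum(w * rm)                      # weighted grand mean
    D - outer(rm, rep(1, n)) - outer(rep(1, n), rm) + gm
  }
  A <- center(A)
  B <- center(B)
  W <- outer(w, w)                         # pair weights w_i * w_j
  dcov2 <- sum(W * A * B)
  dvarx <- sum(W * A * A)
  dvary <- sum(W * B * B)
  sqrt(max(dcov2, 0) / sqrt(dvarx * dvary))
}

set.seed(1)
x <- c(rnorm(200), 50)                     # independent noise plus one gross outlier
y <- c(rnorm(200), 50)
# assumed weighting rule: small weights for points far from the medians
rd <- sqrt(((x - median(x)) / mad(x))^2 + ((y - median(y)) / mad(y))^2)
w <- 1 / (1 + rd^2)
r_plain  <- wdcor_sketch(x, y)             # equal weights
r_robust <- wdcor_sketch(x, y, w = w)      # outlier is downweighted
```

With equal weights the single outlier inflates the estimate, while the downweighted version stays closer to what the 200 independent points alone suggest. Other weighting rules are equally plausible; the one here was chosen only because it needs nothing beyond median and mad.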
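For large datasets the distance correlation is often said to be too inefficient to be of any great use, since the straightforward computation uses all pairwise distances and its cost therefore grows quadratically with the number of observations. The function approx.dcor offers a good approximation of the distance correlation via binning and wdcor.table.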
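The binning idea can be sketched in base R as follows: both variables are cut into n equal-width bins, the observations are reduced to an n-by-n table of relative frequencies, and the distance correlation is then computed on the bin midpoints with the table entries as weights. This is only a sketch under those assumptions. The name approx_dcor_sketch is hypothetical, and the code is not claimed to match how approx.dcor or wdcor.table work internally.

```r
# Binned approximation of the distance correlation, base R only.
# approx_dcor_sketch is a hypothetical name used for illustration.
approx_dcor_sketch <- function(x, y, n = 50, ep = 1) {
  N <- length(x)
  stopifnot(length(y) == N)
  # equal-width bin edges and midpoints for each variable
  ex <- seq(min(x), max(x), length.out = n + 1)
  ey <- seq(min(y), max(y), length.out = n + 1)
  mx <- (ex[-1] + ex[-(n + 1)]) / 2
  my <- (ey[-1] + ey[-(n + 1)]) / 2
  # bin index in 1..n for every observation
  ix <- findInterval(x, ex, all.inside = TRUE)
  iy <- findInterval(y, ey, all.inside = TRUE)
  # n x n table of relative frequencies (rows: x bins, columns: y bins)
  P <- matrix(tabulate((iy - 1) * n + ix, nbins = n * n), n, n) / N
  p <- rowSums(P)                          # marginal weights of the x bins
  q <- colSums(P)                          # marginal weights of the y bins
  # weighted double centering of a symmetric distance matrix
  center <- function(D, wt) {
    rm <- as.vector(D %*% wt)
    gm <- sum(wt * rm)
    sweep(sweep(D, 1, rm), 2, rm) + gm
  }
  A <- center(abs(outer(mx, mx, "-"))^ep, p)
  B <- center(abs(outer(my, my, "-"))^ep, q)
  dcov2 <- sum(P * (A %*% P %*% B))        # uses only n x n matrices
  dvarx <- sum(outer(p, p) * A^2)
  dvary <- sum(outer(q, q) * B^2)
  sqrt(max(dcov2, 0) / sqrt(dvarx * dvary))
}

set.seed(42)
N <- 5000
x <- runif(N, 0, 10)
y_dep <- sin(x) + rnorm(N, sd = 0.2)       # nonlinear dependence on x
y_ind <- rnorm(N)                          # independent of x
r_dep <- approx_dcor_sketch(x, y_dep)
r_ind <- approx_dcor_sketch(x, y_ind)
```

After the binning step, all matrix operations involve n-by-n matrices only, so the cost no longer depends on the number of observations. The price is a discretisation error that shrinks as n grows.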
wdcor.data.frame computes a distance correlation matrix. Factor variables are transformed to integer via data.matrix.
wdcor(x, ...)
## S3 method for class 'default':
wdcor(x, y, w = NULL, ep = 1, approx = FALSE, n = 50, na.rm = TRUE, ...)
## S3 method for class 'table':
wdcor(x, ep = 1, ...)
## S3 method for class 'data.frame':
wdcor(x, w = NULL, ep = 1, approx = FALSE, n = 50, ...)
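The factor-to-integer conversion mentioned for wdcor.data.frame can be previewed with data.matrix directly. The small data frame below is made up for illustration. The point it shows is that the integer codes follow the order of the factor levels, which is alphabetical unless the levels are set explicitly.

```r
# Made-up example data: one factor and one numeric column.
df <- data.frame(
  g = factor(c("low", "high", "mid", "low")),
  v = c(1.5, 2.0, 3.5, 4.0)
)
levels(df$g)                # default level order is alphabetical
m <- data.matrix(df)        # the factor column becomes its integer codes
m

# Setting the level order explicitly gives codes that follow the intended order.
df2 <- df
df2$g <- factor(df2$g, levels = c("low", "mid", "high"))
m2 <- data.matrix(df2)
m2
```

Because distances are computed on these codes, it can be worth checking the level order of each factor before passing a data frame on.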
library(extracat)  # provides wdcor, arsim, optile, fluctile and the olives data
# repeat and change N for different results and computation times.
N <- 2000
x1 <- rnorm(N,mean=10,sd=3)
x2 <- runif(N,0,40)
x3 <- rnorm(N,mean=30,sd=4)
x <- sample(c(x1,x2,x3),N)
y <- rnorm(1,sd=0.0001)*(x-mean(x))^4+ rnorm(1,sd=0.01)*(x-mean(x))^3
y <- y+ rnorm(1,sd=0.1)*(x-mean(x))^2
y <- y+ rnorm(1)*(x-mean(x))+rnorm(N,sd=runif(N,3,10))
y <- y+ runif(N,0,20)*sin(abs(scale(x))*2*pi)
library(scales)  # provides alpha() for semi-transparent points
plot(x,y,pch=19,col=alpha("black",0.2))
system.time(dd<-wdcor(x,y))
# y2 is independent of x; use N so the lengths still match when N is changed
y2 <- runif(N)
system.time(dde<-wdcor(x,y2))
dd
dde
data(diamonds, package = "ggplot2")  # the diamonds data ships with ggplot2
y <- diamonds$price
x <- diamonds$carat
length(x) # 53940
# auto approximation via approx.dcor
wdcor(x,y)
# the weighted distance correlation is also applicable to
# discrete data:
A <- arsim(2000,c(12,12),4,0.1)
wdcor(A)
wdcor(optile(A))
wdcor(optile(A, fun = "distcor"))
# kernel density weights:
library(MASS)  # provides kde2d()
kd <- kde2d(x,y,n=50)
xy <- expand.grid(kd$x,kd$y)
wdcor(xy[,1],xy[,2], w = kd$z)
# this is the approximate distance correlation for the 2D density estimate
# a pairwise matrix:
D <- wdcor(olives[,3:10])
fluctile(D^2, shape="c")