Learn R Programming

matrixCorr (version 0.10.0)

dcor: Pairwise Distance Correlation (dCor)

Description

Computes pairwise distance correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Distance correlation detects general dependence, including non-linear relationships.

Usage

dcor(data, check_na = TRUE)

# S3 method for dcor print( x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL, ... )

# S3 method for dcor plot( x, title = "Distance correlation heatmap", low_color = "white", high_color = "steelblue1", value_text_size = 4, show_value = TRUE, ... )

# S3 method for dcor summary( object, n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL, ... )

Value

A symmetric numeric matrix where the (i, j) entry is the unbiased distance correlation between the i-th and j-th numeric columns. The object has class dcor with attributes method = "distance_correlation", description, and package = "matrixCorr".

Invisibly returns x.

A ggplot object representing the heatmap.

Arguments

data

A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns are dropped. Columns must be numeric.

check_na

Logical (default TRUE). When TRUE, inputs must be free of NA/NaN/Inf. Set to FALSE only if you have already handled missingness upstream.

x

An object of class dcor.

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. Default is "Distance correlation heatmap".

low_color

Colour for zero correlation. Default is "white".

high_color

Colour for strong correlation. Default is "steelblue1".

value_text_size

Font size for displaying values. Default is 4.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class dcor.

Author

Thiago de paula Oliveira

Details

Let \(x \in \mathbb{R}^n\) and \(D^{(x)}\) be the pairwise distance matrix with zero diagonal: \(D^{(x)}_{ii} = 0\), \(D^{(x)}_{ij} = |x_i - x_j|\) for \(i \neq j\). Define row sums \(r^{(x)}_i = \sum_{k \neq i} D^{(x)}_{ik}\) and grand sum \(S^{(x)} = \sum_{i \neq k} D^{(x)}_{ik}\). The U-centred matrix is $$A^{(x)}_{ij} = \begin{cases} D^{(x)}_{ij} - \dfrac{r^{(x)}_i + r^{(x)}_j}{n - 2} + \dfrac{S^{(x)}}{(n - 1)(n - 2)}, & i \neq j,\\[6pt] 0, & i = j~. \end{cases}$$ For two variables \(x,y\), the unbiased distance covariance and variances are $$\widehat{\mathrm{dCov}}^2_u(x,y) = \frac{2}{n(n-3)} \sum_{i<j} A^{(x)}_{ij} A^{(y)}_{ij} \;=\; \frac{1}{n(n-3)} \sum_{i \neq j} A^{(x)}_{ij} A^{(y)}_{ij},$$ with \(\widehat{\mathrm{dVar}}^2_u(x)\) defined analogously from \(A^{(x)}\). The unbiased distance correlation is $$\widehat{\mathrm{dCor}}_u(x,y) = \frac{\widehat{\mathrm{dCov}}_u(x,y)} {\sqrt{\widehat{\mathrm{dVar}}_u(x)\,\widehat{\mathrm{dVar}}_u(y)}} \in [0,1].$$

Computation. All heavy lifting (distance matrices, U-centering, and unbiased scaling) is implemented in C++ (ustat_dcor_matrix_cpp), so the R wrapper only validates/coerces the input. OpenMP parallelises the upper-triangular loops. The implementation includes a Huo-Szekely style univariate \(O(n \log n)\) dispatch for pairwise terms. We also have an exact unbiased \(O(n^2)\) fallback retained for robustness in small-sample or non-finite-path cases; no external dependencies are used.

References

Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794.

Székely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence. Journal of Multivariate Analysis, 117, 193-213.

Examples

Run this code
##Independent variables -> dCor ~ 0
set.seed(1)
X <- cbind(a = rnorm(200), b = rnorm(200))
D <- dcor(X)
print(D, digits = 3)
summary(D)

## Non-linear dependence: Pearson ~ 0, but unbiased dCor > 0
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.2)
XY <- cbind(x = x, y = y)
D2 <- dcor(XY)
# Compare Pearson vs unbiased distance correlation
round(c(pearson = cor(XY)[1, 2], dcor = D2["x", "y"]), 3)
summary(D2)
plot(D2, title = "Unbiased distance correlation (non-linear example)")

## Small AR(1) multivariate normal example
set.seed(7)
p <- 5; n <- 150; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X3 <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
colnames(X3) <- paste0("V", seq_len(p))
D3 <- dcor(X3)
print(D3[1:3, 1:3], digits = 2)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(D)
}

Run the code above in your browser using DataLab