
matrixCorr (version 0.8.3)

spearman_rho: Pairwise Spearman's rank correlation

Description

Computes all pairwise Spearman's rank correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.

The function ranks each column and computes the Pearson correlation of the ranks, which is equivalent to Spearman’s rho. The 'C++' implementation is designed to scale to large datasets.

The print method prints a summary of the Spearman's correlation matrix, including description and method metadata.

The plot method generates a ggplot2-based heatmap of the Spearman's rank correlation matrix.

Usage

spearman_rho(data, check_na = TRUE)

# S3 method for spearman_rho
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)

# S3 method for spearman_rho
plot(
  x,
  title = "Spearman's rank correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ...
)

Value

spearman_rho() returns a symmetric numeric matrix where the (i, j)-th element is the Spearman correlation between the i-th and j-th numeric columns of the input.

The print method invisibly returns the spearman_rho object.

The plot method returns a ggplot object representing the heatmap.

Arguments

data

A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values.

check_na

Logical (default TRUE). If TRUE, the input is required to be free of NA/NaN/Inf. Set to FALSE only when the caller already handled missingness.

x

An object of class spearman_rho.

digits

Integer; number of decimal places to print.

max_rows

Optional integer; maximum number of rows to display. If NULL, all rows are shown.

max_cols

Optional integer; maximum number of columns to display. If NULL, all columns are shown.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. Default is "Spearman's rank correlation heatmap".

low_color

Color for the minimum rho value. Default is "indianred1".

high_color

Color for the maximum rho value. Default is "steelblue1".

mid_color

Color for zero correlation. Default is "white".

value_text_size

Font size for displaying correlation values. Default is 4.
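
As an illustration of the check_na argument, the following minimal sketch shows one way a caller might handle missingness before calling spearman_rho() with check_na = FALSE. The input X_raw is hypothetical, listwise deletion is only one possible strategy and is an assumption of this sketch, and the call is guarded so the snippet also runs when matrixCorr is not installed.

```r
# Hypothetical input containing NA, NaN and Inf entries
X_raw <- cbind(
  a = c(1.2, 3.4, NA,  2.2, 5.1, 4.0),
  b = c(2.0, Inf, 1.5, 2.9, 4.8, 3.3),
  c = c(0.3, 0.9, 0.1, NaN, 1.4, 1.1)
)

# Listwise deletion: keep only rows in which every entry is finite
# (is.finite() is FALSE for NA, NaN, Inf and -Inf)
keep <- rowSums(!is.finite(X_raw)) == 0
X_clean <- X_raw[keep, , drop = FALSE]

if (requireNamespace("matrixCorr", quietly = TRUE)) {
  # Missingness was handled above, so the input check can be skipped
  sp <- matrixCorr::spearman_rho(X_clean, check_na = FALSE)
  print(sp)
}
```

Leaving check_na at its default of TRUE remains the safer choice whenever the input has not been cleaned explicitly.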

Author

Thiago de Paula Oliveira

Details

For each column \(j=1,\ldots,p\), let \(R_{\cdot j} \in [1,n]^n\) denote the mid-ranks of \(X_{\cdot j}\), with tied values assigned the average of the ranks they span. The mean rank is \(\bar R_j = (n+1)/2\) regardless of ties. Define the centred rank vectors \(\tilde R_{\cdot j} = R_{\cdot j} - \bar R_j \mathbf{1}\), where \(\mathbf{1}\in\mathbb{R}^n\) is the all-ones vector. The Spearman correlation between columns \(i\) and \(j\) is the Pearson correlation of their rank vectors: $$ \rho_S(i,j) \;=\; \frac{\sum_{k=1}^n (R_{ki}-\bar R_i)(R_{kj}-\bar R_j)} {\sqrt{\sum_{k=1}^n (R_{ki}-\bar R_i)^2}\; \sqrt{\sum_{k=1}^n (R_{kj}-\bar R_j)^2}}. $$

In matrix form, with \(R=[R_{\cdot 1},\ldots,R_{\cdot p}]\), \(\mu=(n+1)\mathbf{1}_p/2\) for \(\mathbf{1}_p\in\mathbb{R}^p\), and \(S_R=\bigl(R-\mathbf{1}\mu^\top\bigr)^\top \bigl(R-\mathbf{1}\mu^\top\bigr)/(n-1)\), the Spearman correlation matrix is $$ \widehat{\rho}_S \;=\; D^{-1/2} S_R D^{-1/2}, \qquad D \;=\; \mathrm{diag}(\mathrm{diag}(S_R)). $$

When there are no ties, this reduces to the familiar rank-difference formula $$ \rho_S(i,j) \;=\; 1 - \frac{6}{n(n^2-1)} \sum_{k=1}^n d_k^2, \quad d_k \;=\; R_{ki}-R_{kj}. $$ This expression does not hold under ties; computing Pearson on mid-ranks (as above) is the standard tie-robust approach. Without ties, \(\mathrm{Var}(R_{\cdot j})=(n^2-1)/12\) when the variance is computed with divisor \(n\) (equivalently \(n(n+1)/12\) with the divisor \(n-1\) used in \(S_R\)); with ties, the variance is smaller.
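
The equivalences above can be checked numerically in base R. This is a minimal sketch that uses rank() and cor() rather than the package's 'C++' backend; the simulated vectors x and y and the rounding used to create ties are illustrative assumptions.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Mid-ranks; continuous draws have no ties, so these are 1..n in some order
rx <- rank(x, ties.method = "average")
ry <- rank(y, ties.method = "average")

# Three routes to Spearman's rho that agree when there are no ties
rho_pearson_on_ranks <- cor(rx, ry)
rho_builtin          <- cor(x, y, method = "spearman")
rho_rank_diff        <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))

# Coarse rounding creates many ties
xt  <- round(x)
yt  <- round(y)
rxt <- rank(xt, ties.method = "average")
ryt <- rank(yt, ties.method = "average")

# Pearson on mid-ranks stays valid under ties; the rank-difference
# shortcut is shown only for comparison and is not valid here
rho_ties_pearson   <- cor(rxt, ryt)
rho_ties_rank_diff <- 1 - 6 * sum((rxt - ryt)^2) / (n * (n^2 - 1))

# The mean rank is (n + 1) / 2 with or without ties, while the
# sum of squared centred ranks shrinks below n (n^2 - 1) / 12 under ties
mean_rank_ties <- mean(rxt)
ss_no_ties     <- sum((rx - mean(rx))^2)
ss_ties        <- sum((rxt - mean(rxt))^2)
```

Comparing rho_ties_pearson with rho_ties_rank_diff shows how far the shortcut formula can drift once ties are present.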

\(\rho_S(i,j) \in [-1,1]\) and \(\widehat{\rho}_S\) is symmetric positive semi-definite by construction (up to floating-point error). The implementation symmetrises the result to remove round-off asymmetry. Spearman’s correlation is invariant to strictly increasing transformations applied separately to each variable; a strictly decreasing transformation of one variable flips the sign of its correlations.
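
A quick numerical check of the range and positive semi-definiteness properties, sketched with base R's cor(method = "spearman") as a stand-in for spearman_rho() (an assumption made so the snippet runs without the package). The simulated matrix X is hypothetical.

```r
set.seed(7)
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

# Base R stand-in for the package function
rho <- cor(X, method = "spearman")

# All entries should lie in [-1, 1]
max_abs_rho <- max(abs(rho))

# Eigenvalues of a positive semi-definite matrix are non-negative,
# up to floating-point round-off
ev <- eigen(rho, symmetric = TRUE, only.values = TRUE)$values
min_eigenvalue <- min(ev)

# Symmetry check
is_sym <- isSymmetric(rho)
```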

Computation. Each column is ranked (mid-ranks) to form \(R\). The product \(R^\top R\) is computed via a 'BLAS' symmetric rank update ('SYRK'), and centred using $$ (R-\mathbf{1}\mu^\top)^\top (R-\mathbf{1}\mu^\top) \;=\; R^\top R \;-\; n\,\mu\mu^\top, $$ avoiding an explicit centred copy. Division by \(n-1\) yields the sample covariance of ranks; standardising by \(D^{-1/2}\) gives \(\widehat{\rho}_S\). Columns with zero rank variance (all values equal) are returned as NA along their row/column; the corresponding diagonal entry is also NA.
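
The computation described above can be mirrored in base R. This is a sketch only: crossprod() stands in for the 'BLAS' 'SYRK' call, the tolerance used to flag zero variance is an assumption, and the package's actual 'C++' code may differ in detail.

```r
set.seed(42)
n <- 30
p <- 4
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[, 4] <- 7   # constant column, so its rank variance is zero

# Column-wise mid-ranks form R
R <- apply(X, 2, rank, ties.method = "average")

# Centre R'R without building a centred copy:
# (R - 1 mu')' (R - 1 mu') = R'R - n mu mu', with mu_j = (n + 1) / 2
mu  <- rep((n + 1) / 2, p)
S_R <- (crossprod(R) - n * tcrossprod(mu)) / (n - 1)

# Standardise by D^{-1/2}; flag zero-variance columns as NA
d <- diag(S_R)
d[d < 1e-12] <- NA_real_   # assumed tolerance for "zero" variance
rho <- S_R / sqrt(outer(d, d))

# Symmetrise to remove any round-off asymmetry
rho <- (rho + t(rho)) / 2
```

The leading 3 x 3 block of rho can be compared with cor(X[, 1:3], method = "spearman"), while the fourth row and column are NA because that column is constant.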

Ranking costs \(O\!\bigl(p\,n\log n\bigr)\); forming and normalising \(R^\top R\) costs \(O\!\bigl(n p^2\bigr)\) with \(O(p^2)\) additional memory. 'OpenMP' parallelism is used across columns for ranking, and a 'BLAS' 'SYRK' kernel is used for the matrix product when available.

References

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72-101. Reprinted (2010) in International Journal of Epidemiology, 39(5), 1137-1150.

See Also

print.spearman_rho, plot.spearman_rho

Examples

## Monotone transformation invariance (Spearman is rank-based)
set.seed(123)
n <- 400; p <- 6; rho <- 0.6
# AR(1) correlation
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
L <- chol(Sigma)
X <- matrix(rnorm(n * p), n, p) %*% L
colnames(X) <- paste0("V", seq_len(p))

# Monotone transforms to some columns
X_mono <- X
# exponential
X_mono[, 1] <- exp(X_mono[, 1])
# softplus
X_mono[, 2] <- log1p(exp(X_mono[, 2]))
# odd monotone polynomial
X_mono[, 3] <- X_mono[, 3]^3

sp_X <- spearman_rho(X)
sp_m <- spearman_rho(X_mono)

# Spearman should be (nearly) unchanged under monotone transformations
round(max(abs(sp_X - sp_m)), 3)
# heatmap of Spearman correlations
plot(sp_X)

## Ties handled via mid-ranks
tied <- cbind(
  # many ties
  a = rep(1:5, each = 20),
  # noisy reverse order
  b = rep(5:1, each = 20) + rnorm(100, sd = 0.1),
  # ordinal with ties
  c = as.numeric(gl(10, 10))
)
sp_tied <- spearman_rho(tied)
print(sp_tied, digits = 2)

## Bivariate normal, theoretical Spearman's rho
## For BVN with Pearson correlation r, rho_S = (6/pi) * asin(r/2).
r_target <- c(-0.8, -0.4, 0, 0.4, 0.8)
n2 <- 200
est <- true_corr <- numeric(length(r_target))
for (i in seq_along(r_target)) {
  R2 <- matrix(c(1, r_target[i], r_target[i], 1), 2, 2)
  Z  <- matrix(rnorm(n2 * 2), n2, 2) %*% chol(R2)
  s  <- spearman_rho(Z)
  est[i]  <- s[1, 2]
  true_corr[i] <- (6 / pi) * asin(r_target[i] / 2)
}
cbind(r_target, est = round(est, 3), theory = round(true_corr, 3))

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(sp_X)
}
