matrixCorr (version 0.8.3)

schafer_corr: Schafer-Strimmer shrinkage correlation

Description

Computes a shrinkage correlation matrix using the Schafer-Strimmer approach with an analytic, data-driven intensity \(\hat\lambda\). The off-diagonals of the sample Pearson correlation \(R\) are shrunk towards zero, yielding \(R_{\mathrm{shr}}=(1-\hat\lambda)R+\hat\lambda I\) with \(\mathrm{diag}(R_{\mathrm{shr}})=1\), stabilising estimates when \(p \ge n\).

This function uses a high-performance 'C++' backend that forms \(X^\top X\) via 'BLAS' 'SYRK', applies centring via a rank-1 update, converts to Pearson correlation, estimates \(\hat\lambda\), and shrinks the off-diagonals: \(R_{\mathrm{shr}} = (1-\hat\lambda)R + \hat\lambda I\).
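The steps named above (cross-product, rank-1 centring, conversion to correlation) can be sketched in base R. This is only an illustration of the algebra under the assumption that the backend follows the textbook identities; it is not the package's 'C++' code, and all variable names here are made up for the example.

```r
# Base-R sketch of the pipeline; illustrative only, not matrixCorr internals.
set.seed(42)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Step 1: uncentred cross-product X'X (the quantity a SYRK call would form).
XtX <- crossprod(X)

# Step 2: centring as a rank-1 update, X'X - n * mu mu', then scale to a covariance.
mu <- colMeans(X)
S  <- (XtX - n * tcrossprod(mu)) / (n - 1)

# Step 3: rescale the covariance to a Pearson correlation matrix.
s <- sqrt(diag(S))
R <- S / tcrossprod(s)

# Compare with the reference implementation in base R.
max_abs_diff <- max(abs(R - stats::cor(X)))
```

Working from the uncentred cross-product avoids materialising a centred copy of X, which is presumably why the backend is described this way.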

The print method displays a summary of the shrinkage correlation matrix, optionally truncating rows and columns for large objects.

The plot method draws a heatmap of the shrinkage correlation matrix with optional hierarchical clustering and triangular display. It uses ggplot2 with geom_raster() for speed on larger matrices.

Usage

schafer_corr(data)

# S3 method for schafer_corr
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)

# S3 method for schafer_corr
plot(
  x,
  title = "Schafer-Strimmer shrinkage correlation",
  cluster = TRUE,
  hclust_method = "complete",
  triangle = c("upper", "lower", "full"),
  show_values = FALSE,
  value_text_limit = 60,
  value_text_size = 3,
  palette = c("diverging", "viridis"),
  ...
)

Value

A symmetric numeric matrix of class schafer_corr where entry (i, j) is the shrunk correlation between the i-th and j-th numeric columns. Attributes:

  • method = "schafer_shrinkage"

  • description = "Schafer-Strimmer shrinkage correlation matrix"

  • package = "matrixCorr"

Columns with zero variance are set to NA across row/column (including the diagonal), matching pearson_corr() behaviour.

The print method invisibly returns x.

The plot method returns a ggplot object.

Arguments

data

A numeric matrix or a data frame with at least two numeric columns. Non-numeric columns are excluded; the remaining numeric columns must contain no NAs.

x

An object of class schafer_corr.

digits

Integer; number of decimal places to print.

max_rows

Optional integer; maximum number of rows to display. If NULL, all rows are shown.

max_cols

Optional integer; maximum number of columns to display. If NULL, all columns are shown.

...

Additional arguments passed to ggplot2::theme().

title

Plot title.

cluster

Logical; if TRUE, reorder rows/cols by hierarchical clustering on distance \(1 - r\).

hclust_method

Linkage method for hclust; default "complete".

triangle

One of "upper", "lower", "full". Defaults to "upper".

show_values

Logical; print correlation values inside tiles (only if matrix dimension \(\le\) value_text_limit).

value_text_limit

Integer threshold controlling when values are drawn.

value_text_size

Font size for values if shown.

palette

Character; "diverging" (default) or "viridis".
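The cluster and hclust_method arguments describe reordering by hierarchical clustering on the distance \(1 - r\). A minimal base-R sketch of that idea follows; how the plot method applies the ordering internally is an assumption, and the data and variable names are invented for illustration.

```r
# Illustrative reordering of a correlation matrix by hierarchical clustering.
set.seed(7)
X <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6,
            dimnames = list(NULL, paste0("V", 1:6)))
R <- stats::cor(X)

# Distance 1 - r: highly correlated variables are close together.
d  <- stats::as.dist(1 - R)
hc <- stats::hclust(d, method = "complete")

# Apply the dendrogram leaf order to both rows and columns.
ord   <- hc$order
R_ord <- R[ord, ord]
```

Reordering rows and columns with the same permutation keeps the matrix symmetric, so a triangular display remains meaningful.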

Author

Thiago de Paula Oliveira

Details

Let \(R\) be the sample Pearson correlation matrix. The Schafer-Strimmer shrinkage estimator targets the identity in correlation space and uses \(\hat\lambda = \frac{\sum_{i<j}\widehat{\mathrm{Var}}(r_{ij})} {\sum_{i<j} r_{ij}^2}\) (clamped to \([0,1]\)), where \(\widehat{\mathrm{Var}}(r_{ij}) \approx \frac{(1-r_{ij}^2)^2}{n-1}\). The returned estimator is \(R_{\mathrm{shr}} = (1-\hat\lambda)R + \hat\lambda I\).
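The formulas above translate directly into a few lines of base R. The helper below, shrink_corr_sketch(), is a hypothetical name used only for illustration and is not part of matrixCorr; it assumes \(n\) is the number of rows of the input and uses the approximate variance exactly as written above, so its output may differ in detail from the package's result.

```r
# Hypothetical helper for illustration; not part of matrixCorr.
shrink_corr_sketch <- function(X) {
  n <- nrow(X)
  R <- stats::cor(X)

  # Off-diagonal pairs with i < j.
  off <- upper.tri(R, diag = FALSE)
  r   <- R[off]

  # Approximate variance of each r_ij, then the analytic intensity.
  var_r  <- (1 - r^2)^2 / (n - 1)
  lambda <- sum(var_r) / sum(r^2)
  lambda <- min(1, max(0, lambda))   # clamp to [0, 1]

  # Shrink towards the identity: the diagonal stays at 1.
  R_shr <- (1 - lambda) * R + lambda * diag(ncol(R))
  list(lambda = lambda, R_shr = R_shr)
}

set.seed(1)
X   <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
fit <- shrink_corr_sketch(X)
```

Because the target is the identity, every off-diagonal entry is multiplied by \(1-\hat\lambda\), so signs and relative ordering of the correlations are preserved.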

References

Schafer, J. & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1).

See Also

print.schafer_corr, plot.schafer_corr, pearson_corr

Examples

## Multivariate normal with AR(1) dependence (Toeplitz correlation)
set.seed(1)
n <- 80; p <- 40; rho <- 0.6
d <- abs(outer(seq_len(p), seq_len(p), "-"))
Sigma <- rho^d

X <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
colnames(X) <- paste0("V", seq_len(p))

Rshr <- schafer_corr(X)
print(Rshr, digits = 2, max_rows = 6, max_cols = 6)
plot(Rshr)

## Shrinkage typically moves the sample correlation closer to the truth
Rraw <- stats::cor(X)
off  <- upper.tri(Sigma, diag = FALSE)
mae_raw <- mean(abs(Rraw[off] - Sigma[off]))
mae_shr <- mean(abs(Rshr[off] - Sigma[off]))
print(c(MAE_raw = mae_raw, MAE_shrunk = mae_shr))
plot(Rshr, title = "Schafer-Strimmer shrinkage correlation")

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(Rshr)
}
