Computes all pairwise Winsorized correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
This function Winsorizes each margin at proportion tr and then
computes ordinary Pearson correlation on the Winsorized values. It is a
simple robust alternative to Pearson correlation when the main concern is
unusually large or small observations in the marginal distributions.
wincor(
data,
tr = 0.2,
na_method = c("error", "pairwise"),
n_threads = getOption("matrixCorr.threads", 1L)
)
# S3 method for wincor
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
# S3 method for wincor
plot(
x,
title = "Winsorized correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
show_value = TRUE,
...
)
# S3 method for wincor
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
A symmetric correlation matrix with class wincor and
attributes method = "winsorized_correlation", description,
and package = "matrixCorr".
data: A numeric matrix or a data frame with at least two numeric columns. Non-numeric columns are excluded.
tr: Winsorization proportion in [0, 0.5). For a sample of size
\(n\), let \(g = \lfloor tr \cdot n \rfloor\); the \(g\) smallest
observations are set to the \((g+1)\)-st order statistic and the
\(g\) largest observations are set to the \((n-g)\)-th order
statistic. Default 0.2.
na_method: One of "error" (default) or "pairwise". With "pairwise", each correlation is computed on the rows where both columns are observed.
n_threads: Integer \(\geq 1\). Number of OpenMP threads. Defaults to
getOption("matrixCorr.threads", 1L).
x: An object of class wincor.
digits: Integer; number of digits to print.
n: Optional row threshold for compact preview output.
topn: Optional number of leading/trailing rows to show when truncated.
max_vars: Optional maximum number of visible columns; NULL derives this
from console width.
width: Optional display width; defaults to getOption("width").
show_ci: One of "yes" or "no".
...: Additional arguments passed to the underlying print or plot helper.
title: Character; plot title.
low_color, high_color, mid_color: Colors used in the heatmap.
value_text_size: Numeric text size for overlaid cell values.
show_value: Logical; if TRUE (default), overlay numeric values
on the heatmap tiles.
object: An object of class wincor.
Thiago de Paula Oliveira
Let \(X \in \mathbb{R}^{n \times p}\) be a numeric matrix with rows as observations and columns as variables. For a column \(x = (x_i)_{i=1}^n\), write the order statistics as \(x_{(1)} \le \cdots \le x_{(n)}\) and let \(g = \lfloor tr \cdot n \rfloor\). The Winsorized values can be written as $$ x_i^{(w)} \;=\; \max\!\bigl\{x_{(g+1)},\, \min(x_i, x_{(n-g)})\bigr\}. $$ For two columns \(x\) and \(y\), the Winsorized correlation is the ordinary Pearson correlation computed from \(x^{(w)}\) and \(y^{(w)}\): $$ r_w(x,y) \;=\; \frac{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})(y_i^{(w)}-\bar y^{(w)})} {\sqrt{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})^2}\; \sqrt{\sum_{i=1}^n (y_i^{(w)}-\bar y^{(w)})^2}}. $$
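The definition above can be sketched directly in base R. The helpers winsorize_vec() and wincor_pair() are hypothetical names used only for illustration: they follow the formulas on this page, do not call the package's 'C++' backend, and assume complete (NA-free) input. The usage lines also show the intended robustness: a single gross marginal outlier drags the Pearson correlation toward zero, while the Winsorized correlation stays high.

```r
# Hypothetical helpers for illustration only; they are not part of matrixCorr.

# Winsorize one numeric vector (no NAs) using g = floor(tr * n).
winsorize_vec <- function(x, tr = 0.2) {
  stopifnot(is.numeric(x), !anyNA(x), tr >= 0, tr < 0.5)
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)                          # order statistics x_(1) <= ... <= x_(n)
  pmax(xs[g + 1], pmin(x, xs[n - g]))    # max{x_(g+1), min(x_i, x_(n-g))}
}

# Winsorized correlation r_w: ordinary Pearson correlation of the Winsorized vectors.
wincor_pair <- function(x, y, tr = 0.2) {
  stopifnot(length(x) == length(y))
  stats::cor(winsorize_vec(x, tr), winsorize_vec(y, tr))
}

winsorize_vec(c(1, 2, 3, 4, 100), tr = 0.2)  # g = 1, so 1 -> 2 and 100 -> 4

# One gross marginal outlier in y.
x <- seq(-2, 2, length.out = 100)
y <- x
y[1] <- 50
stats::cor(x, y)     # Pearson: roughly 0.05
wincor_pair(x, y)    # Winsorized (tr = 0.2): roughly 0.97
```

With tr = 0 no observation is modified, so wincor_pair() reduces to cor().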
In matrix form, let \(X^{(w)}\) contain the Winsorized columns and define the centred, unit-norm columns $$ z_{\cdot j} = \frac{x_{\cdot j}^{(w)} - \bar x_j^{(w)} \mathbf{1}} {\sqrt{\sum_{i=1}^n (x_{ij}^{(w)}-\bar x_j^{(w)})^2}}, \qquad j=1,\ldots,p. $$ If \(Z = [z_{\cdot 1}, \ldots, z_{\cdot p}]\), then the Winsorized correlation matrix is $$ R_w \;=\; Z^\top Z. $$
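A minimal base-R sketch of the matrix form, assuming a complete numeric matrix with rows as observations: each column is Winsorized, centred, and scaled to unit Euclidean norm, and crossprod() then returns \(Z^\top Z\). wincor_matrix() is a hypothetical helper, not the package's implementation; by construction it should match cor() applied to the Winsorized columns.

```r
# Hypothetical helper for illustration; not matrixCorr's C++ implementation.
# Assumes a complete numeric matrix (no NAs) with rows = observations.
wincor_matrix <- function(X, tr = 0.2) {
  X <- as.matrix(X)
  n <- nrow(X)
  g <- floor(tr * n)
  # Winsorize each column separately (a purely marginal operation)
  Xw <- apply(X, 2, function(x) {
    xs <- sort(x)
    pmax(xs[g + 1], pmin(x, xs[n - g]))
  })
  # Centre each column, then scale it to unit Euclidean norm: the columns of Z
  Xc <- sweep(Xw, 2, colMeans(Xw))
  Z <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")
  crossprod(Z)  # R_w = Z^T Z
}

set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
Rw <- wincor_matrix(X, tr = 0.2)
round(Rw, 2)
```

Because \(R_w\) is a Gram matrix \(Z^\top Z\), it is symmetric positive semidefinite with a unit diagonal, and a single cross-product over the \(n \times p\) matrix \(Z\) accounts for the \(O(n p^2)\) cost.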
Winsorization acts on each margin separately, so it guards against marginal
outliers and heavy tails but does not target unusual points in the joint
cloud. This implementation Winsorizes each column in 'C++', centres and
normalises it, and forms the complete-data matrix from cross-products. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows. As with Pearson correlation, the complete-data path yields
a symmetric positive semidefinite matrix, whereas pairwise deletion can
break positive semidefiniteness.
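The pairwise-deletion path can be sketched as a loop over column pairs. This is an illustration under an explicit assumption, not the package's code: the hypothetical wincor_pairwise() Winsorizes each pair on its own overlap of complete rows, so \(g\) is recomputed from the overlap size for every pair.

```r
# Hypothetical sketch, not matrixCorr's implementation.
# ASSUMPTION: each pair is Winsorized on its own overlap of complete rows.
wincor_pairwise <- function(X, tr = 0.2) {
  X <- as.matrix(X)
  p <- ncol(X)
  # Winsorize a complete vector with g = floor(tr * n)
  wins <- function(x) {
    n <- length(x)
    g <- floor(tr * n)
    xs <- sort(x)
    pmax(xs[g + 1], pmin(x, xs[n - g]))
  }
  R <- diag(1, p)
  for (j in seq_len(p - 1)) {
    for (k in seq(j + 1, p)) {
      ok <- stats::complete.cases(X[, j], X[, k])  # rows observed in both columns
      R[j, k] <- R[k, j] <- stats::cor(wins(X[ok, j]), wins(X[ok, k]))
    }
  }
  dimnames(R) <- list(colnames(X), colnames(X))
  R
}

set.seed(11)
X <- matrix(rnorm(120 * 3), ncol = 3)
X[sample(length(X), 30)] <- NA   # scatter some missing values
R <- wincor_pairwise(X)
min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
```

Since different entries can be based on different sets of rows, the smallest eigenvalue of such a matrix is not guaranteed to be non-negative, which is how pairwise deletion can break positive semidefiniteness.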
Computational complexity. In the complete-data path, Winsorizing requires sorting each column, which costs \(O(p\, n \log n)\) in total, and forming the cross-product matrix costs \(O(n p^2)\) with \(O(p^2)\) output storage.
Wilcox, R. R. (1993). Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46(2), 339-349. doi:10.1111/j.2044-8317.1993.tb01020.x
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press.
pbcor(), skipped_corr(), bicor()
set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
# Shift six randomly chosen entries down by 12 to create marginal outliers
idx <- sample(length(X), 6)
X[idx] <- X[idx] - 12
R <- wincor(X, tr = 0.2)
print(R, digits = 2)
summary(R)
plot(R)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(R)
}