Computes all pairwise percentage bend correlations for the numeric columns of a matrix or data frame. Percentage bend correlation limits the influence of extreme marginal observations by bending standardised deviations into the interval \([-1, 1]\), yielding a Pearson-like measure that is resistant to outliers and heavy tails in the marginal distributions.
pbcor(
  data,
  beta = 0.2,
  na_method = c("error", "pairwise"),
  n_threads = getOption("matrixCorr.threads", 1L)
)

# S3 method for pbcor
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

# S3 method for pbcor
plot(
  x,
  title = "Percentage bend correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

# S3 method for pbcor
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)
A symmetric correlation matrix with class pbcor and
attributes method = "percentage_bend_correlation",
description, and package = "matrixCorr".
data: A numeric matrix or data frame containing numeric columns.
beta: Bending constant in [0, 0.5) that sets the cutoff used to bend standardised deviations toward the interval \([-1, 1]\). Larger values cause more observations to be bent and increase resistance to marginal outliers. Default 0.2. See Details.
na_method: One of "error" (default) or "pairwise". With "pairwise", each correlation is computed on the overlapping complete rows for the column pair.
n_threads: Integer \(\geq 1\). Kept for API consistency with the other robust correlation wrappers. It is currently validated but not used by the exact percentage-bend implementation.
x: An object of class pbcor.
digits: Integer; number of digits to print.
n: Optional row threshold for compact preview output.
topn: Optional number of leading/trailing rows to show when truncated.
max_vars: Optional maximum number of visible columns; NULL derives this from console width.
width: Optional display width; defaults to getOption("width").
show_ci: One of "yes" or "no".
...: Additional arguments passed to the underlying print or plot helper.
title: Character; plot title.
low_color, high_color, mid_color: Colors used in the heatmap.
value_text_size: Numeric text size for overlaid cell values.
show_value: Logical; if TRUE (default), overlay numeric values on the heatmap tiles.
object: An object of class pbcor.
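The effect of beta on the amount of bending can be checked numerically. The base-R sketch below assumes only the definition of the cutoff \(\omega_\beta(x)\) given in Details (the \(\lfloor (1-\beta)n \rfloor\)-th order statistic of \(|x_i - \mathrm{med}(x)|\)) and reports, for several values of beta, the share of observations whose standardised deviation from the median exceeds 1 in absolute value. The helpers pb_omega() and frac_bent() are hypothetical illustrations, not functions exported by matrixCorr.

```r
# Hypothetical helper (not part of matrixCorr): the cutoff omega_beta(x) is the
# floor((1 - beta) * n)-th order statistic of |x_i - median(x)|.
pb_omega <- function(x, beta = 0.2) {
  w <- sort(abs(x - median(x)))
  w[floor((1 - beta) * length(x))]
}

# Hypothetical helper: share of observations whose standardised deviation
# (x_i - median) / omega_beta lies outside [-1, 1].
frac_bent <- function(x, beta = 0.2) {
  psi <- (x - median(x)) / pb_omega(x, beta)
  mean(abs(psi) > 1)
}

set.seed(1)
x <- rt(200, df = 3)  # heavy-tailed sample
betas <- c(0, 0.1, 0.2, 0.3, 0.4)
f <- sapply(betas, function(b) frac_bent(x, b))
names(f) <- betas
f
```

For continuous data without ties, exactly \(\lfloor (1-\beta)n \rfloor\) observations lie at or inside the cutoff, so the share outside it stays close to beta, and beta = 0 bends nothing.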
Thiago de Paula Oliveira
For a column \(x = (x_i)_{i=1}^n\), let \(m = \mathrm{med}(x)\) and let
\(\omega_\beta(x)\) be the \(\lfloor (1-\beta)n \rfloor\)-th order
statistic of \(|x_i - m|\). The constant beta determines the cutoff
\(\omega_\beta(x)\) used to standardise deviations from the median. As
beta increases, the selected cutoff becomes smaller, so a larger
fraction of observations is truncated to the bounds \(-1\) and \(1\).
This makes the correlation more resistant to marginal outliers. The one-step
percentage-bend location is
$$
\hat\theta_{pb}(x) =
\frac{\sum_{i: |\psi_i| \le 1} x_i + \omega_\beta(x)(i_2 - i_1)}
{n - i_1 - i_2},
\qquad
\psi_i = \frac{x_i - m}{\omega_\beta(x)},
$$
where \(i_1 = \sum \mathbf{1}(\psi_i < -1)\) and
\(i_2 = \sum \mathbf{1}(\psi_i > 1)\).
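A direct transcription of the one-step location into base R can help when checking values by hand. This is a sketch under the formulas above, not matrixCorr's internal code; pb_omega() and pb_location() are hypothetical names.

```r
# Hypothetical helper: cutoff omega_beta(x), the floor((1 - beta) * n)-th
# order statistic of |x_i - median(x)|.
pb_omega <- function(x, beta = 0.2) {
  w <- sort(abs(x - median(x)))
  w[floor((1 - beta) * length(x))]
}

# Hypothetical helper: one-step percentage-bend location theta_pb(x).
pb_location <- function(x, beta = 0.2) {
  omega <- pb_omega(x, beta)
  psi <- (x - median(x)) / omega
  i1 <- sum(psi < -1)          # observations below the lower bound
  i2 <- sum(psi > 1)           # observations above the upper bound
  s <- sum(x[abs(psi) <= 1])   # sum of the observations left unbent
  (s + omega * (i2 - i1)) / (length(x) - i1 - i2)
}

x <- c(1.7, 1.9, 2.0, 2.1, 2.2, 2.35, 2.6, 3.1, 50)  # one gross outlier
c(mean = mean(x), median = median(x), pb = pb_location(x))
```

Here \(n = 9\), the cutoff is the 7th smallest absolute deviation (0.5), two observations (3.1 and 50) exceed the upper bound, and the location works out to \((14.85 + 0.5 \cdot 2)/7 \approx 2.264\), whereas the mean is pulled to about 7.55.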
The bent scores are
$$
a_i = \max\{-1, \min(1, (x_i - \hat\theta_{pb}(x))/\omega_\beta(x))\},
$$
and likewise \(b_i\) for a second variable \(y\). The percentage bend correlation is
$$
r_{pb}(x,y) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\sqrt{\sum_i b_i^2}}.
$$
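The bent scores and \(r_{pb}\) for a single pair of variables can be sketched in base R as follows, assuming the cutoff and one-step location defined above. The helpers pb_omega(), pb_location(), pb_scores() and pb_cor_pair() are hypothetical illustrations rather than matrixCorr's implementation, and missing values are not handled.

```r
# Hypothetical helpers following the formulas in Details.
pb_omega <- function(x, beta = 0.2) {
  w <- sort(abs(x - median(x)))
  w[floor((1 - beta) * length(x))]
}

pb_location <- function(x, beta = 0.2) {
  omega <- pb_omega(x, beta)
  psi <- (x - median(x)) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (length(x) - i1 - i2)
}

# Bent scores: (x_i - theta_pb) / omega_beta, clamped to [-1, 1].
pb_scores <- function(x, beta = 0.2) {
  z <- (x - pb_location(x, beta)) / pb_omega(x, beta)
  pmax(-1, pmin(1, z))
}

# Percentage bend correlation of two complete numeric vectors.
pb_cor_pair <- function(x, y, beta = 0.2) {
  a <- pb_scores(x, beta)
  b <- pb_scores(y, beta)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(10)
x <- rnorm(150)
y <- 0.5 * x + rnorm(150)
y_out <- y
y_out[1:5] <- y_out[1:5] + 10  # shift five y values far from the bulk
c(pearson = cor(x, y_out), pb = pb_cor_pair(x, y_out))
```

Comparing the two printed values shows how each estimate responds when a few y values are shifted. As the formula implies, pb_cor_pair(x, x) is 1 and pb_cor_pair(x, -x) is -1 up to rounding.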
When na_method = "error", bent scores are computed once per column and the matrix is formed from their normalised cross-products, so the result is positive semidefinite by construction. When na_method = "pairwise", each pair is recomputed on its complete-case overlap; as with pairwise-complete Pearson correlation, the resulting matrix is then not guaranteed to be positive semidefinite.
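The two na_method strategies can be sketched in base R, assuming the formulas in Details. pb_matrix_complete() mimics the "error" path (score each column once, then normalise the cross-product matrix), and pb_matrix_pairwise() mimics the "pairwise" path (rescore each pair on its shared complete rows). All helper names are hypothetical; this is not the package's implementation.

```r
# Hypothetical helpers: cutoff, one-step location and bent scores.
pb_omega <- function(x, beta = 0.2) {
  w <- sort(abs(x - median(x)))
  w[floor((1 - beta) * length(x))]
}
pb_location <- function(x, beta = 0.2) {
  omega <- pb_omega(x, beta)
  psi <- (x - median(x)) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (length(x) - i1 - i2)
}
pb_scores <- function(x, beta = 0.2) {
  pmax(-1, pmin(1, (x - pb_location(x, beta)) / pb_omega(x, beta)))
}

# Complete-data path: one score column per variable, then normalised
# cross-products of the score matrix.
pb_matrix_complete <- function(X, beta = 0.2) {
  X <- as.matrix(X)
  if (anyNA(X)) stop("missing values present")
  A <- apply(X, 2, pb_scores, beta = beta)  # n x p matrix of bent scores
  S <- crossprod(A)                         # S[j, k] = sum_i a_ij * a_ik
  d <- sqrt(diag(S))
  R <- S / outer(d, d)
  dimnames(R) <- list(colnames(X), colnames(X))
  R
}

# Pairwise path: rescore each pair on the rows where both columns are observed.
pb_matrix_pairwise <- function(X, beta = 0.2) {
  X <- as.matrix(X)
  p <- ncol(X)
  R <- diag(p)
  for (j in seq_len(p - 1)) {
    for (k in (j + 1):p) {
      ok <- complete.cases(X[, j], X[, k])
      a <- pb_scores(X[ok, j], beta)
      b <- pb_scores(X[ok, k], beta)
      R[j, k] <- R[k, j] <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
    }
  }
  dimnames(R) <- list(colnames(X), colnames(X))
  R
}

set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
idx <- sample(length(X), 8)
X[idx] <- X[idx] + 10
R1 <- pb_matrix_complete(X)
R2 <- pb_matrix_pairwise(X)
all.equal(R1, R2)  # the two paths agree when nothing is missing

Xna <- X
Xna[sample(length(Xna), 60)] <- NA
R_pw <- pb_matrix_pairwise(Xna)
# Smallest eigenvalue; pairwise deletion does not guarantee it is >= 0
min(eigen(R_pw, symmetric = TRUE, only.values = TRUE)$values)
```

The complete-data path is a Gram matrix of unit-norm score columns, which is why it stays positive semidefinite; in the pairwise path each entry uses different rows and different bent scores, so that guarantee is lost.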
Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395
wincor(), skipped_corr(), bicor()
library(matrixCorr)

set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
# Contaminate 8 randomly chosen cells by shifting them by +10
idx <- sample(length(X), 8)
X[idx] <- X[idx] + 10
R <- pbcor(X)
print(R, digits = 2)
summary(R)
plot(R)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}