matrixCorr (version 0.10.0)

pbcor: Percentage bend correlation

Description

Computes all pairwise percentage bend correlations for the numeric columns of a matrix or data frame. Percentage bend correlation limits the influence of extreme marginal observations by bending standardised deviations into the interval \([-1, 1]\), yielding a Pearson-like measure that is robust to outliers and heavy tails.

Usage

pbcor(
  data,
  beta = 0.2,
  na_method = c("error", "pairwise"),
  n_threads = getOption("matrixCorr.threads", 1L)
)

# S3 method for pbcor
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

# S3 method for pbcor
plot(
  x,
  title = "Percentage bend correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

# S3 method for pbcor
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Value

A symmetric correlation matrix with class pbcor and attributes method = "percentage_bend_correlation", description, and package = "matrixCorr".

Arguments

data

A numeric matrix or data frame containing numeric columns.

beta

Bending constant in [0, 0.5) that sets the cutoff used to bend standardised deviations toward the interval \([-1, 1]\). Larger values cause more observations to be bent and increase resistance to marginal outliers. Default 0.2. See Details.

na_method

One of "error" (default) or "pairwise". With "pairwise", each correlation is computed on the overlapping complete rows for the column pair.

n_threads

Integer \(\geq 1\). Kept for API consistency with the other robust correlation wrappers. It is currently validated but not used by the exact percentage-bend implementation.

x

An object of class pbcor.

digits

Integer; number of digits to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to the underlying print or plot helper.

title

Character; plot title.

low_color, high_color, mid_color

Colors used in the heatmap.

value_text_size

Numeric text size for overlaid cell values.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class pbcor.

Author

Thiago de Paula Oliveira

Details

For a column \(x = (x_i)_{i=1}^n\), let \(m = \mathrm{med}(x)\) and let \(\omega_\beta(x)\) be the \(\lfloor (1-\beta)n \rfloor\)-th order statistic of \(|x_i - m|\). The constant beta determines the cutoff \(\omega_\beta(x)\) used to standardise deviations from the median. As beta increases, the selected cutoff becomes smaller, so a larger fraction of observations is truncated to the bounds \(-1\) and \(1\). This makes the correlation more resistant to marginal outliers.

The one-step percentage-bend location is

$$ \hat\theta_{pb}(x) = \frac{\sum_{i: |\psi_i| \le 1} x_i + \omega_\beta(x)(i_2 - i_1)} {n - i_1 - i_2}, \qquad \psi_i = \frac{x_i - m}{\omega_\beta(x)}, $$

where \(i_1 = \sum_i \mathbf{1}(\psi_i < -1)\) and \(i_2 = \sum_i \mathbf{1}(\psi_i > 1)\).
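To make the role of beta concrete, the following base-R sketch computes the cutoff \(\omega_\beta(x)\) exactly as defined above for a few values of beta. The helper name pb_cutoff and the simulated data are illustrative assumptions, not part of matrixCorr, and the package's internal implementation may differ.

```r
# Hypothetical helper (not a matrixCorr function): the cutoff omega_beta(x),
# i.e. the floor((1 - beta) * n)-th order statistic of |x_i - med(x)|.
pb_cutoff <- function(x, beta) {
  sort(abs(x - median(x)))[floor((1 - beta) * length(x))]
}

set.seed(42)
x <- rnorm(200)
betas <- c(0.05, 0.1, 0.2, 0.4)

# Cutoff for each beta
cutoffs <- vapply(betas, function(b) pb_cutoff(x, b), numeric(1))

# Share of observations whose median-centred deviation exceeds the cutoff,
# i.e. those with |psi_i| > 1
bent_share <- vapply(
  betas,
  function(b) mean(abs(x - median(x)) > pb_cutoff(x, b)),
  numeric(1)
)

data.frame(beta = betas, cutoff = cutoffs, bent_share = bent_share)
```

The cutoff shrinks as beta grows, and the share of observations beyond it rises to roughly beta.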

The bent scores are

$$ a_i = \max\{-1, \min(1, (x_i - \hat\theta_{pb}(x))/\omega_\beta(x))\}, $$

and likewise \(b_i\) for a second variable \(y\). The percentage bend correlation is

$$ r_{pb}(x,y) = \frac{\sum_i a_i b_i} {\sqrt{\sum_i a_i^2}\sqrt{\sum_i b_i^2}}. $$
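The formulas above can be transcribed into a few lines of base R for a single pair of vectors. This is a minimal sketch for checking intuition, not the package's implementation: pb_location and pb_cor_pair are hypothetical names, and the code assumes no missing values and a strictly positive cutoff.

```r
# Hypothetical helper: one-step percentage-bend location and cutoff.
pb_location <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  # omega: floor((1 - beta) * n)-th order statistic of |x_i - m|
  omega <- sort(abs(x - m))[floor((1 - beta) * n)]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)   # count of observations bent at the lower bound
  i2 <- sum(psi > 1)    # count of observations bent at the upper bound
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  list(theta = theta, omega = omega)
}

# Hypothetical helper: percentage bend correlation for one pair (x, y).
pb_cor_pair <- function(x, y, beta = 0.2) {
  lx <- pb_location(x, beta)
  ly <- pb_location(y, beta)
  # Bent scores: standardised deviations clamped to [-1, 1]
  a <- pmax(-1, pmin(1, (x - lx$theta) / lx$omega))
  b <- pmax(-1, pmin(1, (y - ly$theta) / ly$omega))
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(1)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100, sd = 0.8)
y[1:3] <- y[1:3] + 15        # a few gross outliers in y

r_pb <- pb_cor_pair(x, y)    # percentage bend correlation
r_pearson <- cor(x, y)       # ordinary Pearson correlation, for comparison
c(percentage_bend = r_pb, pearson = r_pearson)
```

Because every bent score lies in \([-1, 1]\), the three shifted observations contribute no more than any other bent point, whereas they enter the Pearson estimate at full magnitude.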

When na_method = "error", bent scores are computed once per column and the matrix is formed from their cross-products. When na_method = "pairwise", each pair is recomputed on its complete-case overlap, which can break positive semidefiniteness as with pairwise Pearson correlation.
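The loss of positive semidefiniteness under pairwise deletion is easiest to see with ordinary Pearson correlation. The toy data below are constructed for illustration only: each pair of columns overlaps on a different block of four rows, so the three pairwise correlations are mutually inconsistent. The sketch uses only base R's cor() and eigen() and does not call pbcor.

```r
# Toy data: every column pair is jointly observed on a different block of rows.
x1 <- c(1, 2, 3, 4, NA, NA, NA, NA, 1, 2, 3, 4)
x2 <- c(1, 2, 3, 4, 1, 2, 3, 4, NA, NA, NA, NA)
x3 <- c(NA, NA, NA, NA, 1, 2, 3, 4, 4, 3, 2, 1)
X <- cbind(x1, x2, x3)

# Pairwise-complete Pearson correlations:
# cor(x1, x2) = 1, cor(x2, x3) = 1, but cor(x1, x3) = -1
R <- cor(X, use = "pairwise.complete.obs")

# A positive semidefinite matrix has no negative eigenvalues
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
min(ev)
```

The smallest eigenvalue here is negative, so R is not a valid correlation matrix for any single complete data set. The same mechanism applies when each percentage bend correlation is computed on its own complete-case overlap.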

References

Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395

See Also

wincor(), skipped_corr(), bicor()

Examples

set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
# Shift eight randomly chosen cells by +10 to create outliers
idx <- sample(length(X), 8)
X[idx] <- X[idx] + 10

R <- pbcor(X)
print(R, digits = 2)
summary(R)
plot(R)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}
