Computes all pairwise skipped correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
Skipped correlation detects bivariate outliers using a projection rule and then computes Pearson or Spearman correlation on the retained observations. It is designed for situations where marginally robust methods can still be distorted by unusual points in the joint data cloud.
skipped_corr(
data,
method = c("pearson", "spearman"),
na_method = c("error", "pairwise"),
stand = TRUE,
outlier_rule = c("idealf", "mad"),
cutoff = sqrt(stats::qchisq(0.975, df = 2)),
n_threads = getOption("matrixCorr.threads", 1L),
return_masks = FALSE,
ci = FALSE,
p_value = FALSE,
conf_level = 0.95,
n_boot = 2000L,
p_adjust = c("none", "hochberg", "ecp"),
fwe_level = 0.05,
n_mc = 1000L,
seed = NULL
)skipped_corr_masks(x, var1 = NULL, var2 = NULL)
# S3 method for skipped_corr
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 4,
show_ci = NULL,
show_p = c("auto", "yes", "no"),
...
)
# S3 method for skipped_corr
plot(
x,
title = "Skipped correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
# S3 method for skipped_corr
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
# S3 method for summary.skipped_corr
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
A symmetric correlation matrix with class skipped_corr and
attributes method = "skipped_correlation", description, and
package = "matrixCorr". When return_masks = TRUE, the matrix
also carries a skipped_masks attribute containing compact pairwise
skipped-row indices. The diagnostics attribute stores per-pair
complete-case counts and skipped-row counts/proportions. When
ci = TRUE or p_value = TRUE, bootstrap inference matrices are
attached via attributes.
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded.
Correlation computed after removing projected outliers. One of
"pearson" (default) or "spearman".
One of "error" (default) or "pairwise".
With "error", the function requires all retained numeric columns to
be free of missing or non-finite values and aborts otherwise. This is the
recommended setting when you want a single common sample size across all
pairs, reproducible skipped-row diagnostics on the same rows, or bootstrap
inference via ci = TRUE / p_value = TRUE. With
"pairwise", each variable pair is computed on its own overlap of
finite rows. This is more permissive for incomplete data, but different
pairs can be based on different effective samples and different skipped-row
sets, so the resulting matrix is less directly comparable across entries.
Logical; if TRUE (default), each variable in the pair is
centred by its median and divided by a robust scale estimate before the
projection outlier search. The scale estimate is the MAD when positive,
with fallback to \(\mathrm{IQR}/1.34898\) and then the usual sample
standard deviation if needed. This standardisation affects only outlier
detection, not the final correlation computed on the retained
observations.
One of "idealf" (default) or "mad".
The default uses the ideal-fourths interquartile width of projected
distances; "mad" uses the median absolute deviation of projected
distances.
Positive numeric constant multiplying the projected spread in
the outlier rule
\(\mathrm{med}(d_{i\cdot}) + cutoff \times s(d_{i\cdot})\). Larger
values flag fewer observations as outliers; smaller values flag more.
Default sqrt(qchisq(0.975, df = 2)).
Integer \(\geq 1\). Number of OpenMP threads. Defaults to
getOption("matrixCorr.threads", 1L).
Logical; if TRUE, attach compact pairwise skipped-row
indices as an attribute. Default FALSE.
Logical; if TRUE, attach percentile-bootstrap confidence
intervals for each skipped correlation using the Wilcox (2015) B2
resampling scheme. Default FALSE.
Logical; if TRUE, attach bootstrap p-values for testing
whether each skipped correlation is zero. Default FALSE.
Confidence level used when ci = TRUE. Default
0.95.
Integer \(\geq 2\). Number of bootstrap resamples used when
ci = TRUE or p_value = TRUE. Default 2000.
One of "none" (default), "hochberg", or
"ecp". Optional familywise-error procedure applied to the matrix of
bootstrap p-values. "hochberg" corresponds to method H in Wilcox,
Rousselet, and Pernet (2018); "ecp" corresponds to their simulated
critical-p-value method ECP.
Familywise-error level used when
p_adjust = "hochberg" or "ecp". Default 0.05.
Integer \(\geq 10\). Number of null Monte Carlo data sets used
when p_adjust = "ecp" to estimate the critical p-value. Default
1000.
Optional positive integer used to seed the bootstrap resampling
when ci = TRUE or p_value = TRUE. If NULL, a fresh
internal seed is generated.
An object of class summary.skipped_corr.
Optional column names or 1-based column indices used by
skipped_corr_masks() to extract the skipped-row indices for one pair.
Integer; number of digits to print.
Optional row threshold for compact preview output.
Optional number of leading/trailing rows to show when truncated.
Optional maximum number of visible columns; NULL derives this
from console width.
Optional display width; defaults to getOption("width").
Integer; digits for skipped-correlation confidence limits.
One of "yes" or "no".
One of "auto", "yes", "no". For
print(), "auto" keeps the compact matrix-only display;
use "yes" to also print pairwise p-values.
Additional arguments passed to the underlying print or plot helper.
Character; plot title.
Colors used in the heatmap.
Numeric text size for overlaid cell values.
Text size for confidence intervals in the heatmap.
Logical; if TRUE (default), overlay numeric values
on the heatmap tiles.
An object of class skipped_corr.
Thiago de Paula Oliveira
Let \(X \in \mathbb{R}^{n \times p}\) be a numeric matrix with rows as
observations and columns as variables. For a given pair of columns
\((x, y)\), write the observed bivariate points as
\(u_i = (x_i, y_i)^\top\), \(i=1,\ldots,n\). If stand = TRUE,
each margin is first centred by its median and divided by a robust scale
estimate before outlier detection; otherwise the original pair is used. The
robust scale is the MAD when positive, with fallback to
\(\mathrm{IQR}/1.34898\) and then the usual sample standard deviation if
needed. Let \(\tilde u_i\) denote the resulting points and let \(c\) be
the componentwise median center of the detection cloud.
For each observation \(i\), define the direction vector \(b_i = \tilde u_i - c\). When \(\|b_i\| > 0\), all observations are projected onto the line through \(c\) in direction \(b_i\). The projected distances are $$ d_{ij} \;=\; \frac{|(\tilde u_j - c)^\top b_i|}{\|b_i\|}, \qquad j=1,\ldots,n. $$ For each direction \(i\), observation \(j\) is flagged as an outlier if
$$
d_{ij} \;>\; \mathrm{med}(d_{i\cdot}) + g\, s(d_{i\cdot}),
\qquad g = \code{cutoff},
$$
where \(s(\cdot)\) is either the ideal-fourths interquartile width
(outlier_rule = "idealf") or the median absolute deviation
(outlier_rule = "mad"). An observation is removed if it is flagged
for at least one projection direction. The skipped correlation is then the
ordinary Pearson or Spearman correlation computed from the retained
observations:
$$
r_{\mathrm{skip}}(x,y) \;=\;
\mathrm{cor}\!\left(x_{\mathcal{K}}, y_{\mathcal{K}}\right),
$$
where \(\mathcal{K}\) is the index set of observations not flagged as
outliers.
Unlike marginally robust methods such as pbcor(), wincor(),
or bicor(), skipped correlation is explicitly pairwise because
outlier detection depends on the joint geometry of each variable pair. As a
result, the reported matrix need not be positive semidefinite, even with
complete data.
Computational notes. In the complete-data path, each column pair
requires a full bivariate projection search, so the dominant cost is higher
than for marginal robust methods. The implementation evaluates pairs in
'C++'; where available, pairs are processed with 'OpenMP' parallelism. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows.
Bootstrap inference. When ci = TRUE or p_value = TRUE,
the implementation uses the percentile-bootstrap strategy studied by Wilcox
(2015). Each bootstrap replicate resamples whole observation pairs with
replacement, reruns the skipped-correlation outlier detection on the
resampled data, and recomputes the skipped correlation on the retained
observations. This corresponds to Wilcox's B2 method and avoids the
statistically unsatisfactory shortcut of removing outliers only once before
bootstrapping. Bootstrap inference currently requires complete data
(na_method = "error"). When p_adjust = "hochberg", the
bootstrap p-values are processed with Hochberg's step-up procedure (method H
in Wilcox, Rousselet, and Pernet, 2018). When p_adjust = "ecp", the
package follows their ECP method and simulates n_mc null data sets
from a \(p\)-variate normal distribution with identity covariance,
recomputes the pairwise bootstrap p-values for each null data set, stores the
minimum p-value from each run, and estimates the fwe_level quantile of
that null distribution using the Harrell-Davis estimator. Hypotheses are then
rejected when their observed bootstrap p-values are less than or equal to the
estimated critical p-value. The calibrated H1 procedure from Wilcox,
Rousselet, and Pernet (2018) is not currently implemented.
Wilcox, R. R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2), 131-143. tools:::Rd_expr_doi("10.1080/0266476032000148821")
Wilcox, R. R. (2015). Inferences about the skipped correlation coefficient: Dealing with heteroscedasticity and non-normality. Journal of Modern Applied Statistical Methods, 14(1), 172-188. tools:::Rd_expr_doi("10.22237/jmasm/1430453580")
Wilcox, R. R., Rousselet, G. A., & Pernet, C. R. (2018). Improved methods for making inferences about multiple skipped correlations. Journal of Statistical Computation and Simulation, 88(16), 3116-3131. tools:::Rd_expr_doi("10.1080/00949655.2018.1501051")
pbcor(), wincor(), bicor()
set.seed(12)
X <- matrix(rnorm(160 * 4), ncol = 4)
X[1, 1] <- 9
X[1, 2] <- -8
R <- skipped_corr(X, method = "pearson")
print(R, digits = 2)
summary(R)
plot(R)
Rm <- skipped_corr(X, method = "pearson", return_masks = TRUE)
skipped_corr_masks(Rm, 1, 2)
# Example 1:
Xm <- as.matrix(datasets::mtcars[, c("mpg", "disp", "hp", "wt")])
Rm2 <- skipped_corr(Xm, method = "spearman")
print(Rm2, digits = 2)
# Example 2:
Ri <- skipped_corr(Xm, method = "pearson", ci = TRUE, n_boot = 40, seed = 1)
Ri$ci
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(R)
}
Run the code above in your browser using DataLab