Computes pairwise Kendall's tau correlations for numeric data using a high-performance 'C++' backend. Optional confidence intervals are available for matrix and data-frame input.
kendall_tau(
data,
y = NULL,
check_na = TRUE,
ci = FALSE,
conf_level = 0.95,
ci_method = c("fieller", "if_el", "brown_benedetti")
)
# S3 method for kendall_matrix
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
# S3 method for kendall_matrix
plot(
x,
title = "Kendall's Tau correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
# S3 method for kendall_matrix
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
# S3 method for summary.kendall_matrix
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
If y is NULL and data is a matrix/data frame: a
symmetric numeric matrix where entry (i, j) is the Kendall's tau
correlation between the i-th and j-th numeric columns. When
ci = TRUE, the object also carries a ci attribute with
elements est, lwr.ci, upr.ci, conf.level, and
ci.method. Pairwise complete-case sample sizes are stored in
attr(x, "diagnostics")$n_complete.
If y is provided (two-vector mode): a single numeric scalar,
the Kendall's tau correlation between data and y.
The print method invisibly returns the kendall_matrix object.
The plot method returns a ggplot object representing the heatmap.
For matrix/data frame mode, a numeric matrix or a data frame with at least
two numeric columns. All non-numeric columns are excluded. For two-vector
mode, a numeric vector.
Optional numeric vector y of the same length as data
when data is a vector. If supplied, the function computes the
Kendall correlation between data and y using a
low-overhead scalar path and returns a single number.
Logical (default TRUE). If TRUE, inputs must
be free of missing/undefined values. Use FALSE only when missingness
has already been handled upstream.
Logical (default FALSE). If TRUE, attach pairwise
confidence intervals for the off-diagonal Kendall correlations in
matrix/data-frame mode.
Confidence level used when ci = TRUE. Default is
0.95.
Confidence-interval engine used when ci = TRUE.
Supported Kendall methods are "fieller" (default),
"brown_benedetti", and "if_el".
An object of class summary.kendall_matrix.
Integer; number of decimal places to print.
Optional row threshold for compact preview output.
Optional number of leading/trailing rows to show when truncated.
Optional maximum number of visible columns; NULL derives this
from console width.
Optional display width; defaults to getOption("width").
Integer; digits for Kendall confidence limits in the pairwise summary.
For show_ci, one of "yes" or "no", controlling whether confidence intervals are displayed.
Additional arguments passed to ggplot2::theme() or other
ggplot2 layers.
Plot title. Default is "Kendall's Tau correlation
heatmap".
Color for the minimum tau value. Default is
"indianred1".
Color for the maximum tau value. Default is
"steelblue1".
Color for zero correlation. Default is "white".
Font size for displaying correlation values. Default
is 4.
Text size for confidence intervals in the heatmap.
Logical; if TRUE (default), overlay numeric values
on the heatmap tiles.
An object of class kendall_matrix.
Thiago de Paula Oliveira
Kendall's tau is a rank-based measure of association between two variables. For a dataset with \(n\) observations on variables \(X\) and \(Y\), let \(n_0 = n(n - 1)/2\) be the number of unordered pairs, \(C\) the number of concordant pairs, and \(D\) the number of discordant pairs. Let \(T_x = \sum_g t_g (t_g - 1)/2\) and \(T_y = \sum_h u_h (u_h - 1)/2\) be the numbers of tied pairs within \(X\) and within \(Y\), respectively, where \(t_g\) and \(u_h\) are tie-group sizes in \(X\) and \(Y\).
The tie-robust Kendall's tau-b is: $$ \tau_b = \frac{C - D}{\sqrt{(n_0 - T_x)\,(n_0 - T_y)}}. $$ When there are no ties (\(T_x = T_y = 0\)), this reduces to tau-a: $$ \tau_a = \frac{C - D}{n(n-1)/2}. $$
The function automatically handles ties. In degenerate cases where a
variable is constant (\(n_0 = T_x\) or \(n_0 = T_y\)), the tau-b
denominator is zero and the correlation is undefined (returned as NA
off the diagonal).
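The definitions above can be checked with a brute-force sketch in base R. This is only an illustration of the tau-b formula, not the package's 'C++' implementation; the helper name tau_b_naive is hypothetical, and the \(O(n^2)\) pair enumeration is chosen for clarity rather than speed.

```r
# Brute-force O(n^2) Kendall's tau-b straight from the definition.
# `tau_b_naive` is a hypothetical helper, not part of the package.
tau_b_naive <- function(x, y) {
  n <- length(x)
  n0 <- n * (n - 1) / 2                  # number of unordered pairs
  # Signs of all pairwise differences; the upper triangle holds each pair once
  sx <- sign(outer(x, x, "-"))
  sy <- sign(outer(y, y, "-"))
  prod_sign <- (sx * sy)[upper.tri(sx)]
  C <- sum(prod_sign > 0)                # concordant pairs
  D <- sum(prod_sign < 0)                # discordant pairs
  # Tied pairs within each variable, computed from tie-group sizes
  tx <- table(x); Tx <- sum(tx * (tx - 1) / 2)
  ty <- table(y); Ty <- sum(ty * (ty - 1) / 2)
  denom <- sqrt((n0 - Tx) * (n0 - Ty))
  if (denom == 0) return(NA_real_)       # constant variable: tau-b undefined
  (C - D) / denom
}

x <- c(1, 2, 2, 3, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 4)
tau_b_naive(x, y)
cor(x, y, method = "kendall")            # base R also reports tau-b under ties
tau_b_naive(rep(1, 5), 1:5)              # NA: the tau-b denominator is zero
```

The agreement with stats::cor() holds because base R's Kendall estimate uses the same tie-corrected denominator.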
When check_na = FALSE, each \((i,j)\) estimate is recomputed on the
pairwise complete-case overlap of columns \(i\) and \(j\). Confidence
intervals use the observed pairwise-complete Kendall estimate and the same
pairwise complete-case overlap.
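To make the idea of a pairwise complete-case overlap concrete, the following base R sketch drops, for each pair of columns, only the rows that are missing in either of the two. The helper pairwise_tau is hypothetical and uses stats::cor() rather than the package backend; it is meant to show which rows enter each estimate, not how the package computes it.

```r
# Pairwise complete-case Kendall's tau with base R only.
# `pairwise_tau` is a hypothetical helper used for illustration.
pairwise_tau <- function(df, i, j) {
  keep <- complete.cases(df[[i]], df[[j]])   # rows observed in both columns
  list(
    n_complete = sum(keep),
    tau = cor(df[[i]][keep], df[[j]][keep], method = "kendall")
  )
}

set.seed(1)
df <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
df$a[c(2, 5)] <- NA
df$b[c(5, 9, 11)] <- NA

pairwise_tau(df, "a", "b")   # 26 rows: rows 2, 5, 9 and 11 are dropped
pairwise_tau(df, "a", "c")   # 28 rows: only rows 2 and 5 are dropped
```

Because each pair can use a different subset of rows, the sample size behind each entry of the correlation matrix may differ, which is why the per-pair counts are worth inspecting alongside the estimates.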
With ci_method = "fieller", the interval is built on the Fisher-style
transformed scale \(z = \operatorname{atanh}(\hat\tau)\) using Fieller's
asymptotic standard error
$$ \operatorname{SE}(z) = \sqrt{\frac{0.437}{n - 4}}, $$
where \(n\) is the pairwise complete-case sample size. The interval is then
mapped back with tanh() and clipped to \([-1, 1]\) for numerical
safety. This is the default Kendall CI and is intended to be the fast,
production-oriented choice.
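The Fieller construction described above can be written out in a few lines of base R. This is a minimal sketch of the stated formula, assuming a normal critical value for the chosen confidence level; the helper fieller_ci is hypothetical and is not the package's internal routine.

```r
# Fieller-type interval for Kendall's tau on the atanh scale.
# `fieller_ci` is a hypothetical helper; `n` is the pairwise complete-case size.
fieller_ci <- function(tau, n, conf_level = 0.95) {
  stopifnot(n > 4, abs(tau) <= 1)
  z <- atanh(tau)                             # Fisher-style transform
  se <- sqrt(0.437 / (n - 4))                 # Fieller's asymptotic SE
  crit <- qnorm(1 - (1 - conf_level) / 2)     # two-sided normal quantile (assumed)
  lim <- tanh(z + c(-1, 1) * crit * se)       # map back to the tau scale
  lim <- pmin(pmax(lim, -1), 1)               # clip to [-1, 1]
  c(est = tau, lwr.ci = lim[1], upr.ci = lim[2])
}

fieller_ci(tau = 0.3, n = 50)
fieller_ci(tau = 0.3, n = 500)   # a larger n gives a narrower interval
```

Note that the interval is symmetric on the transformed scale but generally asymmetric around the estimate once mapped back with tanh().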
With ci_method = "brown_benedetti", the interval is computed on the
Kendall tau scale using the Brown-Benedetti large-sample variance for
Kendall's tau-b. This path is tie-aware, remains on the original Kendall
scale, and is intended as a conventional asymptotic alternative when a
direct tau-scale interval is preferred.
With ci_method = "if_el", the interval is computed in 'C++' using an
influence-function empirical-likelihood construction built from the
linearised Kendall estimating equation. The lower and upper limits are found
by solving the empirical-likelihood ratio equation against the
\(\chi^2_1\)-cutoff implied by conf_level. This method is slower
than "fieller" and is intended for specialised inference.
Performance:
In the two-vector mode (y supplied), the C++ backend uses a
raw-double path with minimal overhead.
In the matrix/data-frame mode, the no-missing estimate-only path
uses the Knight (1966) \(O(n \log n)\) algorithm. Pairwise-complete
inference paths recompute each pair on its complete-case overlap; the
"brown_benedetti" interval adds tie-aware large-sample variance
calculations and "if_el" adds extra per-pair likelihood solving.
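The idea behind the \(O(n \log n)\) approach can be illustrated with a simplified base R sketch: sort the pairs by the first variable, then count inversions in the second with a merge sort. The sketch assumes no ties (so it yields tau-a) and omits the tie bookkeeping of Knight's full method; the helpers count_inversions and tau_a_sorted are hypothetical and unrelated to the package's 'C++' code.

```r
# Count inversions in a numeric vector via merge sort (O(n log n)).
# Returns the sorted vector and the number of inversions.
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2L) return(list(sorted = v, inv = 0))
  mid <- n %/% 2L
  left <- count_inversions(v[seq_len(mid)])
  right <- count_inversions(v[(mid + 1L):n])
  a <- left$sorted; b <- right$sorted
  out <- numeric(n)
  i <- 1L; j <- 1L; k <- 1L
  inv <- left$inv + right$inv
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      out[k] <- a[i]; i <- i + 1L
    } else {
      # b[j] jumps ahead of every remaining element of a: one inversion each
      out[k] <- b[j]; j <- j + 1L
      inv <- inv + (length(a) - i + 1L)
    }
    k <- k + 1L
  }
  # Copy whichever half still has elements left
  if (i <= length(a)) out[k:n] <- a[i:length(a)]
  if (j <= length(b)) out[k:n] <- b[j:length(b)]
  list(sorted = out, inv = inv)
}

# Tau-a without ties: with x sorted, discordant pairs are inversions in y,
# so tau = (n0 - 2 D) / n0 = 1 - 4 D / (n (n - 1)).
tau_a_sorted <- function(x, y) {
  n <- length(x)
  D <- count_inversions(y[order(x)])$inv
  1 - 4 * D / (n * (n - 1))
}

set.seed(42)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)
tau_a_sorted(x, y)
cor(x, y, method = "kendall")   # same value when the data have no ties
```

An interpreted recursive merge sort like this one will not be fast in R; the point is the reduction from enumerating all pairs to a sort plus an inversion count.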
Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.
Knight, W. R. (1966). A Computer Method for Calculating Kendall's Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314), 436-439.
Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470-481.
Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72(358), 309-315.
Huang, Z., & Qin, G. (2023). Influence function-based confidence intervals for the Kendall rank correlation coefficient. Computational Statistics, 38(2), 1041-1055.
Croux, C., & Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications, 19, 497-515.
print.kendall_matrix, plot.kendall_matrix
# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
summary(kt)
plot(kt)
# Confidence intervals
kt_ci <- kendall_tau(mat[, 1:3], ci = TRUE)
print(kt_ci, show_ci = "yes")
summary(kt_ci)
# Two-vector mode (scalar path)
x <- rnorm(1000); y <- 0.5 * x + rnorm(1000)
kendall_tau(x, y)
# Including ties
tied_df <- data.frame(
v1 = rep(1:5, each = 20),
v2 = rep(5:1, each = 20),
v3 = rnorm(100)
)
kt_tied <- kendall_tau(tied_df, ci = TRUE, ci_method = "fieller")
print(kt_tied, show_ci = "yes")
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(kt)
}