matrixCorr (version 0.10.0)

kendall_tau: Pairwise (or Two-Vector) Kendall's Tau Rank Correlation

Description

Computes pairwise Kendall's tau correlations for numeric data using a high-performance 'C++' backend. Optional confidence intervals are available for matrix and data-frame input.

Usage

kendall_tau(
  data,
  y = NULL,
  check_na = TRUE,
  ci = FALSE,
  conf_level = 0.95,
  ci_method = c("fieller", "if_el", "brown_benedetti")
)

# S3 method for kendall_matrix
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

# S3 method for kendall_matrix
plot(
  x,
  title = "Kendall's Tau correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

# S3 method for kendall_matrix
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

# S3 method for summary.kendall_matrix
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Value

  • If y is NULL and data is a matrix/data frame: a symmetric numeric matrix where entry (i, j) is the Kendall's tau correlation between the i-th and j-th numeric columns. When ci = TRUE, the object also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method. Pairwise complete-case sample sizes are stored in attr(x, "diagnostics")$n_complete.

  • If y is provided (two-vector mode): a single numeric scalar, the Kendall's tau correlation between data and y.

The print method invisibly returns the kendall_matrix object.

The plot method returns a ggplot object representing the heatmap.

Arguments

data

For matrix/data frame mode, a numeric matrix or a data frame with at least two numeric columns. All non-numeric columns are excluded. For two-vector mode, a numeric vector x.

y

Optional numeric vector y of the same length as data when data is a vector. If supplied, the function computes the Kendall correlation between data and y using a low-overhead scalar path and returns a single number.

check_na

Logical (default TRUE). If TRUE, inputs must be free of missing/undefined values. Use FALSE only when missingness has already been handled upstream.

ci

Logical (default FALSE). If TRUE, attach pairwise confidence intervals for the off-diagonal Kendall correlations in matrix/data-frame mode.

conf_level

Confidence level used when ci = TRUE. Default is 0.95.

ci_method

Confidence-interval engine used when ci = TRUE. Supported Kendall methods are "fieller" (default), "brown_benedetti", and "if_el".

x

An object of class kendall_matrix (for the print and plot methods) or summary.kendall_matrix (for the print method of the summary object).

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

ci_digits

Integer; digits for Kendall confidence limits in the pairwise summary.

show_ci

Whether to display confidence intervals; one of "yes" or "no". Default is NULL.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. Default is "Kendall's Tau correlation heatmap".

low_color

Color for the minimum tau value. Default is "indianred1".

high_color

Color for the maximum tau value. Default is "steelblue1".

mid_color

Color for zero correlation. Default is "white".

value_text_size

Font size for displaying correlation values. Default is 4.

ci_text_size

Text size for confidence intervals in the heatmap. Default is 3.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class kendall_matrix.

Author

Thiago de Paula Oliveira

Details

Kendall's tau is a rank-based measure of association between two variables. For a dataset with \(n\) observations on variables \(X\) and \(Y\), let \(n_0 = n(n - 1)/2\) be the number of unordered pairs, \(C\) the number of concordant pairs, and \(D\) the number of discordant pairs. Let \(T_x = \sum_g t_g (t_g - 1)/2\) and \(T_y = \sum_h u_h (u_h - 1)/2\) be the numbers of tied pairs within \(X\) and within \(Y\), respectively, where \(t_g\) and \(u_h\) are tie-group sizes in \(X\) and \(Y\).

The tie-robust Kendall's tau-b is: $$ \tau_b = \frac{C - D}{\sqrt{(n_0 - T_x)\,(n_0 - T_y)}}. $$ When there are no ties (\(T_x = T_y = 0\)), this reduces to tau-a: $$ \tau_a = \frac{C - D}{n(n-1)/2}. $$

The function automatically handles ties. In degenerate cases where a variable is constant (\(n_0 = T_x\) or \(n_0 = T_y\)), the tau-b denominator is zero and the correlation is undefined (returned as NA off the diagonal).
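The definitions above can be checked by hand with a brute-force count over all pairs. The following is a minimal base-R sketch for illustration only: tau_b_naive is a hypothetical helper, not part of matrixCorr, and it runs in \(O(n^2)\) time rather than using the package's 'C++' backend. It returns NA when the tau-b denominator is zero, mirroring the degenerate case described above.

```r
# Hypothetical helper (not part of matrixCorr): tau-b straight from the
# definitions of C, D, T_x, T_y and n_0.
tau_b_naive <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n   <- length(x)
  idx <- utils::combn(n, 2)                  # all unordered pairs (i < j)
  dx  <- sign(x[idx[1, ]] - x[idx[2, ]])
  dy  <- sign(y[idx[1, ]] - y[idx[2, ]])
  C   <- sum(dx * dy > 0)                    # concordant pairs
  D   <- sum(dx * dy < 0)                    # discordant pairs
  n0  <- n * (n - 1) / 2                     # number of unordered pairs
  tx  <- as.numeric(table(x))                # tie-group sizes in X
  uy  <- as.numeric(table(y))                # tie-group sizes in Y
  Tx  <- sum(tx * (tx - 1) / 2)
  Ty  <- sum(uy * (uy - 1) / 2)
  denom <- sqrt((n0 - Tx) * (n0 - Ty))
  if (denom == 0) return(NA_real_)           # constant variable: undefined
  (C - D) / denom
}

x <- c(1, 2, 2, 3, 4)
y <- c(1, 3, 2, 2, 5)
# Here C = 7, D = 1, n0 = 10, Tx = Ty = 1, so tau-b = 6 / 9.
tau_hand <- tau_b_naive(x, y)
tau_base <- cor(x, y, method = "kendall")    # base R also computes tau-b
tau_const <- tau_b_naive(rep(1, 5), 1:5)     # constant X gives NA
```

In this small example the hand count and base R's cor(..., method = "kendall") agree, which is a convenient sanity check when comparing against kendall_tau() output.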

When check_na = FALSE, each \((i,j)\) estimate is recomputed on the pairwise complete-case overlap of columns \(i\) and \(j\). Confidence intervals use the observed pairwise-complete Kendall estimate and the same pairwise complete-case overlap.
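As a rough illustration of what "pairwise complete-case overlap" means for a single pair of columns, the base-R sketch below drops every row in which either variable is missing before computing tau. It uses only stats functions and is an assumption about the general idea, not a description of the package's internal code.

```r
# Two columns with missing values in different rows.
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 1, 3, NA, 6, 5)

# Keep only rows where both x and y are observed.
ok <- stats::complete.cases(x, y)
n_complete <- sum(ok)                        # pairwise complete-case sample size

# Kendall's tau on the overlap: 4 concordant and 2 discordant pairs.
tau_overlap <- cor(x[ok], y[ok], method = "kendall")
```

Each pair of columns can have a different overlap, which is why the sample sizes are reported per pair.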

With ci_method = "fieller", the interval is built on the Fisher-style transformed scale \(z = \operatorname{atanh}(\hat\tau)\) using Fieller's asymptotic standard error $$ \operatorname{SE}(z) = \sqrt{\frac{0.437}{n - 4}}, $$ where \(n\) is the pairwise complete-case sample size. The interval is then mapped back with tanh() and clipped to \([-1, 1]\) for numerical safety. This is the default Kendall CI and is intended to be the fast, production-oriented choice.
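The Fieller construction described above can be written in a few lines of base R. This is a sketch under the stated formula only: fieller_ci is a hypothetical helper name, and the package's own implementation may differ in edge-case handling.

```r
# Hypothetical helper (not part of matrixCorr): Fieller-type interval for
# Kendall's tau, given the estimate and the pairwise complete-case n.
fieller_ci <- function(tau, n, conf_level = 0.95) {
  stopifnot(n > 4, conf_level > 0, conf_level < 1)
  z    <- atanh(tau)                          # Fisher-style transform
  se   <- sqrt(0.437 / (n - 4))               # Fieller's asymptotic SE
  crit <- stats::qnorm(1 - (1 - conf_level) / 2)
  lims <- tanh(z + c(-1, 1) * crit * se)      # map back to the tau scale
  c(lwr = max(lims[1], -1), upr = min(lims[2], 1))  # clip to [-1, 1]
}

ci <- fieller_ci(tau = 0.3, n = 104)
```

Because the standard error depends only on \(n\), the interval costs almost nothing beyond the tau estimate itself, which is consistent with its role as the fast default.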

With ci_method = "brown_benedetti", the interval is computed on the Kendall tau scale using the Brown-Benedetti large-sample variance for Kendall's tau-b. This path is tie-aware, remains on the original Kendall scale, and is intended as a conventional asymptotic alternative when a direct tau-scale interval is preferred.

With ci_method = "if_el", the interval is computed in 'C++' using an influence-function empirical-likelihood construction built from the linearised Kendall estimating equation. The lower and upper limits are found by solving the empirical-likelihood ratio equation against the \(\chi^2_1\)-cutoff implied by conf_level. This method is slower than "fieller" and is intended for specialised inference.

Performance:

  • In the two-vector mode (y supplied), the C++ backend uses a raw-double path with minimal overhead.

  • In the matrix/data-frame mode, the no-missing estimate-only path uses the Knight (1966) \(O(n \log n)\) algorithm. Pairwise-complete inference paths recompute each pair on its complete-case overlap; the "brown_benedetti" interval adds tie-aware large-sample variance calculations and "if_el" adds extra per-pair likelihood solving.
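To give a feel for how an \(O(n \log n)\) computation is possible, the sketch below shows the core sorting idea in plain R for the simplified case with no ties: order the data by x, then count inversions in y with a merge sort. The number of inversions equals the number of discordant pairs \(D\). The helper names count_inversions and tau_a_fast are hypothetical, tie handling is deliberately omitted, and this is not the package's 'C++' code.

```r
# Merge sort that also counts inversions (pairs that are out of order).
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2) return(list(sorted = v, inv = 0))
  mid   <- n %/% 2
  left  <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1):n])
  a <- left$sorted
  b <- right$sorted
  out <- numeric(n)
  i <- 1
  j <- 1
  inv <- left$inv + right$inv
  for (k in seq_len(n)) {
    if (j > length(b) || (i <= length(a) && a[i] <= b[j])) {
      out[k] <- a[i]
      i <- i + 1
    } else {
      # b[j] precedes every remaining element of a: each is one inversion.
      inv <- inv + (length(a) - i + 1)
      out[k] <- b[j]
      j <- j + 1
    }
  }
  list(sorted = out, inv = inv)
}

# Tau-a for data without ties (simplifying assumption for this sketch).
tau_a_fast <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2,
            !anyDuplicated(x), !anyDuplicated(y))
  n  <- length(x)
  D  <- count_inversions(y[order(x)])$inv    # discordant pairs
  n0 <- n * (n - 1) / 2
  (n0 - 2 * D) / n0                          # (C - D) / n0 with C = n0 - D
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
tau_fast <- tau_a_fast(x, y)
tau_ref  <- cor(x, y, method = "kendall")
```

With continuous data there are no ties, so the result should match base R's tau-b. A full implementation additionally tracks tied groups during the sort.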

References

Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.

Knight, W. R. (1966). A Computer Method for Calculating Kendall's Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314), 436-439.

Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470-481.

Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72(358), 309-315.

Huang, Z., & Qin, G. (2023). Influence function-based confidence intervals for the Kendall rank correlation coefficient. Computational Statistics, 38(2), 1041-1055.

Croux, C., & Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications, 19, 497-515.

See Also

print.kendall_matrix, plot.kendall_matrix

Examples

library(matrixCorr)

# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
summary(kt)
plot(kt)

# Confidence intervals
kt_ci <- kendall_tau(mat[, 1:3], ci = TRUE)
print(kt_ci, show_ci = "yes")
summary(kt_ci)

# Two-vector mode (scalar path)
x <- rnorm(1000)
y <- 0.5 * x + rnorm(1000)
kendall_tau(x, y)

# Including ties
tied_df <- data.frame(
  v1 = rep(1:5, each = 20),
  v2 = rep(5:1, each = 20),
  v3 = rnorm(100)
)
kt_tied <- kendall_tau(tied_df, ci = TRUE, ci_method = "fieller")
print(kt_tied, show_ci = "yes")

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(kt)
}