
matrixCorr (version 0.10.0)

polychoric: Pairwise Polychoric Correlation

Description

Computes the polychoric correlation for either a pair of ordinal variables or all pairwise combinations of ordinal columns in a matrix/data frame.

Usage

polychoric(data, y = NULL, correct = 0.5, check_na = TRUE)

# S3 method for polychoric_corr
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL, ...)

# S3 method for polychoric_corr
plot(x, title = "Polychoric correlation heatmap", low_color = "indianred1", high_color = "steelblue1", mid_color = "white", value_text_size = 4, show_value = TRUE, ...)

# S3 method for polychoric_corr
summary(object, n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL, ...)

Value

If y is supplied, a numeric scalar with attributes diagnostics and thresholds. Otherwise a symmetric matrix of class polychoric_corr with attributes method, description, package = "matrixCorr", diagnostics, thresholds, and correct.

Arguments

data

An ordinal vector, matrix, or data frame. Supported columns are factors, ordered factors, logical values, or integer-like numerics. In matrix/data-frame mode, only supported ordinal columns are retained.
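Which columns count as "supported" can be illustrated with a small base-R filter. This is a minimal sketch under assumptions: the helper name is_supported_ordinal is hypothetical, and treating a numeric column as integer-like when every observed value is a whole number (up to floating-point tolerance) is an assumed reading of the rule above, not necessarily the check matrixCorr performs.

```r
# Hypothetical helper (not part of matrixCorr): TRUE for the column types the
# argument description lists as supported. The integer-like test is an
# assumption about how such a rule might be implemented.
is_supported_ordinal <- function(col) {
  # factors (ordered factors included) and logicals are accepted as-is
  if (is.factor(col) || is.logical(col)) return(TRUE)
  if (is.numeric(col)) {
    obs <- col[!is.na(col)]
    # integer-like: every observed value is a whole number, up to rounding error
    return(all(abs(obs - round(obs)) < sqrt(.Machine$double.eps)))
  }
  FALSE  # e.g. character columns
}

dat <- data.frame(
  a = factor(c("x", "y", "x")),
  b = c(TRUE, FALSE, TRUE),
  c = c(1, 2, 3),
  d = c(0.5, 1.2, 2.7),
  e = c("u", "v", "w"),
  stringsAsFactors = FALSE
)
keep <- vapply(dat, is_supported_ordinal, logical(1))
names(dat)[keep]  # returns c("a", "b", "c")
```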

y

Optional second ordinal vector. When supplied, the function returns a single polychoric correlation estimate.

correct

Non-negative continuity correction added to zero-count cells. Default is 0.5.

check_na

Logical (default TRUE). If TRUE, missing values are rejected. If FALSE, pairwise complete cases are used.

x

An object of class polychoric_corr.

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to print().

title

Plot title. Default is "Polychoric correlation heatmap".

low_color

Color for the minimum correlation.

high_color

Color for the maximum correlation.

mid_color

Color for zero correlation.

value_text_size

Font size used in tile labels.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class polychoric_corr.

Author

Thiago de Paula Oliveira

Details

The polychoric correlation generalises the tetrachoric model to ordered categorical variables with more than two levels. It assumes latent standard-normal variables \(Z_1, Z_2\) with correlation \(\rho\), and cut-points \(-\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_R = \infty\) and \(-\infty = \beta_0 < \beta_1 < \cdots < \beta_C = \infty\) such that

$$ X = r \iff \alpha_{r-1} < Z_1 \le \alpha_r, \qquad Y = c \iff \beta_{c-1} < Z_2 \le \beta_c. $$

For an observed \(R \times C\) contingency table with counts \(n_{rc}\), the thresholds are estimated from the observed marginal cumulative proportions (\(\hat P\) denotes an observed proportion), for \(r = 1, \ldots, R-1\) and \(c = 1, \ldots, C-1\):

$$ \alpha_r = \Phi^{-1}\!\Big(\sum_{k \le r} \hat P(X = k)\Big), \qquad \beta_c = \Phi^{-1}\!\Big(\sum_{k \le c} \hat P(Y = k)\Big). $$

Holding those thresholds fixed, the log-likelihood for the latent correlation is

$$ \ell(\rho) = \sum_{r=1}^{R}\sum_{c=1}^{C} n_{rc} \log \Pr\!\big( \alpha_{r-1} < Z_1 \le \alpha_r,\; \beta_{c-1} < Z_2 \le \beta_c \mid \rho \big), $$

and the estimator returned is the maximiser over \(\rho \in (-1,1)\). The C++ implementation performs a dense one-dimensional search over \(\rho\) followed by Brent refinement.
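The two estimation steps can be traced in a short base-R sketch. It is an illustration under assumptions, not the package's C++ code: all function names are hypothetical, the bivariate normal rectangle probability is computed by one-dimensional numerical integration with stats::integrate() (the package may use a different bivariate normal routine), and stats::optimize(), which implements Brent's method, stands in for the Brent refinement after a coarse grid over \(\rho\). No continuity correction is applied here.

```r
# Two-step polychoric estimator, sketched in base R for illustration only.
# NOT the matrixCorr C++ implementation; all names are hypothetical.

# Step 1: thresholds from cumulative marginal proportions, padded with
# -Inf (alpha_0) and Inf (alpha_R).
thresholds_from_margin <- function(counts) {
  cum <- cumsum(counts) / sum(counts)
  c(-Inf, qnorm(cum[-length(cum)]), Inf)
}

# P(a1 < Z1 <= a2, b1 < Z2 <= b2 | rho) for a standard bivariate normal,
# as a 1-D integral over Z1 of phi(z) * P(b1 < Z2 <= b2 | Z1 = z),
# using Z2 | Z1 = z ~ N(rho * z, 1 - rho^2).
rect_prob <- function(a1, a2, b1, b2, rho) {
  s <- sqrt(1 - rho^2)
  f <- function(z) dnorm(z) * (pnorm((b2 - rho * z) / s) - pnorm((b1 - rho * z) / s))
  integrate(f, a1, a2, rel.tol = 1e-8)$value
}

# l(rho) = sum over cells of n_rc * log P(cell rc | rho), thresholds fixed.
polychoric_loglik <- function(rho, tab, alpha, beta) {
  ll <- 0
  for (i in seq_len(nrow(tab))) {
    for (j in seq_len(ncol(tab))) {
      if (tab[i, j] == 0) next  # a zero count contributes nothing to the sum
      p <- rect_prob(alpha[i], alpha[i + 1], beta[j], beta[j + 1], rho)
      ll <- ll + tab[i, j] * log(max(p, .Machine$double.xmin))
    }
  }
  ll
}

# Step 2: coarse grid over rho, then Brent refinement with stats::optimize()
# on the bracket around the best grid point.
polychoric_sketch <- function(x, y, grid = seq(-0.95, 0.95, by = 0.05)) {
  tab <- table(x, y)
  alpha <- thresholds_from_margin(rowSums(tab))
  beta <- thresholds_from_margin(colSums(tab))
  ll_grid <- vapply(grid, polychoric_loglik, numeric(1),
                    tab = tab, alpha = alpha, beta = beta)
  best <- which.max(ll_grid)
  lower <- if (best == 1) -0.999 else grid[best - 1]
  upper <- if (best == length(grid)) 0.999 else grid[best + 1]
  optimize(polychoric_loglik, c(lower, upper), maximum = TRUE,
           tab = tab, alpha = alpha, beta = beta, tol = 1e-6)$maximum
}

# Usage: discretise correlated latent normals, then recover rho.
set.seed(1)
n <- 1000
z1 <- rnorm(n)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)  # latent correlation 0.6
x <- cut(z1, c(-Inf, -0.7, 0.4, Inf))
y <- cut(z2, c(-Inf, -1.0, -0.1, 0.8, Inf))
polychoric_sketch(x, y)  # should land near 0.6
```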

The argument correct adds a non-negative continuity correction to empty cells before marginal threshold estimation and likelihood evaluation. This avoids numerical failures for sparse tables with structurally zero cells. When correct = 0 and zero cells are present, the corresponding fit can be boundary-driven rather than a regular interior maximum-likelihood problem. The returned object stores sparse-fit diagnostics and the thresholds used for estimation so those cases can be inspected explicitly.
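The rule that the correction touches only empty cells, and does so before the margins are formed, can be shown on a small table. This is a minimal sketch: apply_correction is a hypothetical helper, and the package may store or apply the corrected counts differently.

```r
# Hypothetical helper illustrating the `correct` rule: only zero-count cells
# change (0 becomes 0 + correct); non-empty cells are left as observed.
apply_correction <- function(tab, correct = 0.5) {
  stopifnot(is.numeric(correct), length(correct) == 1, correct >= 0)
  tab[tab == 0] <- correct
  tab
}

tab <- matrix(c(12, 0, 5,
                 3, 9, 0), nrow = 2, byrow = TRUE)
adj <- apply_correction(tab, correct = 0.5)

# Row thresholds alpha_1..alpha_{R-1} are then taken from the corrected margins,
# so the correction also shifts the estimated cut-points slightly.
row_cum <- cumsum(rowSums(adj)) / sum(adj)
alpha <- qnorm(row_cum[-length(row_cum)])
```

With correct = 0 the table is returned unchanged, which is the setting in which the boundary-driven fits described above can occur.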

In matrix/data-frame mode, all pairwise polychoric correlations are computed between supported ordinal columns. Diagonal entries are 1 for non-degenerate columns and NA when a column has fewer than two observed levels.
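The matrix path can be sketched as a loop over the \(p(p-1)/2\) column pairs, combined with the diagonal rule above and the missing-value behaviour described for check_na. This is a sketch under assumptions: pairwise_matrix and its pair_fun argument are hypothetical, and the stand-in passed to it below only counts the cases each pair uses; it is not a polychoric estimator.

```r
# Hypothetical matrix-path sketch; `pair_fun(x, y)` is any pairwise
# estimator supplied by the caller.
pairwise_matrix <- function(dat, pair_fun, check_na = TRUE) {
  # check_na = TRUE: reject missing values; FALSE: pairwise complete cases
  if (check_na && anyNA(dat)) stop("missing values are not allowed when check_na = TRUE")
  p <- ncol(dat)
  out <- matrix(NA_real_, p, p, dimnames = list(names(dat), names(dat)))
  # Diagonal: 1 if the column has at least two observed levels, otherwise NA
  for (j in seq_len(p)) {
    observed <- dat[[j]][!is.na(dat[[j]])]
    if (length(unique(observed)) >= 2) out[j, j] <- 1
  }
  if (p < 2) return(out)
  # Off-diagonal: p(p - 1)/2 fits, each on that pair's complete cases
  for (pair in combn(p, 2, simplify = FALSE)) {
    i <- pair[1]
    j <- pair[2]
    ok <- !is.na(dat[[i]]) & !is.na(dat[[j]])
    out[i, j] <- out[j, i] <- pair_fun(dat[[i]][ok], dat[[j]][ok])
  }
  out
}

dat <- data.frame(a = c(1, 2, 2, NA, 1, 3),
                  b = c(1, 1, 2, 2, NA, 3),
                  k = c(3, 3, 3, 3, 3, 3))  # k is degenerate (one level)
# Stand-in that records how many cases each pair uses (NOT a correlation)
n_used <- pairwise_matrix(dat, function(x, y) length(x), check_na = FALSE)
```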

Computational complexity. For \(p\) ordinal variables, the matrix path evaluates \(p(p-1)/2\) bivariate likelihoods. Each pair optimises a single scalar parameter \(\rho\), so the main cost is repeated evaluation of bivariate normal rectangle probabilities.

References

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.

Examples

library(matrixCorr)
set.seed(124)
n <- 1200
Sigma <- matrix(c(
  1.00, 0.60, 0.40,
  0.60, 1.00, 0.50,
  0.40, 0.50, 1.00
), 3, 3, byrow = TRUE)

Z <- mnormt::rmnorm(n = n, mean = rep(0, 3), varcov = Sigma)
Y <- data.frame(
  y1 = ordered(cut(
    Z[, 1],
    breaks = c(-Inf, -0.7, 0.4, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 2],
    breaks = c(-Inf, -1.0, -0.1, 0.8, Inf),
    labels = c("1", "2", "3", "4")
  )),
  y3 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.4, 0.2, 1.1, Inf),
    labels = c("A", "B", "C", "D")
  ))
)

pc <- polychoric(Y)
print(pc, digits = 3)
summary(pc)
plot(pc)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pc)
}

# Sample Pearson correlations of the latent normal draws, for comparison with pc
round(stats::cor(Z), 2)
