matrixCorr (version 0.10.0)

polyserial: Polyserial Correlation Between Continuous and Ordinal Variables

Description

Computes polyserial correlations between continuous variables in data and ordinal variables in y. Both pairwise vector mode and rectangular matrix/data-frame mode are supported.

Usage

polyserial(data, y, check_na = TRUE)

# S3 method for polyserial_corr
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...)

# S3 method for polyserial_corr
plot(x, title = "Polyserial correlation heatmap",
     low_color = "indianred1", high_color = "steelblue1",
     mid_color = "white", value_text_size = 4, show_value = TRUE, ...)

# S3 method for polyserial_corr
summary(object, n = NULL, topn = NULL, max_vars = NULL,
        width = NULL, show_ci = NULL, ...)

Value

If both data and y are vectors, a numeric scalar. Otherwise a numeric matrix of class polyserial_corr with rows corresponding to the continuous variables in data and columns to the ordinal variables in y. Matrix outputs carry attributes method, description, and package = "matrixCorr".

Arguments

data

A numeric vector, matrix, or data frame containing continuous variables.

y

An ordinal vector, matrix, or data frame containing ordinal variables. Supported columns are factors, ordered factors, logical values, or integer-like numerics.

check_na

Logical (default TRUE). If TRUE, missing values are rejected. If FALSE, pairwise complete cases are used.

x

An object of class polyserial_corr.

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to print().

title

Plot title. Default is "Polyserial correlation heatmap".

low_color

Color for the minimum correlation.

high_color

Color for the maximum correlation.

mid_color

Color for zero correlation.

value_text_size

Font size used in tile labels.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class polyserial_corr.
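
For check_na = FALSE, "pairwise complete cases" conventionally means that each continuous-by-ordinal pair keeps only the rows where both members of that pair are observed, so different pairs can rest on different sample sizes. The base-R sketch below illustrates that idea on toy vectors. It does not call matrixCorr, the variable names are illustrative, and the package's internal handling is assumed rather than confirmed to follow this pattern.

```r
# Toy data: one continuous variable and two ordinal variables,
# each with a missing value in a different row.
x  <- c(0.2, NA, 1.1, -0.4, 0.9)
y1 <- c(1L, 2L, NA, 1L, 3L)
y2 <- c(2L, 1L, 3L, NA, 2L)

# Pairwise deletion: rows are kept per pair, not across all variables.
keep_y1 <- complete.cases(x, y1)   # rows usable for the (x, y1) pair
keep_y2 <- complete.cases(x, y2)   # rows usable for the (x, y2) pair

# Listwise deletion, for contrast: rows complete in every variable.
keep_all <- complete.cases(x, y1, y2)

n_pair_y1  <- sum(keep_y1)   # 3
n_pair_y2  <- sum(keep_y2)   # 3
n_listwise <- sum(keep_all)  # 2
```

Pairwise deletion retains more rows per estimate than listwise deletion, at the cost of each cell in the result matrix potentially being based on a different subset of observations.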

Author

Thiago de Paula Oliveira

Details

The polyserial correlation assumes a latent bivariate normal model between a continuous variable and an unobserved continuous propensity underlying an ordinal variable. Let \((X, Z)^\top \sim N_2(0, \Sigma)\) with \(\mathrm{corr}(X,Z)=\rho\), and suppose the observed ordinal response \(Y\) is formed by cut-points \(-\infty = \beta_0 < \beta_1 < \cdots < \beta_K = \infty\):

$$ Y = k \iff \beta_{k-1} < Z \le \beta_k. $$

After standardising the observed continuous variable \(X\), the thresholds are estimated from the marginal proportions of \(Y\). Conditional on an observed \(x_i\), the category probability is

$$ \Pr(Y_i = k \mid X_i = x_i, \rho) = \Phi\!\left(\frac{\beta_k - \rho x_i}{\sqrt{1-\rho^2}}\right) - \Phi\!\left(\frac{\beta_{k-1} - \rho x_i}{\sqrt{1-\rho^2}}\right). $$

The returned estimate maximises the log-likelihood

$$ \ell(\rho) = \sum_{i=1}^{n}\log \Pr(Y_i = y_i \mid X_i = x_i, \rho) $$

over \(\rho \in (-1,1)\) via a one-dimensional Brent search in C++.
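
The estimation steps described above can be sketched in base R. This is an illustrative re-implementation under stated assumptions, not the package's C++ code: polyserial_sketch is a hypothetical helper, thresholds are taken as normal quantiles of the cumulative marginal proportions of Y, the search interval is truncated to (-0.999, 0.999), and R's optimize() (which is based on Brent's localmin routine) stands in for the package's Brent search.

```r
# Hypothetical helper: two-step polyserial estimate for one (x, y) pair.
polyserial_sketch <- function(x, y) {
  y <- as.integer(as.factor(y))            # ordinal codes 1..K
  K <- max(y)
  z <- (x - mean(x)) / sd(x)               # standardise the continuous variable

  # Thresholds from cumulative marginal proportions of y,
  # padded with beta_0 = -Inf and beta_K = Inf.
  cum_p <- cumsum(tabulate(y, nbins = K)) / length(y)
  beta  <- c(-Inf, qnorm(cum_p[-K]), Inf)

  # Log-likelihood in rho; beta[y + 1] is the upper cut-point and
  # beta[y] the lower cut-point of each observed category.
  loglik <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((beta[y + 1] - rho * z) / s) - pnorm((beta[y] - rho * z) / s)
    sum(log(pmax(p, .Machine$double.xmin)))  # guard against log(0)
  }

  # One-dimensional maximisation over rho.
  optimize(loglik, interval = c(-0.999, 0.999), maximum = TRUE)$maximum
}

# Simulated check: latent correlation 0.6, three ordinal categories.
set.seed(1)
n <- 2000
x <- rnorm(n)
z <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)
rho_hat <- polyserial_sketch(x, y)
```

With a sample of this size the estimate should land close to the simulated latent correlation of 0.6. Because the thresholds are fixed before the search, only a single parameter is optimised.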

In vector mode a single estimate is returned. In matrix/data-frame mode, every numeric column of data is paired with every ordinal column of y, producing a rectangular matrix of continuous-by-ordinal polyserial correlations.

Computational complexity. If data has \(p_x\) continuous columns and y has \(p_y\) ordinal columns, the matrix path computes \(p_x p_y\) separate one-parameter likelihood optimisations.
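
The \(p_x p_y\) count follows directly from pairing every continuous column with every ordinal column. The sketch below makes this explicit with a hypothetical helper, rect_apply, that takes any pairwise estimator as an argument. The stand-in estimator used here is a plain Pearson correlation on integer codes, chosen only to keep the example self-contained; it is not a polyserial estimate, and the loop structure is an assumption about the general shape of the computation rather than the package's implementation.

```r
# Hypothetical helper: apply a pairwise estimator to every
# (continuous column, ordinal column) combination of two data frames.
rect_apply <- function(data, y, pair_fun) {
  out <- matrix(NA_real_, nrow = ncol(data), ncol = ncol(y),
                dimnames = list(colnames(data), colnames(y)))
  for (i in seq_len(ncol(data))) {
    for (j in seq_len(ncol(y))) {
      out[i, j] <- pair_fun(data[[i]], y[[j]])
    }
  }
  out
}

set.seed(2)
X <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
Y <- data.frame(y1 = sample(1:3, 50, replace = TRUE),
                y2 = sample(1:4, 50, replace = TRUE))

# Stand-in estimator that also counts how often it is called.
calls <- 0
stand_in <- function(x, y) {
  calls <<- calls + 1
  cor(x, as.integer(y))   # NOT polyserial; placeholder only
}

res <- rect_apply(X, Y, stand_in)
dim(res)   # 3 rows (continuous) by 2 columns (ordinal)
calls      # 6 = 3 * 2 pairwise evaluations
```

Since each cell is computed independently, the total cost scales with the product of the two column counts times the cost of one optimisation.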

References

Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.

Examples

library(matrixCorr)  # provides polyserial(); the mnormt package must also be installed

set.seed(125)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)

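# Draw four correlated latent normal variables.
Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)
# Columns 1-2 stay continuous; columns 3-4 are cut into ordinal categories below.
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])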
Y <- data.frame(
  y1 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.5, 0.7, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 4],
    breaks = c(-Inf, -1.0, 0.0, 1.0, Inf),
    labels = c("1", "2", "3", "4")
  ))
)

# Matrix mode: a 2 x 2 matrix, rows x1, x2 and columns y1, y2.
ps <- polyserial(X, Y)
print(ps, digits = 3)
summary(ps)
plot(ps)