ccaPP (version 0.1.1)

corFunctions: Fast implementations of (robust) correlation estimators

Description

Estimate the correlation of two vectors via fast C++ implementations, with a focus on robust and nonparametric methods.

Usage

corPearson(x, y)

corSpearman(x, y, consistent = FALSE)

corKendall(x, y, consistent = FALSE)

corQuadrant(x, y, consistent = FALSE)

corM(x, y, prob = 0.9, initial = c("quadrant", "spearman", "kendall", "pearson"), tol = 1e-06)

Arguments

x,y
numeric vectors.
consistent
a logical indicating whether a consistent estimate at the bivariate normal distribution should be returned (defaults to FALSE).
prob
numeric; probability for the quantile of the $\chi^{2}$ distribution to be used for tuning the Huber loss function (defaults to 0.9).
initial
a character string specifying the starting values for the Huber M-estimator. For "quadrant" (the default), "spearman" or "kendall", the consistent version of the respective correlation measure is used together
tol
a small positive numeric value to be used for determining convergence.
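
As a rough illustration of what prob selects, the sketch below computes a $\chi^{2}$ quantile in base R. The choice of two degrees of freedom (one per variable in the bivariate case) is an assumption, not something stated on this page, and how the package turns the quantile into a Huber tuning constant is not shown here.

```r
# Assumption: df = 2 for the bivariate case; this is not taken from ccaPP.
prob <- 0.9
cutoff <- qchisq(prob, df = 2)  # chi-squared quantile at the default prob

# For df = 2 the quantile has the closed form -2 * log(1 - prob).
closed_form <- -2 * log(1 - prob)

c(qchisq = cutoff, closed_form = closed_form)
```

A larger prob gives a larger quantile, so under this reading fewer observations would be treated as outlying and downweighted.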

Value

  • The respective correlation estimate.

Details

corPearson estimates the classical Pearson correlation. corSpearman, corKendall and corQuadrant estimate the Spearman, Kendall and quadrant correlation, respectively, which are nonparametric correlation measures that are somewhat more robust. corM estimates the correlation based on a bivariate M-estimator of location and scatter with a Huber loss function, which is sufficiently robust in the bivariate case, but loses robustness with increasing dimension.
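
To make the robustness remark concrete, here is a minimal sketch on toy data. It uses cor() from base R rather than the ccaPP functions, on the assumption that both compute the same Pearson and Spearman estimates. The data are invented for illustration: a perfect linear relation with one gross outlier in y.

```r
# Toy data: y equals x exactly, then one observation is replaced by an outlier.
x <- 1:20
y <- as.numeric(1:20)
y[20] <- -1000

# Pearson is driven by the raw values, Spearman only by the ranks.
r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman)
```

The single outlier makes the Pearson estimate negative, while the Spearman estimate stays at 5/7 (about 0.71), since the outlier can shift the ranks only by a bounded amount.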

The nonparametric correlation measures do not estimate the same population quantity as the Pearson correlation, which is consistent at the bivariate normal model. Let $\rho$ denote the population correlation at the normal model. Then the Spearman correlation estimates $(6/\pi) \arcsin(\rho/2)$, while the Kendall and quadrant correlations both estimate $(2/\pi) \arcsin(\rho)$. Consistent estimates are thus obtained by applying the corresponding inverse transformation to the raw estimate.
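
Inverting the two expressions gives $\rho = 2 \sin(\pi r_S / 6)$ for a Spearman estimate $r_S$ and $\rho = \sin(\pi r / 2)$ for a Kendall or quadrant estimate $r$. The sketch below applies these inverses to base R estimates. The helper names spearman_to_rho and kendall_to_rho are hypothetical and not part of ccaPP; consistent = TRUE presumably applies a transformation of this kind, but that is not verified here.

```r
# Hypothetical helpers (not part of ccaPP): map a nonparametric estimate
# back to the scale of rho at the bivariate normal model.
spearman_to_rho <- function(r) 2 * sin(pi * r / 6)
kendall_to_rho  <- function(r) sin(pi * r / 2)  # same inverse for the quadrant correlation

# Simulate bivariate normal data with population correlation 0.6.
set.seed(1234)
n  <- 500
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- z1
y  <- 0.6 * z1 + sqrt(1 - 0.6^2) * z2

# Raw estimates from base R, then their transformed counterparts.
r_s <- cor(x, y, method = "spearman")
r_k <- cor(x, y, method = "kendall")

c(spearman = r_s, spearman_consistent = spearman_to_rho(r_s),
  kendall  = r_k, kendall_consistent  = kendall_to_rho(r_k))
```

Both transformations leave 0 and ±1 unchanged and move estimates in between away from zero, so the transformed value is never smaller in absolute value than the raw one.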

The Huber M-estimator, on the other hand, is consistent at the bivariate normal model.

See Also

ccaGrid, ccaProj, cor

Examples

## generate data
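library("ccaPP")    # provides corPearson(), corSpearman(), corKendall(), corQuadrant(), corM()
library("mvtnorm")  # provides rmvnorm()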
set.seed(1234)  # for reproducibility
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
xy <- rmvnorm(100, sigma=sigma)
x <- xy[, 1]
y <- xy[, 2]

## compute correlations

# Pearson correlation
corPearson(x, y)

# Spearman correlation
corSpearman(x, y)
corSpearman(x, y, consistent=TRUE)

# Kendall correlation
corKendall(x, y)
corKendall(x, y, consistent=TRUE)

# quadrant correlation
corQuadrant(x, y)
corQuadrant(x, y, consistent=TRUE)

# Huber M-estimator
corM(x, y)
