OptimalBinningWoE (version 1.0.8)

obcorr: Compute Multiple Robust Correlations Between Numeric Variables

Description

This function computes various correlation coefficients between all pairs of numeric variables in a data frame. It implements several classical and robust correlation measures, including Pearson, Spearman, Kendall, Hoeffding's D, Distance Correlation, Biweight Midcorrelation, and Percentage Bend correlation.

Usage

obcorr(df, method = "all", threads = 0L)

Value

A data frame with the following columns:

x, y

Names of the variable pairs being correlated.

pearson

Pearson correlation coefficient.

spearman

Spearman rank correlation coefficient.

kendall

Kendall's tau-b correlation coefficient.

hoeffding

Hoeffding's D statistic (scaled by 30).

distance

Distance correlation coefficient.

biweight

Biweight midcorrelation coefficient.

pbend

Percentage bend correlation coefficient.

The exact columns returned depend on the method parameter.

Arguments

df

A data frame containing numeric variables. Non-numeric columns will be automatically excluded. At least two numeric variables are required.

method

A character string specifying which correlation method(s) to compute. Possible values are:

  • "all": Compute all available correlation methods (default).

  • "pearson": Compute only Pearson correlation.

  • "spearman": Compute only Spearman correlation.

  • "kendall": Compute only Kendall's tau-b correlation.

  • "hoeffding": Compute only Hoeffding's D.

  • "distance": Compute only distance correlation.

  • "biweight": Compute only biweight midcorrelation.

  • "pbend": Compute only percentage bend correlation.

  • "robust": Compute robust correlations (biweight and pbend).

  • "alternative": Compute alternative correlations (hoeffding and distance).

threads

An integer specifying the number of threads to use for parallel computation. If 0 (default), uses all available cores. Ignored if OpenMP is not available.
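To make the documented input handling and output shape concrete, the following base R sketch filters the numeric columns of a data frame and builds a table with x, y, and one column per method. It is an illustration only, not the package's implementation: `pairwise_cor_sketch` is a hypothetical helper, it assumes one row per unordered pair of variables, and it covers only the three methods available through `stats::cor()`.

```r
# Hypothetical helper, written for illustration only; not part of OptimalBinningWoE.
pairwise_cor_sketch <- function(df, methods = c("pearson", "spearman", "kendall")) {
  # Keep numeric columns only, mirroring the documented behaviour of `df`.
  num <- df[vapply(df, is.numeric, logical(1))]
  if (ncol(num) < 2L) stop("at least two numeric variables are required")

  # One row per unordered pair of variables (an assumption about the layout).
  pairs <- t(combn(names(num), 2))
  out <- data.frame(x = pairs[, 1], y = pairs[, 2], stringsAsFactors = FALSE)

  # One column per requested method, computed with stats::cor().
  for (m in methods) {
    out[[m]] <- mapply(
      function(a, b) cor(num[[a]], num[[b]], method = m),
      out$x, out$y,
      USE.NAMES = FALSE
    )
  }
  out
}

set.seed(123)
toy <- data.frame(
  x1 = rnorm(50),
  x2 = rnorm(50),
  x3 = rnorm(50),
  category = sample(letters[1:3], 50, replace = TRUE)  # dropped by the filter
)
res <- pairwise_cor_sketch(toy)
print(res)
```

With three numeric columns the sketch yields three rows, and the `category` column never appears in x or y.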

Details

The function can compute several correlation methods in a single call and uses OpenMP for parallel computation when available.

Available correlation methods:

  • Pearson: Standard linear correlation coefficient.

  • Spearman: Rank-based correlation coefficient.

  • Kendall: Kendall's tau-b correlation coefficient.

  • Hoeffding: Hoeffding's D statistic (scaled by 30).

  • Distance: Distance correlation (Székely et al., 2007).

  • Biweight: Biweight midcorrelation (robust alternative).

  • Pbend: Percentage bend correlation (robust alternative).
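As a rough illustration of what "robust alternative" means in the list above, here is a base R sketch of the biweight midcorrelation using its common textbook definition (median centering, raw MAD scaling, tuning constant 9). The function name `biweight_midcor` and the tuning constant are assumptions made for this example; the package's own implementation may differ in constants and edge-case handling.

```r
# Textbook biweight midcorrelation; a sketch, not the package's code.
biweight_midcor <- function(x, y, tuning = 9) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))

  weighted_dev <- function(v) {
    dev <- v - median(v)
    # Raw MAD (constant = 1). A zero MAD makes the result undefined.
    u <- dev / (tuning * mad(v, constant = 1))
    # Tukey biweight: points with |u| >= 1 receive zero weight.
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)
    dev * w
  }

  a <- weighted_dev(x)
  b <- weighted_dev(y)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(42)
x <- rnorm(200)
y <- x + rnorm(200, sd = 0.5)

# Inject a few gross outliers into a copy of y.
y_out <- y
y_out[1:5] <- 50

pearson_clean <- cor(x, y)
pearson_out   <- cor(x, y_out)
bicor_clean   <- biweight_midcor(x, y)
bicor_out     <- biweight_midcor(x, y_out)

print(c(pearson_clean = pearson_clean, pearson_out = pearson_out,
        bicor_clean = bicor_clean, bicor_out = bicor_out))
```

Because the outliers lie far beyond the tuning window they receive zero weight, so the biweight estimate moves much less than the Pearson coefficient when they are added.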

References

Székely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769-2794.

Wilcox, R.R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616.

Examples

# Create sample data
set.seed(123)
n <- 100
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rt(n, df = 3), # Heavy-tailed distribution
  x4 = sample(c(0, 1), n, replace = TRUE), # Binary variable
  category = sample(letters[1:3], n, replace = TRUE) # Non-numeric column
)

# Add some relationships (this overwrites the x2 and x3 columns created above)
df$x2 <- df$x1 + rnorm(n, 0, 0.5)
df$x3 <- df$x1^2 + rnorm(n, 0, 0.5)

# Compute all correlations
result_all <- obcorr(df)
head(result_all)

# Compute only robust correlations
result_robust <- obcorr(df, method = "robust")

# Compute only Pearson correlation with 2 threads
result_pearson <- obcorr(df, method = "pearson", threads = 2)