Fast Probabilistic Whitening Transformation for Ultra-High Dimensional Data

Data whitening is a widely used preprocessing step to remove correlation structure, since statistical models often assume independence (Kessy, et al. 2018). The typical procedure transforms the observed data by an inverse square root of the sample covariance matrix (Figure 1). For low-dimensional data (i.e. $n > p$), this produces transformed data with an identity sample covariance matrix. The procedure assumes either that the true covariance matrix is known, or that it is well estimated by the sample covariance matrix. Yet using the sample covariance matrix for this transformation can be problematic since 1) the computational complexity is $\mathcal{O}(p^3)$ and 2) it is not applicable to the high-dimensional (i.e. $n \ll p$) case, where the sample covariance matrix is no longer full rank.
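As a minimal sketch of the classical procedure described above, the following base R code whitens simulated low-dimensional data ($n > p$) with the inverse symmetric square root of the sample covariance matrix. The simulated data, the AR(1)-style correlation structure, and all variable names are assumptions made for illustration; none of it comes from the package itself.

```r
set.seed(1)
n <- 200  # number of samples
p <- 5    # number of features, so n > p

# Simulate correlated data (assumed AR(1)-style correlation, rho = 0.8)
Sigma <- 0.8^abs(outer(seq_len(p), seq_len(p), "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Inverse symmetric square root of the sample covariance matrix.
# The eigendecomposition of a p x p matrix is the O(p^3) step.
S <- cov(X)
eig <- eigen(S, symmetric = TRUE)
W <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

# Center the data, then apply the whitening matrix
X_white <- scale(X, center = TRUE, scale = FALSE) %*% W

# Largest deviation of the whitened sample covariance from the identity
max_dev <- max(abs(cov(X_white) - diag(p)))
```

With $n \ll p$ the same code breaks down: some eigenvalues of `S` are zero (up to rounding), so `1 / sqrt(eig$values)` is no longer usable.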

Here we use a probabilistic model of the observed data to apply a whitening transformation. Our Gaussian Inverse Wishart Empirical Bayes (GIW-EB) model 1) substantially reduces computational complexity, and 2) regularizes the eigenvalues of the sample covariance matrix to improve out-of-sample performance.
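To illustrate why regularizing the eigenvalues helps in the $n \ll p$ case, here is a rough base R sketch of whitening with a shrunken covariance estimate, computed from the SVD of the data without ever forming a $p \times p$ matrix. This is not the package's GIW-EB estimator: the shrinkage form $(1-\lambda) S + \lambda \nu I$, the hand-picked value of `lambda`, the target variance `nu`, and the helper `whiten_fast` are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 50   # number of samples
p <- 300  # number of features, so n << p
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

lambda <- 0.1                 # shrinkage weight, picked by hand (assumption)
nu <- mean(apply(X, 2, var))  # target variance for the identity component

# Thin SVD of the scaled data: S = V diag(d^2) V^T, with V of size p x n
sv <- svd(X / sqrt(n - 1))
V <- sv$v
d2 <- sv$d^2
cst <- lambda * nu

# Sigma_hat = (1 - lambda) * S + cst * I has the inverse square root
#   V diag(w) V^T + I / sqrt(cst),
# with w the offset of each regularized eigenvalue from the baseline
w <- 1 / sqrt((1 - lambda) * d2 + cst) - 1 / sqrt(cst)

# Hypothetical helper: apply the inverse square root using only
# n x p and p x n products
whiten_fast <- function(Y) {
  (Y %*% V) %*% (w * t(V)) + Y / sqrt(cst)
}
X_white <- whiten_fast(X)

# Check against the explicit p x p computation (feasible only for small p)
Sigma_hat <- (1 - lambda) * crossprod(X) / (n - 1) + cst * diag(p)
eig <- eigen(Sigma_hat, symmetric = TRUE)
W_explicit <- eig$vectors %*% (t(eig$vectors) / sqrt(eig$values))
max_diff <- max(abs(X_white - X %*% W_explicit))
```

Because every eigenvalue of the shrunken estimate is at least `cst`, the inverse square root is well defined even though the sample covariance matrix itself is rank deficient.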

Installation

devtools::install_github("GabrielHoffman/decorrelate")

Install

install.packages('decorrelate')

Version

0.1.6.4

License

Artistic-2.0

Maintainer

Gabriel E Hoffman

Last Published

July 18th, 2025

Functions in decorrelate (0.1.6.4)

fastcca: Fast canonical correlation analysis
getCov: Get full covariance/correlation matrix from eclairs
rmvnorm_eclairs: Draw from multivariate normal and t distributions
lm_eclairs: Fit linear model after decorrelating
logDet: Evaluate the log determinant
fastcca-class: Class fastcca
eclairs_sq: Compute eclairs decomp of squared correlation matrix
plot,eclairs-method: Plot eclairs object
quadForm: Evaluate quadratic form
reform_decomp: Recompute eclairs after dropping features
optimal_SVHT_coef: Optimal Hard Threshold for Singular Values
averageCorr: Summarize correlation matrix
whiten: Decorrelation projection + eclairs
sv_threshold: Singular value thresholding
kappa,eclairs-method: Compute condition number
lm_each_eclairs: Fit linear model on each feature after decorrelating
eclairs: Estimate covariance/correlation with low rank and shrinkage
cca: Canonical correlation analysis
eclairs-class: Class eclairs
decorrelate: Decorrelation projection
eclairs_corMat: Estimate covariance/correlation with low rank and shrinkage
autocorr.mat: Create auto-correlation matrix
dmult: Multiply by diagonal matrix
cov_transform: Estimate covariance matrix after applying transformation
mahalanobisDistance: Mahalanobis Distance
mult_eclairs: Multiply by eclairs matrix
getWhiteningMatrix: Get whitening matrix
getShrinkageParams: Estimate shrinkage parameter by empirical Bayes
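
As a hedged sketch of how two of the functions listed above might be combined, the example below estimates the decomposition with `eclairs` and then applies the projection with `decorrelate`. It assumes both functions accept a samples-by-features matrix and can be called with default arguments, which should be checked against each function's help page. The simulated data and variable names are illustrative only, and the package calls are guarded so the script still runs when `decorrelate` is not installed.

```r
set.seed(1)
n <- 100  # number of samples
p <- 500  # number of features, so n << p

# Simulate correlated features with base R (assumed AR(1)-style structure)
Sigma <- 0.9^abs(outer(seq_len(p), seq_len(p), "-"))
Y <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
dimnames(Y) <- list(paste0("s", seq_len(n)), paste0("f", seq_len(p)))

Y_white <- NULL
if (requireNamespace("decorrelate", quietly = TRUE)) {
  # Estimate the low-rank plus shrinkage decomposition (defaults assumed)
  ecl <- decorrelate::eclairs(Y)

  # Apply the decorrelation projection to the same data
  Y_white <- decorrelate::decorrelate(Y, ecl)
}
```

If the package is available, `Y_white` should hold the whitened data with the same dimensions as `Y`.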