whiten: whitens multivariate data

Description

whiten transforms a multivariate K-dimensional signal $\mathbf{X}$ with mean $\boldsymbol \mu_X$ and covariance matrix $\Sigma_{X}$ to a whitened signal $\mathbf{U}$ with mean $\boldsymbol 0$ and $\Sigma_U = I_K$. Thus it centers the signal, scales it to unit variance, and makes its components contemporaneously uncorrelated. See Details.
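The transformation described above can be sketched in base R as follows. This is a minimal illustration, not the package's whiten() implementation: it assumes the sample mean (colMeans) and sample covariance (cov) as estimates of $\boldsymbol \mu_X$ and $\Sigma_{X}$, and all variable names are hypothetical.

```r
# Sketch of whitening with rows as observations: U = (X - mu) %*% Sigma^{-1/2}
set.seed(1)
n <- 500
K <- 2
# Correlated toy data: n x K matrix with a non-identity covariance
X <- matrix(rnorm(n * K), ncol = K) %*% matrix(c(2, 1, 0.5, 1), ncol = K)

mu <- colMeans(X)  # estimate of the mean vector
Sigma <- cov(X)    # estimate of the covariance matrix

# Inverse square root of Sigma from its symmetric eigen-decomposition
eig <- eigen(Sigma, symmetric = TRUE)
Sigma.inv.sqrt <- eig$vectors %*% diag(1 / sqrt(eig$values), nrow = K) %*% t(eig$vectors)

# Center each column, then decorrelate and rescale
U <- sweep(X, 2, mu) %*% Sigma.inv.sqrt

round(colMeans(U), 8)  # numerically zero
round(cov(U), 8)       # numerically the K x K identity
```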

check_whitened checks whether data has been whitened, i.e., whether it has zero mean, unit variance, and uncorrelated components.
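A full numerical version of this check might look like the sketch below. The function name is_whitened_sketch and the tolerance tol are hypothetical and not part of the package; comparing cov() against the identity covers both unit variance (diagonal) and uncorrelatedness (off-diagonal) in one step.

```r
# Hypothetical full check: zero mean, unit variance, and no correlation
is_whitened_sketch <- function(data, tol = 1e-6) {
  data <- as.matrix(data)
  zero.mean <- all(abs(colMeans(data)) < tol)
  # Diagonal of cov() == 1 means unit variance; off-diagonal == 0 means uncorrelated
  identity.cov <- all(abs(cov(data) - diag(ncol(data))) < tol)
  zero.mean && identity.cov
}

set.seed(2)
Z <- matrix(rnorm(200 * 3), ncol = 3)

# Whiten Z via the inverse square root of its sample covariance
Zc <- sweep(Z, 2, colMeans(Z))
e <- eigen(cov(Zc), symmetric = TRUE)
W <- Zc %*% e$vectors %*% diag(1 / sqrt(e$values), nrow = 3) %*% t(e$vectors)

is_whitened_sketch(W)  # TRUE
is_whitened_sketch(Z)  # FALSE: raw sample moments are not exactly 0 and I
```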

sqrt_matrix computes the square root $\mathbf{B}$ of a square matrix $\mathbf{A}$. The matrix $\mathbf{B}$ satisfies $\mathbf{B} \mathbf{B} = \mathbf{A}$.

Usage

whiten(data)
check_whitened(data, check.attribute.only = TRUE)
sqrt_matrix(mat, return.sqrt.only = TRUE, symmetric = FALSE)

Arguments

data

$n \times K$ array representing n observations of K variables.

check.attribute.only

logical; if TRUE (default), it checks only the 'whitened' attribute. This is much faster (it just needs to look up one attribute value), but it might not surface silent bugs. For the sake of performance, the package uses the attribute check by default; for testing and debugging, set it to FALSE to run the full numerical check.

mat

a square $K \times K$ matrix.

return.sqrt.only

logical; if TRUE (default), it returns only the square root matrix; if FALSE, it returns a list that also contains auxiliary results (eigenvalues, eigenvectors, and the inverse of the square root matrix).

symmetric

logical; if TRUE, the eigen-solver assumes that the matrix is symmetric, which makes it much faster. This is particularly useful for covariance matrices, such as the one used in whiten. Default: FALSE.

Value

whiten returns a list with the whitened data (element U), the transformation, and other useful quantities.

check_whitened throws an error if the input is not whitened; otherwise it returns (invisibly) the data with the attribute 'whitened' set to TRUE. This lets you update data with the attribute, so that the full (slow) check runs only once on the actual data and subsequent checks use the (fast) attribute lookup.
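The "check once, then trust the attribute" pattern can be sketched as below. This is a hypothetical stand-in (check_whitened_sketch, with an assumed tolerance tol), not the package's check_whitened(); it only mirrors the documented behavior of erroring on failure and returning the data invisibly with attribute 'whitened' set to TRUE.

```r
# Hypothetical sketch of the attribute-caching pattern
check_whitened_sketch <- function(data, check.attribute.only = TRUE, tol = 1e-6) {
  if (check.attribute.only) {
    # Fast path: a single attribute lookup
    if (!isTRUE(attr(data, "whitened"))) stop("data is not marked as whitened")
  } else {
    # Slow path: full numerical check of mean and covariance
    mat <- as.matrix(data)
    ok <- all(abs(colMeans(mat)) < tol) &&
      all(abs(cov(mat) - diag(ncol(mat))) < tol)
    if (!ok) stop("data is not whitened")
  }
  attr(data, "whitened") <- TRUE
  invisible(data)
}

set.seed(3)
Xc <- sweep(matrix(rnorm(300 * 2), ncol = 2), 2, 0)  # placeholder, centered below
Xc <- sweep(Xc, 2, colMeans(Xc))
e <- eigen(cov(Xc), symmetric = TRUE)
U <- Xc %*% e$vectors %*% diag(1 / sqrt(e$values), nrow = 2) %*% t(e$vectors)

U <- check_whitened_sketch(U, check.attribute.only = FALSE)  # slow check, run once
check_whitened_sketch(U)                                     # fast attribute lookup
```

Reassigning the result to U is what makes the later calls cheap: the attribute travels with the object until an operation such as %*% drops it.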

sqrt_matrix returns a square matrix of the same dimension as mat. If $\mathbf{A}$ is not positive semi-definite, it returns a complex-valued $\mathbf{B}$ (since the square roots of negative eigenvalues are complex).

If return.sqrt.only = FALSE then it returns a list with:

values

eigenvalues of $\mathbf{A}$,

vectors

eigenvectors of $\mathbf{A}$,

sqrt

square root matrix $\mathbf{B}$,

sqrt.inverse

inverse of $\mathbf{B}$.
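The complex-valued case for a matrix that is not positive semi-definite can be illustrated in base R. This sketch assumes a symmetric input and is not the package's sqrt_matrix(); note that sqrt() of a negative real number returns NaN in R, so the eigenvalues are converted to complex first.

```r
# A symmetric but indefinite matrix: eigenvalues 3 and -1
A <- matrix(c(1, 2, 2, 1), ncol = 2)
eig <- eigen(A, symmetric = TRUE)
eig$values  # 3 and -1

# Take square roots in the complex domain to avoid NaN for the negative eigenvalue
Lambda.sqrt <- diag(sqrt(as.complex(eig$values)), nrow = 2)
B <- eig$vectors %*% Lambda.sqrt %*% t(eig$vectors)

is.complex(B)           # TRUE
max(Mod(B %*% B - A))   # numerically zero: B is still a square root of A
```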

Details

whiten uses zero-phase component analysis (ZCA) (aka zero-phase whitening filters) to whiten the data; i.e., it uses the inverse square root of the covariance matrix of $\mathbf{X}$ (see sqrt_matrix) as the whitening transformation. Compared with PCA whitening, this adds one step: the uncorrelated, unit-variance principal components are rotated back to the original space using the transpose of the eigenvector matrix. The advantage is that each whitened component stays aligned with, and thus comparable to, the corresponding variable in the original $\mathbf{X}$. See References for details.
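The relation between PCA and ZCA whitening can be checked numerically with a short base-R sketch (rows are observations; variable names are hypothetical, and this is not the package's code). Both versions are whitened; ZCA is the PCA result multiplied by the transpose of the eigenvector matrix, which equals multiplying the centered data by the inverse square root of the covariance matrix.

```r
set.seed(4)
X <- matrix(rnorm(400 * 2), ncol = 2) %*% matrix(c(1, 0.8, 0, 0.6), ncol = 2)
Xc <- sweep(X, 2, colMeans(X))  # centered data

eig <- eigen(cov(Xc), symmetric = TRUE)
V <- eig$vectors
Lambda.inv.sqrt <- diag(1 / sqrt(eig$values), nrow = ncol(X))

U.pca <- Xc %*% V %*% Lambda.inv.sqrt  # principal components scaled to unit variance
U.zca <- U.pca %*% t(V)                # rotated back to the original coordinates

max(abs(cov(U.pca) - diag(2)))  # numerically zero: PCA-whitened
max(abs(cov(U.zca) - diag(2)))  # numerically zero: ZCA-whitened
# ZCA equals multiplication by the inverse square root of cov(X)
max(abs(U.zca - Xc %*% (V %*% Lambda.inv.sqrt %*% t(V))))
# Each ZCA component is positively correlated with its original variable
diag(cor(Xc, U.zca))
```

The last line reflects that the cross-covariance between $\mathbf{X}$ and the ZCA output is the (positive definite) square root of the covariance matrix, so its diagonal is positive.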

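The square root of a square $n \times n$ matrix $\mathbf{A}$ can be computed from its eigen-decomposition. For symmetric $\mathbf{A}$ (such as a covariance matrix), $$ \mathbf{A} = \mathbf{V} \Lambda \mathbf{V}', $$ where $\mathbf{V}$ is the orthogonal matrix of eigenvectors and $\Lambda$ is an $n \times n$ diagonal matrix with the eigenvalues $\lambda_1, \ldots, \lambda_n$ on the diagonal. The square root is then $\mathbf{B} = \mathbf{V} \Lambda^{1/2} \mathbf{V}'$, where $\Lambda^{1/2} = diag(\lambda_1^{1/2}, \ldots, \lambda_n^{1/2})$; indeed, since $\mathbf{V}' \mathbf{V} = I_n$, $\mathbf{B} \mathbf{B} = \mathbf{V} \Lambda^{1/2} \mathbf{V}' \mathbf{V} \Lambda^{1/2} \mathbf{V}' = \mathbf{V} \Lambda \mathbf{V}' = \mathbf{A}$.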
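The formula translates directly into base R. This is a minimal sketch that assumes a symmetric positive semi-definite input; the helper name sqrt_sym_sketch is hypothetical and not the package's sqrt_matrix().

```r
# Square root of a symmetric PSD matrix: A = V Lambda V'  =>  B = V Lambda^{1/2} V'
sqrt_sym_sketch <- function(A) {
  eig <- eigen(A, symmetric = TRUE)  # V is orthogonal because A is symmetric
  eig$vectors %*% diag(sqrt(eig$values), nrow = nrow(A)) %*% t(eig$vectors)
}

A <- matrix(c(4, 1, 1, 3), ncol = 2)  # symmetric with positive eigenvalues
B <- sqrt_sym_sketch(A)
max(abs(B %*% B - A))  # numerically zero
```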

Similarly, the inverse square root is defined as $\mathbf{A}^{-1/2} = \mathbf{V} \Lambda^{-1/2} \mathbf{V}'$, where $\Lambda^{-1/2} = diag(\lambda_1^{-1/2}, \ldots, \lambda_n^{-1/2})$ (provided that $\lambda_i \neq 0$ for all $i$).
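A quick numerical check of this definition, as a base-R sketch with hypothetical variable names: the inverse square root built from $\Lambda^{-1/2}$ coincides with the matrix inverse of $\mathbf{B}$, and it whitens $\mathbf{A}$ in the sense that $\mathbf{A}^{-1/2} \mathbf{A} \mathbf{A}^{-1/2} = I$.

```r
A <- matrix(c(4, 1, 1, 3), ncol = 2)  # symmetric positive definite
eig <- eigen(A, symmetric = TRUE)
V <- eig$vectors
stopifnot(all(eig$values > 0))  # positive eigenvalues keep Lambda^{-1/2} real and finite

B <- V %*% diag(sqrt(eig$values), nrow = 2) %*% t(V)          # A^{1/2}
B.inv <- V %*% diag(1 / sqrt(eig$values), nrow = 2) %*% t(V)  # A^{-1/2}

max(abs(B.inv - solve(B)))                 # numerically zero
max(abs(B.inv %*% A %*% B.inv - diag(2)))  # numerically zero
```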

References

See appendix in http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.

See http://ufldl.stanford.edu/wiki/index.php/Implementing_PCA/Whitening.

Examples


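set.seed(1)  # for reproducibility
# 50 x 2 matrix of correlated data: iid normals times a random 2 x 2 mixing matrix
XX <- matrix(rnorm(100), ncol = 2) %*% matrix(runif(4), ncol = 2)
cov(XX)  # non-identity covariance
UU <- whiten(XX)$U
cov(UU)  # (close to) the 2 x 2 identity matrix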
