gnn (version 0.0-2)

trafos_dimreduction: Dimension-Reduction Transformations for Training or Sampling

Description

Dimension-reduction transformations applied to an input data matrix. Currently only the principal component transformation and its inverse are provided.

Usage

PCA_trafo(x, mu, Gamma, inverse = FALSE, ...)

Arguments

x

\((n, d)\)-matrix of data (typically before training or after sampling). If inverse = TRUE, then, conceptually, an \((n, k)\)-matrix with \(1\le k \le d\), where \(d\) is the dimension of the original data whose dimension was reduced to \(k\).

mu

if inverse = TRUE, a \(d\)-vector of centers, where \(d\) is the dimension to transform x to.

Gamma

if inverse = TRUE, a \((d, k)\)-matrix with \(k\) at least as large as ncol(x), containing the \(k\) orthonormal eigenvectors of a covariance matrix sorted in decreasing order of their eigenvalues; in other words, the columns of Gamma contain the principal axes or loadings. If a matrix with more than ncol(x) columns is provided, only the first ncol(x)-many columns are used.

inverse

logical indicating whether the inverse transformation of the principal component transformation is applied.

...

additional arguments passed to the underlying prcomp().
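
To make the calling conventions concrete, here is a minimal sketch of the two modes (forward and inverse) on hypothetical data; the same pattern is used with real data in the Examples below.

library(gnn)
X <- matrix(rnorm(500 * 5), ncol = 5) # hypothetical (n, d)-matrix with n = 500, d = 5
PCA <- PCA_trafo(X) # forward mode (inverse = FALSE); '...' is passed on to prcomp()
Y <- PCA$PCs # (n, d)-matrix of principal components
## Inverse mode: transform only the first k = 2 principal components back;
## Gamma is (d, d) here, so only its first ncol(Y[,1:2]) = 2 columns are used
X. <- PCA_trafo(Y[,1:2], mu = PCA$mu, Gamma = PCA$Gamma, inverse = TRUE)
stopifnot(dim(X.) == c(500, 5)) # back in the original dimension d = length(mu) = 5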

Value

If inverse = TRUE, the transformed data whose rows are \(\bm{X} = \bm{\mu} + \Gamma \bm{Y}\), where \(\bm{Y}\) is the corresponding row of x. See the details below for the notation.

If inverse = FALSE, a list containing:

PCs:

\((n, d)\)-matrix of principal components.

cumvar:

cumulative variances; the \(j\)th entry gives the fraction of the total variance explained by the first \(j\) principal components.

sd:

sample standard deviations of the transformed data.

lambda:

eigenvalues of cov(x).

mu:

\(d\)-vector of centers of x (see also above) typically provided to PCA_trafo(, inverse = TRUE).

Gamma:

\((d, d)\)-matrix of principal axes (see also above) typically provided to PCA_trafo(, inverse = TRUE).
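
The returned components fit together in the usual PCA sense. The following sketch lists consistency checks one would expect to hold, up to numerical error, given the descriptions above (it is an illustration on hypothetical data, not a specification of the implementation).

library(gnn)
X <- matrix(rnorm(1000 * 4), ncol = 4) # hypothetical (n, d)-matrix
PCA <- PCA_trafo(X)
stopifnot(
    all.equal(PCA$cumvar, cumsum(PCA$lambda) / sum(PCA$lambda),
              check.attributes = FALSE), # cumulative fractions of explained variance
    all.equal(PCA$sd, sqrt(PCA$lambda), check.attributes = FALSE), # sd of the PCs
    all.equal(PCA$mu, colMeans(X), check.attributes = FALSE), # centers of x
    all.equal(crossprod(PCA$Gamma), diag(ncol(X)),
              check.attributes = FALSE)) # columns of Gamma are orthonormal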

Details

Conceptually, the principal component transformation transforms a vector \(\bm{X}\) to \(\bm{Y} = \Gamma^T(\bm{X}-\bm{\mu})\), where \(\bm{\mu}\) is the mean vector of \(\bm{X}\) and \(\Gamma\) is the \((d, d)\)-matrix whose columns contain the orthonormal eigenvectors of cov(X), sorted in decreasing order of their eigenvalues.

The corresponding (conceptual) inverse transformation is \(\bm{X} = \bm{\mu} + \Gamma \bm{Y}\).

See McNeil et al. (2015, Section 6.4.5).
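
For intuition, the conceptual transformation and its inverse can be reproduced with base R via the spectral decomposition of the sample covariance matrix; this is a minimal illustration of the formulas above on hypothetical data, not the package's internal implementation (which relies on prcomp()).

X <- matrix(rnorm(200 * 4), ncol = 4) # hypothetical (n, d)-matrix of data
mu <- colMeans(X) # d-vector of centers
Gamma <- eigen(cov(X))$vectors # orthonormal eigenvectors of cov(X), sorted by decreasing eigenvalue
Y <- sweep(X, 2, mu) %*% Gamma # rows of Y satisfy Y = Gamma^T (X - mu)
X. <- sweep(Y %*% t(Gamma), 2, mu, FUN = "+") # inverse: X = mu + Gamma Y
stopifnot(all.equal(X., X, check.attributes = FALSE)) # round trip recovers the data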

References

McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.

Examples

## Generate data
library(copula)
set.seed(271)
X <- qt(rCopula(1000, gumbelCopula(2, dim = 10)), df = 3.5)
pairs(X, gap = 0, pch = ".")

## Principal component transformation
PCA <- PCA_trafo(X)
Y <- PCA$PCs
PCA$cumvar[3] # fraction of variance explained by the first 3 principal components
which.max(PCA$cumvar > 0.9) # number of principal components it takes to explain 90%

## Scatter plot of the first two principal components (= data transformed
## with the first two principal axes)
plot(Y[,1:2])

## Transform back and compare
X. <- PCA_trafo(Y, mu = PCA$mu, Gamma = PCA$Gamma, inverse = TRUE)
stopifnot(all.equal(X., X))

## Note: One typically transforms back with only some of the principal axes
X. <- PCA_trafo(Y[,1:3], mu = PCA$mu, # mu determines the dimension to transform to
                Gamma = PCA$Gamma, # must be of dim. (length(mu), k) for k >= ncol(x)
                inverse = TRUE)
stopifnot(dim(X.) == c(1000, 10))
## Note: We (typically) transform back to the original dimension.
pairs(X., gap = 0, pch = ".") # pairs of back-transformed first three PCs