Gaussianize: Gaussianize matrix-like objects

Description

Gaussianize is probably the most useful function in this package. It works like scale, but instead of only centering and scaling the data it Gaussianizes them, which works well for unimodal data. See Goerg (2011, 2016) and Examples.

Important: For multivariate input X it performs a column-wise Gaussianization (by simply calling apply(X, 2, Gaussianize)), which is only a marginal Gaussianization. Each column is Gaussianized on its own; the transformed data are in general not jointly Gaussian.
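The difference between marginal and joint Gaussianity can be checked with a small base R sketch. It does not call Gaussianize; the variables x1, x2, and s are made up for illustration. Both columns are exactly N(0, 1) marginally, yet their sum has a point mass at zero, which a jointly Gaussian pair could not produce.

```r
# Illustration only (base R, hypothetical variable names): two marginally
# Gaussian variables that are not jointly Gaussian.
set.seed(1)
n  <- 10000
x1 <- rnorm(n)
s  <- sample(c(-1, 1), n, replace = TRUE)  # random sign, independent of x1
x2 <- s * x1                               # still N(0, 1) by symmetry

# If (x1, x2) were jointly Gaussian, every linear combination, including
# x1 + x2, would be Gaussian. Here the sum is exactly 0 whenever s == -1.
frac.zero <- mean(x1 + x2 == 0)
frac.zero
```

Roughly half of the sums are exactly zero, so marginal diagnostics alone cannot establish joint normality of Gaussianized columns.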

By default Gaussianize returns the \(X \sim N(\mu_x, \sigma_x^2)\) input, not the zero-mean, unit-variance \(U \sim N(0, 1)\) input. Use return.u = TRUE to obtain \(U\).

Usage

Gaussianize(
  data = NULL,
  type = c("h", "hh", "s"),
  method = c("IGMM", "MLE"),
  return.tau.mat = FALSE,
  inverse = FALSE,
  tau.mat = NULL,
  verbose = FALSE,
  return.u = FALSE,
  input.u = NULL
)

Value

numeric matrix-like object with same dimension/size as input data. If inverse = FALSE it is the Gaussianized matrix / vector; if TRUE it is the "DeGaussianized" matrix / vector.

The mean, scale, and skewness/heavy-tail parameters that were used in the Gaussianizing transformation are returned as attributes of the output matrix: 'Gaussianized:mu', 'Gaussianized:sigma', and for

type = "h": 'Gaussianized:delta' and 'Gaussianized:alpha',
type = "hh": 'Gaussianized:delta_l', 'Gaussianized:delta_r', 'Gaussianized:alpha_l', and 'Gaussianized:alpha_r',
type = "s": 'Gaussianized:gamma'.

They can also be returned as a separate matrix using return.tau.mat = TRUE. In this case Gaussianize returns a list with elements:

input: Gaussianized input data \(\boldsymbol x\) (or \(\boldsymbol u\) if return.u = TRUE),
tau.mat: matrix with the \(\tau\) estimates used to obtain x; has same number of columns as x, and 3, 5, or 6 rows (depending on type='s', 'h', or 'hh').
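A minimal sketch of the two ways to get at the estimated parameters, assuming the LambertW package is installed. The toy matrix yy and its column names are made up for illustration; the attribute names are the ones listed above.

```r
library(LambertW)

set.seed(20)
# Toy heavy-tailed data with two columns (hypothetical example data).
yy <- cbind(y1 = rcauchy(100), y2 = rt(100, df = 3))

# Default: the parameters travel as attributes of the returned matrix.
xx <- Gaussianize(yy, type = "h")
attr(xx, "Gaussianized:mu")
attr(xx, "Gaussianized:delta")

# Alternative: request the parameters as a separate matrix.
out <- Gaussianize(yy, type = "h", return.tau.mat = TRUE)
names(out)        # list elements, expected to include "input" and "tau.mat"
dim(out$tau.mat)  # one column per column of yy
```

The list form is the more convenient one when the same parameters are to be reused later through the tau.mat argument.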

Arguments

data: a numeric matrix-like object; either the data that should be Gaussianized, or the data that should be "DeGaussianized" (inverse = TRUE), i.e., converted back to the original space.
type: what type of non-normality: symmetric heavy-tails "h" (default), skewed heavy-tails "hh", or just skewed "s".
method: what estimator should be used: "MLE" or "IGMM". "IGMM" gives exactly Gaussian characteristics (kurtosis \(\equiv\) 3 for "h" or skewness \(\equiv\) 0 for "s"), "MLE" comes close to this. Default: "IGMM" since it is much faster than "MLE".
return.tau.mat: logical; if TRUE it also returns the estimated \(\tau\) parameters as a matrix (same number of columns as data). This matrix can then be used to Gaussianize new data with pre-estimated \(\tau\). It can also be used to "DeGaussianize" data by passing it as an argument (tau.mat) to Gaussianize() and setting inverse = TRUE.
inverse: logical; if TRUE it performs the inverse transformation using tau.mat to "DeGaussianize" the data back to the original space again.
tau.mat: instead of estimating \(\tau\) from the data you can pass it as a matrix (usually obtained via Gaussianize(..., return.tau.mat = TRUE)). If inverse = TRUE it uses this tau matrix to "DeGaussianize" the data again. This is useful to back-transform new data in the Gaussianized space, e.g., predictions or fits, back to the original space.
verbose: logical; if TRUE, it prints out progress information in the console. Default: FALSE.
return.u: logical; if TRUE it returns the zero-mean, unit variance Gaussian input. If FALSE (default) it returns the input \(X\).
input.u: optional; if you used return.u = TRUE in a previous step, and now you want to convert the data back to original space, then you have to pass it as input.u. If you pass numeric data as data, Gaussianize assumes that data is the input corresponding to \(X\), not \(U\).
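The interplay of return.u, input.u, tau.mat, and inverse described above can be sketched as a round trip. This assumes the LambertW package is installed and that the inverse call accepts input.u together with tau.mat as the argument descriptions suggest; the variable names u and y.hat are made up for illustration.

```r
library(LambertW)

set.seed(20)
y <- rcauchy(100)

# Forward step: ask for the zero-mean, unit-variance input U and keep tau.
out <- Gaussianize(y, type = "h", return.tau.mat = TRUE, return.u = TRUE)
u <- out$input

# Backward step: because `u` lives on the U scale, pass it as `input.u`
# rather than as `data`, which would be interpreted as X.
y.hat <- Gaussianize(input.u = u, tau.mat = out$tau.mat, inverse = TRUE)

# Largest absolute round-trip error; should be small relative to y.
max(abs(as.numeric(y.hat) - y))
```

Passing u as data instead would apply the inverse transformation on the wrong scale, since data is taken to be the input corresponding to \(X\).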

Examples

# Univariate example
set.seed(20)
y1 <- rcauchy(n = 100)
out <- Gaussianize(y1, return.tau.mat = TRUE)
x1 <- get_input(y1, c(out$tau.mat[, 1]))  # same as out$input
test_normality(out$input) # Gaussianized a Cauchy!

kStartFrom <- 20
y.cum.avg <- (cumsum(y1)/seq_along(y1))[-seq_len(kStartFrom)]
x.cum.avg <- (cumsum(x1)/seq_along(x1))[-seq_len(kStartFrom)]

plot(c((kStartFrom + 1): length(y1)), y.cum.avg, type="l" , lwd = 2, 
     main="CLT in practice", xlab = "n", 
     ylab="Cumulative sample average", 
     ylim = range(y.cum.avg, x.cum.avg))
lines(c((kStartFrom+1): length(y1)), x.cum.avg, col=2, lwd=2)
abline(h = 0)
grid()
legend("bottomright", c("Cauchy", "Gaussianize"), col = c(1, 2), 
       box.lty = 0, lwd = 2, lty = 1)

plot(x1, y1, xlab="Gaussian-like input", ylab = "Cauchy - output")
grid()
if (FALSE) {
# multivariate example
y2 <- 0.5 * y1 + rnorm(length(y1))
YY <- cbind(y1, y2)
plot(YY)

XX <- Gaussianize(YY, type = "hh")
plot(XX)

out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE, 
                   verbose = TRUE, method = "IGMM")
                   
plot(out$input)
out$tau.mat

YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat,
                      inverse = TRUE)
plot(YY.hat[, 1], YY[, 1])
}
