LambertW (version 0.6.7)

# Gaussianize: Gaussianize matrix-like objects

## Description

Gaussianize is probably the most useful function in this package. It is used the same way as scale, but instead of only centering and scaling the data, it transforms the data to be Gaussian (this works well for unimodal data). See Goerg (2011, 2016) and Examples.

Important: For multivariate input X it performs a column-wise Gaussianization (by simply calling apply(X, 2, Gaussianize)), which is only a marginal Gaussianization. It does not make the transformed data jointly Gaussian, and in general the result is not jointly Gaussian.
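
The difference between marginal and joint Gaussianity can be illustrated with a small base-R sketch. It does not call Gaussianize; the variables x, s, and y are illustrative constructions and not part of the package. Both columns are standard normal on their own, yet their sum is exactly zero about half of the time, which cannot happen for a jointly Gaussian pair.

```r
set.seed(1)
n <- 10000

# x is N(0, 1); s is an independent random sign, so y = s * x is also N(0, 1)
x <- rnorm(n)
s <- sample(c(-1, 1), size = n, replace = TRUE)
y <- s * x

# For a jointly Gaussian pair, x + y would itself be Gaussian.
# Here x + y equals 0 whenever s == -1 and 2 * x otherwise.
frac_zero <- mean(x + y == 0)
print(frac_zero)
```

Checking each column separately (for example with a normality test) would not reveal this, which is why a column-wise Gaussianization should not be read as a statement about the joint distribution.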

By default Gaussianize returns the $$X \sim N(\mu_x, \sigma_x^2)$$ input, not the zero-mean, unit-variance $$U \sim N(0, 1)$$ input. Use return.u = TRUE to obtain $$U$$.

## Usage

Gaussianize(
data = NULL,
type = c("h", "hh", "s"),
method = c("IGMM", "MLE"),
return.tau.mat = FALSE,
inverse = FALSE,
tau.mat = NULL,
verbose = FALSE,
return.u = FALSE,
input.u = NULL
)

## Arguments

data

a numeric matrix-like object; either the data that should be Gaussianized, or the data that should be "DeGaussianized" (inverse = TRUE), i.e., converted back to the original space.

type

what type of non-normality: symmetric heavy-tails "h" (default), skewed heavy-tails "hh", or just skewed "s".
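
As a rough orientation, the three types correspond to the following transformations of a standard Gaussian input $$U = (X - \mu_x) / \sigma_x$$, following the Lambert W framework of Goerg (2011, 2016) cited above. The symbols $$Y$$ (observed data) and $$Z$$ (standardized observed data) are notation assumed here for illustration, and the exponent $$\alpha$$ is fixed at 1 for simplicity.

$$Y = \mu_x + \sigma_x Z$$

type = "s": $$Z = U \exp(\gamma U)$$

type = "h": $$Z = U \exp\left(\frac{\delta}{2} U^2\right)$$

type = "hh": $$Z = U \exp\left(\frac{\delta_l}{2} U^2\right)$$ for $$U \le 0$$ and $$Z = U \exp\left(\frac{\delta_r}{2} U^2\right)$$ for $$U > 0$$

Gaussianizing the data amounts to estimating the parameters and inverting this map to recover $$X$$ (or $$U$$) from $$Y$$.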

method

what estimator should be used: "MLE" or "IGMM". "IGMM" gives exactly Gaussian characteristics (kurtosis $$\equiv$$ 3 for "h" or skewness $$\equiv$$ 0 for "s"), "MLE" comes close to this. Default: "IGMM" since it is much faster than "MLE".

return.tau.mat

logical; if TRUE it also returns the estimated $$\tau$$ parameters as a matrix (same number of columns as data). This matrix can then be used to Gaussianize new data with pre-estimated $$\tau$$. It can also be used to "DeGaussianize" data by passing it as an argument (tau.mat) to Gaussianize() and setting inverse = TRUE.

inverse

logical; if TRUE it performs the inverse transformation using tau.mat to "DeGaussianize" the data back to the original space.

tau.mat

instead of estimating $$\tau$$ from the data you can pass it as a matrix (usually obtained via Gaussianize(..., return.tau.mat = TRUE)). If inverse = TRUE it uses this tau matrix to "DeGaussianize" the data again. This is useful to back-transform new data in the Gaussianized space, e.g., predictions or fits, back to the original space.
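
A minimal sketch of this workflow, assuming the LambertW package is installed: estimate $$\tau$$ on training data, reuse it on new data, then map the result back. The names y_train, y_new, fit, x_new, and y_back are illustrative, and the Student-t data is simulated only to have something heavy-tailed to work with.

```r
library(LambertW)

set.seed(42)
# Simulated heavy-tailed data (Student-t, 3 degrees of freedom); two columns each
y_train <- cbind(a = rt(500, df = 3), b = rt(500, df = 3))
y_new   <- cbind(a = rt(100, df = 3), b = rt(100, df = 3))

# Estimate tau on the training data and keep it
fit <- Gaussianize(y_train, type = "h", return.tau.mat = TRUE)

# Gaussianize new data with the pre-estimated tau (no re-estimation)
x_new <- Gaussianize(y_new, type = "h", tau.mat = fit$tau.mat)

# Map the Gaussianized values back to the original space
y_back <- Gaussianize(data = x_new, tau.mat = fit$tau.mat, inverse = TRUE)
```

Up to numerical error, y_back should agree with y_new, since the inverse call undoes the forward transformation with the same $$\tau$$.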

verbose

logical; if TRUE, it prints out progress information in the console. Default: FALSE.

return.u

logical; if TRUE it returns the zero-mean, unit-variance Gaussian input $$U$$. If FALSE (default) it returns the input $$X$$.

input.u

optional; if you used return.u = TRUE in a previous step, and now you want to convert the data back to original space, then you have to pass it as input.u. If you pass numeric data as data, Gaussianize assumes that data is the input corresponding to $$X$$, not $$U$$.
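
The following sketch shows the round trip described above, assuming the LambertW package is installed and that the inverse call accepts input.u with data left at its default of NULL. The names y, fit_u, u, and y_back are illustrative.

```r
library(LambertW)

set.seed(7)
# Simulated heavy-tailed data; column names are arbitrary
y <- cbind(a = rt(500, df = 3), b = rt(500, df = 3))

# Forward step: return the zero-mean, unit-variance input U and the tau matrix
fit_u <- Gaussianize(y, type = "h", return.tau.mat = TRUE, return.u = TRUE)
u <- fit_u$input

# Inverse step: U must be passed as input.u, not as data,
# because data would be interpreted as X
y_back <- Gaussianize(input.u = u, tau.mat = fit_u$tau.mat, inverse = TRUE)
```

Passing u as data instead would treat it as $$X$$ and apply the location and scale in $$\tau$$ incorrectly.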

## Value

numeric matrix-like object with the same dimension/size as the input data. If inverse = FALSE it is the Gaussianized matrix / vector; if TRUE it is the "DeGaussianized" matrix / vector.

The numeric parameters of mean, scale, and skewness/heavy-tail parameters that were used in the Gaussianizing transformation are returned as attributes of the output matrix: 'Gaussianized:mu', 'Gaussianized:sigma', and for

type = "h":

'Gaussianized:delta' & 'Gaussianized:alpha',

type = "hh":

'Gaussianized:delta_l' and 'Gaussianized:delta_r' & 'Gaussianized:alpha_l' and 'Gaussianized:alpha_r',

type = "s":

'Gaussianized:gamma'.

They can also be returned as a separate matrix using return.tau.mat = TRUE. In this case Gaussianize returns a list with elements:

input

Gaussianized input data $$\boldsymbol x$$ (or $$\boldsymbol u$$ if return.u = TRUE),

tau.mat

matrix with $$\tau$$ estimates that we used to get x; has same number of columns as x, and 3, 5, or 6 rows (depending on type='s', 'h', or 'hh').
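
A short sketch of how the estimates might be inspected, assuming the LambertW package is installed and using the attribute names listed above. The names y, fit, mu_hat, sigma_hat, and delta_hat are illustrative.

```r
library(LambertW)

set.seed(3)
# Simulated heavy-tailed data with two columns
y <- cbind(a = rt(500, df = 3), b = rt(500, df = 4))

fit <- Gaussianize(y, type = "h", return.tau.mat = TRUE)

# Parameters stored as attributes of the Gaussianized matrix
mu_hat    <- attr(fit$input, "Gaussianized:mu")
sigma_hat <- attr(fit$input, "Gaussianized:sigma")
delta_hat <- attr(fit$input, "Gaussianized:delta")

# The same estimates collected in one matrix, one column per variable
print(fit$tau.mat)
```

The attributes are convenient for a quick look, while tau.mat is the form expected by later calls that reuse or invert the transformation.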

## Examples

# NOT RUN {
library(LambertW)

# Univariate example
set.seed(20)
y1 <- rcauchy(n = 100)
out <- Gaussianize(y1, return.tau.mat = TRUE)
x1 <- get_input(y1, c(out$tau.mat[, 1]))  # same as out$input
test_normality(out$input)  # Gaussianized a Cauchy!

# Compare cumulative sample averages of the Cauchy data and its Gaussianized version
kStartFrom <- 20
y.cum.avg <- (cumsum(y1) / seq_along(y1))[-seq_len(kStartFrom)]
x.cum.avg <- (cumsum(x1) / seq_along(x1))[-seq_len(kStartFrom)]

plot(c((kStartFrom + 1):length(y1)), y.cum.avg, type = "l", lwd = 2,
     main = "CLT in practice", xlab = "n",
     ylab = "Cumulative sample average",
     ylim = range(y.cum.avg, x.cum.avg))
lines(c((kStartFrom + 1):length(y1)), x.cum.avg, col = 2, lwd = 2)
abline(h = 0)
grid()
legend("bottomright", c("Cauchy", "Gaussianize"),
       col = c(1, 2), box.lty = 0, lwd = 2, lty = 1)

plot(x1, y1, xlab = "Gaussian-like input", ylab = "Cauchy - output")
grid()
# }
# NOT RUN {
# multivariate example
y2 <- 0.5 * y1 + rnorm(length(y1))
YY <- cbind(y1, y2)
plot(YY)

XX <- Gaussianize(YY, type = "hh")
plot(XX)

out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE,
                   verbose = TRUE, method = "IGMM")
plot(out$input)
out$tau.mat

# Back-transform the Gaussianized data to the original space
YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat,
                      inverse = TRUE)
plot(YY.hat[, 1], YY[, 1])
# }

