transfo: Robustly fit the Box-Cox or Yeo-Johnson transformation

Description

This function uses reweighted maximum likelihood to robustly fit the Box-Cox or Yeo-Johnson transformation to each variable in a dataset.

Usage

transfo(X, type = "bestObj", robust = TRUE, lambdarange = NULL,
        prestandardize = TRUE, prescaleBC = F, scalefac = 1,
        quant = 0.99, nbsteps = 2)

Arguments

X

A data matrix of dimensions n x d. Its columns are the variables to be transformed.

type

The type of transformation to be fit. Should be one of:

"BC": Box-Cox power transformation. Only works for strictly positive variables. If this type is given but a variable is not strictly positive, the function stops with a message about that variable.
"YJ": Yeo-Johnson power transformation. The data may have positive as well as negative values.
"bestObj": for strictly positive variables, both BC and YJ are run and the solution with the lowest objective is kept. For all other variables, YJ is run.
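For reference, the two transformation families can be written out directly. The sketch below uses the standard textbook definitions of Box-Cox and Yeo-Johnson; the helper names boxcox_transform and yeojohnson_transform are hypothetical and not part of the package, and transfo itself may standardize the data before applying these formulas.

```r
# Box-Cox (textbook definition); only defined for strictly positive x.
# expm1(lambda * log(x)) / lambda equals (x^lambda - 1) / lambda, computed stably.
boxcox_transform <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-12) log(x) else expm1(lambda * log(x)) / lambda
}

# Yeo-Johnson (textbook definition); defined for any real x.
yeojohnson_transform <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  # Non-negative part: Box-Cox applied to x + 1
  if (abs(lambda) < 1e-12) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- expm1(lambda * log1p(x[pos])) / lambda
  }
  # Negative part: mirrored Box-Cox on 1 - x with parameter 2 - lambda
  if (abs(lambda - 2) < 1e-12) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -expm1((2 - lambda) * log1p(-x[!pos])) / (2 - lambda)
  }
  out
}

boxcox_transform(c(0.5, 1, 2, 4), 0)           # same as log(c(0.5, 1, 2, 4))
yeojohnson_transform(c(-2, -1, 0, 1, 2), 1)    # lambda = 1 returns the input
```

Note that lambda = 0 gives the log for Box-Cox and lambda = 1 gives the identity for Yeo-Johnson, which is why the estimated lambdahats can be read as a measure of how much skewness had to be removed.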

robust

if TRUE, the reweighted maximum likelihood method is used, which first computes a robust initial estimate of the transformation parameter lambda. If FALSE, the classical maximum likelihood (ML) method is used.

lambdarange

range of lambda values that will be optimized over. If NULL, the range goes from -4 to 6.
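As an illustration of what "optimized over a range" means in the classical (robust = FALSE) case, the sketch below maximizes the standard Box-Cox profile log-likelihood over a lambda interval with base R's optimize(). The helper names are hypothetical, the default interval simply mirrors the documented [-4, 6], and this is not claimed to be the objective that transfo reports.

```r
# Box-Cox transform, numerically stable near lambda = 0.
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-12) log(x) else expm1(lambda * log(x)) / lambda
}

# Standard Box-Cox profile log-likelihood (up to a constant):
# -n/2 * log(ML variance of transformed data) + (lambda - 1) * sum(log(x))
boxcox_profile_loglik <- function(lambda, x) {
  y <- boxcox_transform(x, lambda)
  n <- length(x)
  sigma2 <- mean((y - mean(y))^2)   # ML variance estimate (divides by n)
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

# Hypothetical helper: classical ML estimate of lambda within lambdarange.
fit_boxcox_ml <- function(x, lambdarange = c(-4, 6)) {
  optimize(boxcox_profile_loglik, interval = lambdarange,
           x = x, maximum = TRUE)$maximum
}

set.seed(123)
x <- exp(rnorm(1000))   # lognormal data, so the log (lambda = 0) is optimal
fit_boxcox_ml(x)        # should be close to 0
```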

prestandardize

whether to standardize the variables before the power transformation. For BC, each variable is divided by its median. For YJ with robust = TRUE, its median is subtracted and the result is divided by its mad (median absolute deviation). For YJ with robust = FALSE, its mean is subtracted and the result is divided by its standard deviation.
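The three prestandardization rules above can be sketched in a few lines of R. The function name prestandardize_sketch is hypothetical. It uses R's mad(), which by default includes the consistency constant 1.4826; whether transfo uses that same constant is an assumption here.

```r
# Hypothetical helper illustrating the documented prestandardization rules.
prestandardize_sketch <- function(x, type = c("BC", "YJ"), robust = TRUE) {
  type <- match.arg(type)
  if (type == "BC") {
    x / median(x)                 # BC: divide by the median
  } else if (robust) {
    (x - median(x)) / mad(x)      # YJ, robust: median and MAD (assumed R default constant)
  } else {
    (x - mean(x)) / sd(x)         # YJ, classical: mean and standard deviation
  }
}

x <- c(1, 2, 3, 4, 100)
prestandardize_sketch(x, "BC")                   # x / 3
prestandardize_sketch(x, "YJ", robust = TRUE)    # barely affected by the 100
prestandardize_sketch(x, "YJ", robust = FALSE)   # the outlier inflates the SD
```

Comparing the last two calls shows why the robust variant is used when robust = TRUE: a single large value dominates the mean and standard deviation but not the median and MAD.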

prescaleBC

for BC only. This standardizes the logarithm of the original variable by subtracting its median and dividing by its mad, after which the exponential function turns the result into a positive variable again.

scalefac

when YJ is fit and prestandardize = TRUE, the standardized data are multiplied by scalefac. When BC is fit and prescaleBC = TRUE, the same happens to the standardized log of the original variable.
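Read together, prescaleBC and scalefac describe the following steps for a BC fit: take the log, standardize it robustly, multiply by scalefac, and exponentiate back. The sketch below follows that description literally; the function name prescale_bc_sketch is hypothetical, and the use of R's default mad() constant is an assumption.

```r
# Hypothetical helper following the documented prescaleBC / scalefac steps.
prescale_bc_sketch <- function(x, scalefac = 1) {
  stopifnot(all(x > 0))
  lx <- log(x)
  z <- (lx - median(lx)) / mad(lx)   # robustly standardized log (assumed R default mad)
  exp(scalefac * z)                  # rescale, then map back to positive values
}

set.seed(1)
x <- exp(rnorm(101, mean = 3, sd = 2))
out <- prescale_bc_sketch(x, scalefac = 2)
median(log(out))   # 0: the log is centered at its median
mad(log(out))      # about 2, i.e. scalefac
```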

quant

quantile for determining the weights in the reweighting step (ignored when robust=FALSE).

nbsteps

number of reweighting steps (ignored when robust=FALSE).
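To show how quant and nbsteps interact, here is a simplified reweighting loop for a single positive variable under Box-Cox. It is a sketch under several assumptions, not the transfo implementation: the robust initial estimate is simply passed in as lambda0 (a hypothetical argument), points are standardized with median and MAD, weights are hard 0/1 with cutoff sqrt(qchisq(quant, 1)), and each step maximizes a weighted version of the Box-Cox profile log-likelihood.

```r
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-12) log(x) else expm1(lambda * log(x)) / lambda
}

# Weighted Box-Cox profile log-likelihood: points with weight 0 are ignored.
weighted_bc_loglik <- function(lambda, x, w) {
  y <- boxcox_transform(x, lambda)
  m <- sum(w)
  mu <- sum(w * y) / m
  sigma2 <- sum(w * (y - mu)^2) / m
  -m / 2 * log(sigma2) + (lambda - 1) * sum(w * log(x))
}

# Hypothetical reweighting loop (assumed hard-rejection weights).
reweighted_bc_sketch <- function(x, lambda0, quant = 0.99, nbsteps = 2,
                                 lambdarange = c(-4, 6)) {
  cutoff <- sqrt(qchisq(quant, df = 1))   # about 2.576 for quant = 0.99
  lambda <- lambda0
  for (step in seq_len(nbsteps)) {
    y <- boxcox_transform(x, lambda)
    z <- (y - median(y)) / mad(y)         # robust standardization
    w <- as.numeric(abs(z) <= cutoff)     # 1 = kept, 0 = flagged
    lambda <- optimize(weighted_bc_loglik, interval = lambdarange,
                       x = x, w = w, maximum = TRUE)$maximum
  }
  list(lambda = lambda, weights = w)
}

set.seed(123)
x <- c(exp(rnorm(1000)), rep(exp(10), 20))   # lognormal data plus 20 outliers
fit <- reweighted_bc_sketch(x, lambda0 = 0)
fit$lambda                 # close to 0
sum(fit$weights == 0)      # includes the 20 outliers
```

A larger quant raises the cutoff, so fewer points are downweighted; nbsteps controls how often the weights are recomputed from the current lambda.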

Value

A list with components:

lambdahats the estimated transformation parameter for each column of X.
Xt A matrix in which each column is the transformed version of the corresponding column of X.
muhat The estimated location of each column of Xt.
sigmahat The estimated scale of each column of Xt.
Zt Xt poststandardized by the centers in muhat and the scales in sigmahat. Always provided.
weights The final weights from the reweighting.
ttypes The type of transform used in each column.
objective Value of the (reweighted) maximum likelihood objective function.
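Assuming that "poststandardized" means the usual columnwise operation, Zt relates to Xt, muhat and sigmahat as in the sketch below. The helper name poststandardize is hypothetical, and the toy matrix stands in for real transfo output.

```r
# Hypothetical helper: subtract each column's center, divide by its scale.
poststandardize <- function(Xt, muhat, sigmahat) {
  Xt <- as.matrix(Xt)
  centered <- sweep(Xt, 2, muhat, "-")   # column j minus muhat[j]
  sweep(centered, 2, sigmahat, "/")      # then divided by sigmahat[j]
}

Xt <- cbind(c(1, 2, 3), c(10, 20, 30))
Zt <- poststandardize(Xt, muhat = c(2, 20), sigmahat = c(1, 10))
Zt   # both columns become -1, 0, 1
```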

References

J. Raymaekers and P.J. Rousseeuw (2020). Transforming variables to central normality. arXiv:2005.07946.

Examples


library(cellWise)
# find Box-Cox transformation parameter for lognormal data:
set.seed(123)
x <- exp(rnorm(1000))
transfo.out <- transfo(x, type = "BC")
# estimated parameter:
transfo.out$lambdahats
# value of the objective function:
transfo.out$objective
# the transformed variable:
transfo.out$Xt
# the poststandardized transformed variable:
transfo.out$Zt
# the type of transformation used:
transfo.out$ttypes
# qqplot of the poststandardized transformed variable:
qqnorm(transfo.out$Zt); abline(0,1)

# For more examples, we refer to the vignette:
vignette("TVCN_examples")
