bestNormalize (version 0.2.2)

bestNormalize: Calculate and perform best normalizing transformation

Description

Performs a suite of normalizing transformations, and selects the best one on the basis of the Pearson P test statistic for normality. The transformation that has the lowest P (calculated on the transformed data) is selected. See details for more information.
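To make the selection rule concrete, here is a minimal base-R sketch of a Pearson chi-square normality statistic divided by its degrees of freedom, which is how the Value section describes norm_stats. The helper name pearson_p_per_df, the binning rule, and the candidate transformations (identity, log, square root) are illustrative assumptions, not the package's internals.

```r
# Hypothetical helper: Pearson chi-square normality statistic per degree of freedom.
pearson_p_per_df <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  n_classes <- ceiling(2 * n^(2 / 5))   # assumed rule for the number of bins

  # Assign each value to one of n_classes equiprobable bins under a fitted normal.
  u <- pnorm(x, mean = mean(x), sd = sd(x))
  bin <- pmin(floor(u * n_classes) + 1, n_classes)
  observed <- tabulate(bin, nbins = n_classes)
  expected <- n / n_classes

  p_stat <- sum((observed - expected)^2 / expected)
  df <- n_classes - 3                   # bins - 1, minus 2 estimated parameters
  p_stat / df
}

set.seed(1)
x <- rgamma(100, 1, 1)

# Illustrative candidates only; bestNormalize() tries a different suite.
candidates <- list(identity = x, log = log(x), sqrt = sqrt(x))
stats <- sapply(candidates, pearson_p_per_df)
stats

# The candidate with the smallest statistic would be selected.
names(which.min(stats))
```

Smaller values indicate that the binned counts sit closer to what a normal distribution would produce.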

Usage

bestNormalize(x, allow_orderNorm = TRUE)

# S3 method for bestNormalize
predict(object, newdata = NULL, inverse = FALSE, ...)

# S3 method for bestNormalize
print(x, ...)

Arguments

x

A vector to normalize

allow_orderNorm

set to FALSE if orderNorm should not be applied

object

an object of class 'bestNormalize'

newdata

a vector of data to be (reverse) transformed

inverse

if TRUE, performs reverse transformation

...

additional arguments

Value

A list of class bestNormalize with elements

x.t

transformed original data

x

original data

norm_stats

Pearson's P / degrees of freedom

chosen_transform

info about the transformation (of appropriate class)

The predict function returns the numeric value of the transformation performed on new data, and allows for the inverse transformation as well.

Details

bestNormalize estimates the optimal normalizing transformation. This transformation can be performed on new data, and inverted, via the predict function.
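As a sketch of that workflow, the following uses only the signatures listed under Usage. The split into x_train and x_new and the variable names are illustrative, and allow_orderNorm = FALSE is set here only to restrict the example to the parametric transformations.

```r
library(bestNormalize)

set.seed(1)
x_train <- rgamma(100, 1, 1)   # data used to estimate the transformation
x_new   <- rgamma(20, 1, 1)    # illustrative new data

BN_obj <- bestNormalize(x_train, allow_orderNorm = FALSE)

# Apply the estimated transformation to new data, then invert it.
z_new  <- predict(BN_obj, newdata = x_new)
x_back <- predict(BN_obj, newdata = z_new, inverse = TRUE)

# Compare the round trip with the new data.
all.equal(x_back, x_new)
```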

This function currently estimates the Yeo-Johnson transformation, the Box Cox transformation (if the data is positive), and the Lambert WxF Gaussianizing transformation of type "s". If allow_orderNorm == TRUE, then the ordered quantile normalization technique is also employed, and will likely be chosen if ties are not present since it essentially forces the data to follow a normal distribution. More information on the orderNorm technique can be found in the package vignette, or using ?orderNorm.
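The remark about ties can be illustrated with a rank-based sketch in base R. The helper order_norm_sketch and its (rank - 0.5) / n offset are assumptions made for illustration; see ?orderNorm for the package's actual implementation.

```r
# Hypothetical helper: rank-based (ordered quantile) normalization.
order_norm_sketch <- function(x) {
  n <- length(x)
  qnorm((rank(x) - 0.5) / n)   # ranks mapped into (0, 1), then to normal quantiles
}

set.seed(1)
x <- rgamma(100, 1, 1)          # continuous draws, so ties are not expected
z <- order_norm_sketch(x)

# Without ties, the transformed values are exactly the normal quantiles
# at (i - 0.5) / n, whatever the shape of the input distribution.
all.equal(sort(z), qnorm((seq_along(x) - 0.5) / length(x)))

# With ties, tied inputs share one average rank, so far fewer distinct
# values come out and the result is no longer an evenly spread normal sample.
x_tied <- round(x)
length(unique(order_norm_sketch(x_tied)))
```

Because the output depends only on ranks, a tie-free sample is mapped onto the same grid of normal quantiles every time, which is why this technique tends to score best on a normality statistic.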

NOTE: Only the Lambert technique of type = "s" (skew) ensures that the transformation is consistently 1-1, so it is the only Lambert type currently used in bestNormalize(). Using type = "h" or type = "hh" risks estimating a transformation that is not 1-1. These alternative types are effective when the data have exceptionally heavy tails, e.g. data from a Cauchy distribution.

See Also

boxcox, lambert, orderNorm, yeojohnson

Examples

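library(bestNormalize)

# Right-skewed sample data
x <- rgamma(100, 1, 1)

# Estimate the candidate transformations and keep the best one
BN_obj <- bestNormalize(x)
BN_obj

# Transform the original data, then invert the transformation
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

# The round trip should recover the original data
all.equal(x2, x)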
