find_transformation_parameters is used to find optimal parameters for a univariate transformation to normality.
find_transformation_parameters(
x,
method = "yeo_johnson",
robust = TRUE,
invariant = TRUE,
lambda = c(-4, 6),
empirical_gof_normality_p_value = NULL,
...
)
A transformer object that can be used to transform values.
A vector with numeric values.
One of the following methods for power transformation:
box_cox: Transformation using the Box-Cox transformation (Box and Cox,
1964). The Box-Cox transformation requires that all data are strictly
positive, so features that contain zero or negative values cannot be
transformed directly. In their work, Box and Cox define a shifted variant.
We use this variant to shift values to a strictly positive range when
negative values are present. The Box-Cox transformation relies on a single
parameter lambda, which is estimated through maximisation of the
log-likelihood function corresponding to a normal distribution.
yeo_johnson: Transformation using the Yeo-Johnson transformation (Yeo and
Johnson, 2000). Unlike the Box-Cox transformation, the Yeo-Johnson
transformation allows for both negative and positive values. Like the
Box-Cox transformation, this transformation relies on a single parameter
lambda, which is estimated through maximisation of the log-likelihood
function corresponding to a normal distribution.
none: A fall-back method that will not transform values.
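The lambda estimation described for box_cox and yeo_johnson can be sketched in base R. This is a minimal illustration of maximising the normal profile log-likelihood using the textbook forms of both transformations. It is not the implementation used by find_transformation_parameters, and it ignores the robust and invariant options as well as the shifted Box-Cox variant. The helper names box_cox, yeo_johnson, profile_loglik and estimate_lambda are hypothetical and exist only for this example.

```r
# Textbook Box-Cox transformation; assumes all x are strictly positive.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Textbook Yeo-Johnson transformation; accepts negative and positive x.
yeo_johnson <- function(x, lambda) {
  y <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) < 1e-8) {
    y[pos] <- log1p(x[pos])
  } else {
    y[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < 1e-8) {
    y[!pos] <- -log1p(-x[!pos])
  } else {
    y[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  y
}

# Normal profile log-likelihood of lambda, up to an additive constant.
# The second term is the log-Jacobian of the transformation.
profile_loglik <- function(lambda, x, method) {
  n <- length(x)
  if (method == "box_cox") {
    y <- box_cox(x, lambda)
    log_jacobian <- (lambda - 1) * sum(log(x))
  } else {
    y <- yeo_johnson(x, lambda)
    log_jacobian <- (lambda - 1) * sum(sign(x) * log1p(abs(x)))
  }
  sigma_sq <- mean((y - mean(y))^2)
  -n / 2 * log(sigma_sq) + log_jacobian
}

# Maximise the profile log-likelihood over a bounded lambda range.
estimate_lambda <- function(x, method = "yeo_johnson", lambda_range = c(-4, 6)) {
  stats::optimize(
    profile_loglik,
    interval = lambda_range,
    x = x,
    method = method,
    maximum = TRUE
  )$maximum
}

set.seed(1)
x <- exp(stats::rnorm(1000))
lambda_hat <- estimate_lambda(x, method = "box_cox")
```

Because x is log-normal here, the Box-Cox estimate lambda_hat should land close to 0, which corresponds to a log transformation.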
Flag for using a robust version of Box-Cox or Yeo-Johnson transformation, as defined by Raymaekers and Rousseeuw (2021). This version is less sensitive to the presence of outliers.
Flag for using a version of Box-Cox or Yeo-Johnson transformation that simultaneously optimises location and scale in addition to the lambda parameter.
Single lambda value, or range of lambda values that should be
considered. Default: c(-4.0, 6.0). Can be NULL to force optimisation
without a constraint on lambda values.
Significance value for the empirical goodness-of-fit test for central
normality. The p-value is computed through the assess_transformation
function. If this parameter is set to a numeric value other than NULL, the
transformation is rejected when the p-value of the test is below the
significance value.
Unused parameters.
Yeo, I. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959 (2000).
Box, G. E. P. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Series B Stat. Methodol. 26, 211–252 (1964).
Raymaekers, J. & Rousseeuw, P. J. Transforming variables to central normality. Mach. Learn. (2021).
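# Draw 1000 samples from a log-normal distribution (strictly positive).
x <- exp(stats::rnorm(1000))

# Find Box-Cox transformation parameters for x.
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox")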