Finding Univariate or Multivariate Power Transformations
powerTransform computes members of families of transformations indexed by one
parameter: the Box-Cox power family, the Yeo and Johnson (2000) family, or the
basic power family, in which a power of zero is interpreted as the logarithm.
The family can be modified to have Jacobian one, or not, except for the basic
power family.
powerTransform(object, ...)

## S3 method for class 'default'
powerTransform(object, ...)

## S3 method for class 'lm'
powerTransform(object, ...)

## S3 method for class 'formula'
powerTransform(object, data, subset, weights, na.action, ...)
- object: This can be an object of class lm, a formula, or a matrix or vector; see below.
- data: A data frame or environment, as in lm.
- subset: Case indices to be used, as in lm.
- weights: Weights, as in lm.
- na.action: Missing value action, as in lm.
- ...: Additional arguments that are passed to estimateTransform, which does the actual computing, or to the optim function, which does the maximization.
The function powerTransform is used to estimate normalizing transformations
of a univariate or a multivariate random variable. For a univariate transformation,
a formula like
z~x1+x2+x3 will estimate a transformation for the response
z from the family of transformations indexed by the parameter $\lambda$
that makes the residuals from the regression of the transformed
z on the predictors
as close to normally distributed as possible. This generalizes the Box and
Cox (1964) transformations to normality only by allowing for families other than the
power transformations used in that paper.
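The univariate estimate described above can be sketched in base R without the package. This is a minimal illustration, not the package's implementation: the simulated data, the helper names bc and profile_loglik, and the search interval are all assumptions made for the example. It maximizes the Box-Cox profile log-likelihood, which for a given $\lambda$ is $-(n/2)\log(\mathrm{RSS}(\lambda)/n) + (\lambda-1)\sum \log z_i$ up to a constant.

```r
# Hypothetical simulated data: log(z) is linear in the predictors,
# so the estimate of lambda should land near 0.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
z  <- exp(1 + 2 * x1 - x2 + rnorm(n, sd = 0.3))

# Scaled power (Box-Cox) transformation; log at lambda = 0.
bc <- function(u, lambda) {
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}

# Profile log-likelihood of lambda, up to an additive constant.
# The second term is the log-Jacobian of the transformation.
profile_loglik <- function(lambda) {
  rss <- sum(resid(lm(bc(z, lambda) ~ x1 + x2))^2)
  -0.5 * n * log(rss / n) + (lambda - 1) * sum(log(z))
}

# One-dimensional maximization over an assumed search interval.
fit <- optimize(profile_loglik, interval = c(-3, 3), maximum = TRUE)
lambda_hat <- fit$maximum
lambda_hat
```

Because the data were generated on the log scale, an estimate close to zero is the expected outcome here.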
For a formula like
cbind(y1,y2,y3)~x1+x2+x3, the three variables on
the left-hand side are all transformed, generally with different transformations,
to make all the residuals as close to
normally distributed as possible. A formula like
cbind(y1,y2,y3)~1 would specify transformations
to multivariate normality with no predictors. This generalizes the multivariate
power transformations suggested by Velilla (1993) by allowing for different
families of transformations, and by allowing for predictors. Cook and Weisberg (1999)
and Weisberg (2014) suggest that it is useful to transform
a set of predictors
z1, z2, z3 to multivariate normality, and to transform
to multivariate normality conditional on the levels of a factor, which is equivalent
to setting the predictors to be indicator variables for that factor.
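The multivariate case can be sketched in the same spirit. The code below is an illustration in base R under stated assumptions, not the package's code: the data are simulated, the names bc and neg_loglik are hypothetical, and the box constraints passed to optim are arbitrary. Each response gets its own $\lambda_j$, and the criterion is the multivariate normal profile log-likelihood, $-(n/2)\log\det S(\lambda) + \sum_j(\lambda_j-1)\sum_i \log y_{ij}$, where $S(\lambda)$ is the residual covariance matrix of the transformed responses.

```r
# Hypothetical simulated data: log(y1) and sqrt(y2) are linear in x.
set.seed(2)
n  <- 300
x  <- runif(n)
y1 <- exp(1 + x + rnorm(n, sd = 0.3))
y2 <- (3 + 2 * x + rnorm(n, sd = 0.4))^2
Y  <- cbind(y1, y2)

bc <- function(u, lambda) {
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}

# Negative profile log-likelihood, up to an additive constant.
neg_loglik <- function(lambda) {
  # Transform each column with its own parameter.
  Z <- sapply(seq_along(lambda), function(j) bc(Y[, j], lambda[j]))
  E <- resid(lm(Z ~ x))          # n x 2 matrix of residuals
  S <- crossprod(E) / n          # residual covariance matrix
  0.5 * n * as.numeric(determinant(S, logarithm = TRUE)$modulus) -
    sum((lambda - 1) * colSums(log(Y)))
}

fit <- optim(c(1, 1), neg_loglik, method = "L-BFGS-B",
             lower = c(-3, -3), upper = c(3, 3), hessian = TRUE)
lambda_hat <- fit$par
# Approximate standard errors from the inverse Hessian at the optimum.
se <- sqrt(diag(solve(fit$hessian)))
rbind(lambda_hat, se)
```

Replacing x with a set of indicator variables for a factor would give the conditional version mentioned above.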
Specifying the first argument as a vector, for example
powerTransform(ais$LBM), is equivalent to
powerTransform(LBM ~ 1, ais). Similarly,
powerTransform(cbind(ais$LBM, ais$SSF)), where the first argument is a matrix
rather than a formula, is equivalent to
powerTransform(cbind(LBM, SSF) ~ 1, ais).
Two families of power transformations are available.
The bcPower family of scaled power transformations equals
$(U^{\lambda}-1)/\lambda$ for $\lambda \neq 0$, and
$\log(U)$ if $\lambda = 0$.
If family="yjPower" then the Yeo-Johnson transformations are used.
This is the Box-Cox transformation of $U+1$ for nonnegative values of $U$,
and the negative of the Box-Cox transformation of $|U|+1$ with parameter $2-\lambda$ for negative $U$.
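The two families can be written out in a few lines of base R. The functions bc_power and yj_power below are hypothetical stand-ins written for this illustration; they are not the package's bcPower and yjPower. For negative values the Yeo-Johnson branch applies the scaled power with parameter $2-\lambda$ to $|U|+1$ and negates the result, which keeps the transformation increasing across zero.

```r
# Scaled power (Box-Cox) transformation for strictly positive u.
bc_power <- function(u, lambda) {
  stopifnot(all(u > 0))
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}

# Yeo-Johnson transformation for any real u.
yj_power <- function(u, lambda) {
  out <- numeric(length(u))
  pos <- u >= 0
  # Nonnegative values: Box-Cox of u + 1.
  out[pos]  <- bc_power(u[pos] + 1, lambda)
  # Negative values: negated Box-Cox of |u| + 1 with parameter 2 - lambda.
  out[!pos] <- -bc_power(abs(u[!pos]) + 1, 2 - lambda)
  out
}

bc_power(c(1, 2, 4), 0.5)
yj_power(c(-2, 0, 2), 1)   # lambda = 1 leaves the values unchanged
```

With $\lambda = 1$ both branches of yj_power reduce to the identity, which is a quick sanity check on the signs.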
Other families can be added by writing a function whose first argument is a
matrix or vector to be transformed, and whose second argument is the value of the
transformation parameter. The function must return modified transformations
so that the Jacobian of the transformation is equal to one; see Cook and
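As a sketch of what such a function might look like, the example below scales the Box-Cox transformation by a power of the geometric mean, which is the standard way to obtain Jacobian one. The name bc_jacobian_one is hypothetical, and whether powerTransform accepts a function of exactly this form has not been verified here; only the calling convention (data first, parameter second) is taken from the text above.

```r
# Hypothetical family function: first argument is a vector or matrix,
# second argument is the transformation parameter (one value per column).
bc_jacobian_one <- function(U, lambda) {
  is_vec <- is.null(dim(U))
  U <- as.matrix(U)
  lambda <- rep(lambda, length.out = ncol(U))
  out <- U
  for (j in seq_len(ncol(U))) {
    u  <- U[, j]
    gm <- exp(mean(log(u)))                 # geometric mean of column j
    out[, j] <- if (abs(lambda[j]) < 1e-8) {
      gm * log(u)
    } else {
      (u^lambda[j] - 1) / (lambda[j] * gm^(lambda[j] - 1))
    }
  }
  if (is_vec) as.vector(out) else out
}

# Check: with gm held fixed, dz/du = (u / gm)^(lambda - 1), so the
# log-Jacobian sum((lambda - 1) * log(u / gm)) should be zero.
u <- c(0.5, 1, 2, 8)
lambda <- 0.3
gm <- exp(mean(log(u)))
log_jacobian <- sum((lambda - 1) * log(u / gm))
z <- bc_jacobian_one(u, lambda)
```

With the Jacobian equal to one, residual sums of squares from different values of the parameter are directly comparable, which is why the modified form is convenient for estimation.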
powerTransform is a front-end for estimateTransform.
testTransform is used to obtain likelihood ratio tests for
any specified value for the transformation parameters. It is used by the
summary method for powerTransform objects.
- The result of powerTransform is an object of class powerTransform that gives the estimates of the transformation parameters and related statistics. The summary method provides the estimates, standard errors, marginal Wald confidence intervals, and relevant likelihood ratio tests. Several helper functions are available. The coef method returns the estimated transformation parameters, while coef(object, round=TRUE) returns the transformations rounded to nearby convenient values within 1.96 standard errors of the MLE. The vcov function returns the estimated covariance matrix of the estimated transformation parameters. Use summary to obtain more information. By default the summary method calls testTransform and provides likelihood-ratio-type tests that all transformation parameters equal one, that all transformation parameters equal zero (log transformations), and that they equal a convenient rounded value not far from the MLE. testTransform can be called directly to test any other value for $\lambda$.
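The tests and the rounding rule described above can be illustrated for a single response in base R. Everything in this sketch is an assumption made for the example: the simulated data, the helper names loglik and lr_test, the finite-difference step used for the standard error, and the particular list of "convenient" values. It is meant to show the logic, not to reproduce the package's output.

```r
# Hypothetical simulated data with true lambda = 0.
set.seed(1)
n <- 200
x <- runif(n)
z <- exp(1 + 2 * x + rnorm(n, sd = 0.3))

bc <- function(u, lambda) {
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}
# Profile log-likelihood of lambda, up to an additive constant.
loglik <- function(lambda) {
  rss <- sum(resid(lm(bc(z, lambda) ~ x))^2)
  -0.5 * n * log(rss / n) + (lambda - 1) * sum(log(z))
}
lambda_hat <- optimize(loglik, c(-3, 3), maximum = TRUE)$maximum

# Likelihood ratio test of H0: lambda = lambda0, referred to chi-square(1).
lr_test <- function(lambda0) {
  stat <- 2 * (loglik(lambda_hat) - loglik(lambda0))
  c(LRT = stat, df = 1, pval = pchisq(stat, df = 1, lower.tail = FALSE))
}
lr_test(1)   # no transformation needed?
lr_test(0)   # log transformation adequate?

# Wald interval from a numerical second derivative of the log-likelihood.
h  <- 0.01
d2 <- (loglik(lambda_hat + h) - 2 * loglik(lambda_hat) +
       loglik(lambda_hat - h)) / h^2
se <- sqrt(-1 / d2)
ci <- lambda_hat + c(-1, 1) * 1.96 * se

# Rounding: take the first convenient value inside the interval,
# otherwise keep the estimate itself. The candidate list is an assumption.
convenient <- c(1, 0, -1, 0.5, 0.33, -0.5, -0.33, 2, -2)
inside <- convenient[convenient >= ci[1] & convenient <= ci[2]]
lambda_round <- if (length(inside) > 0) inside[1] else lambda_hat
```

A small p-value from lr_test(1) indicates that leaving the response untransformed is not supported by the data, which is the expected result for this simulated example.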
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-246.
Cook, R. D. and Weisberg, S. (1999) Applied Regression Including Computing and Graphics. Wiley.
Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition. Sage.
Velilla, S. (1993) A note on the multivariate Box-Cox transformation to normality. Statistics and Probability Letters, 17, 259-263.
Weisberg, S. (2014) Applied Linear Regression, Fourth Edition. Wiley.
Yeo, I. and Johnson, R. (2000) A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954-959.
# Box Cox Method, univariate
summary(p1 <- powerTransform(cycles ~ len + amp + load, Wool))

# fit linear model with transformed response:
coef(p1, round=TRUE)
summary(m1 <- lm(bcPower(cycles, p1$roundlam) ~ len + amp + load, Wool))

# Multivariate Box Cox
summary(powerTransform(cbind(len, ADT, trks, sigs1) ~ 1, Highway1))

# Multivariate transformation to normality within levels of 'hwy'
summary(a3 <- powerTransform(cbind(len, ADT, trks, sigs1) ~ hwy, Highway1))

# test lambda = (0 0 0 -1)
testTransform(a3, c(0, 0, 0, -1))

# save the rounded transformed values, plot them with a separate
# color for each highway type
transformedY <- bcPower(with(Highway1, cbind(len, ADT, trks, sigs1)),
                        coef(a3, round=TRUE))
pairs(transformedY, col=as.numeric(Highway1$hwy))