powerTransform: Finding Univariate or Multivariate Power Transformations

Description

powerTransform uses the maximum likelihood-like approach of Box and Cox (1964) to select a transformation of a univariate or multivariate response for normality, linearity and/or constant variance. Available families are the default Box-Cox power family, and the Yeo-Johnson and skew power families, which may be useful when a response is not strictly positive. powerTransform passes arguments to estimateTransform, so you may need to include arguments to estimateTransform in a call to powerTransform.

Usage

powerTransform(object, ...)
# S3 method for default
powerTransform(object, family="bcPower", ...)
# S3 method for lm
powerTransform(object, family="bcPower", ...)
# S3 method for formula
powerTransform(object, data, subset, weights, na.action, 
    family="bcPower", ...)
  
# S3 method for lmerMod
powerTransform(object, family="bcPower", lambda=c(-3, 3), 
    gamma=NULL, ...)
  
estimateTransform(X, Y, weights=NULL, family="bcPower", start=NULL,
         method="L-BFGS-B", ...)
         
# S3 method for default
estimateTransform(X, Y, weights=NULL, family="bcPower", start=NULL, 
         method="L-BFGS-B", ...)
         
# S3 method for skewPower
estimateTransform(X, Y, weights=NULL, lambda=c(-3, 3), 
    gamma=NULL, ...)
# S3 method for lmerMod
estimateTransform(object, family="bcPower", lambda=c(-3, 3), 
    start=NULL, method="L-BFGS-B", ...)
# S3 method for skewPowerlmer
estimateTransform(object, lambda=c(-3, +3), 
    gamma=NULL, ...)

Arguments

object

An object of class lm or lmerMod, a formula, or a matrix or vector; see below.

data

A data frame or environment, as in lm.

subset

Case indices to be used, as in lm.

weights

Weights as in lm.

na.action

Missing value action, as in lm.

family

The quoted name of a family of transformations. The available options are "bcPower", the default, for the Box-Cox power family; "yjPower" for the Yeo-Johnson family; and "skewPower" for the two-parameter skew power family. The families are documented at bcPower and skewPower.

lambda

The range to be considered for the estimate of the power parameter lambda, equal to c(-3, +3) by default. Values of lambda outside the default range are unlikely to be useful in practice.

gamma

The skewPower family has two parameters, adding a location parameter gamma to the power parameter lambda present in most other transformation families. If gamma=NULL then the location parameter will be estimated; if gamma is set to a numeric value, or a numeric vector of positive values equal in length to the number of responses, gamma will be fixed and only the power will be estimated.

...

Additional arguments that are passed to estimateTransform, which does the actual computing, or to the optim function, which does the maximization for all the methods except for lmerMod models with the skewPower family. In that case, computing is done using the neldermead function in the nloptr package.

X

A matrix or data.frame giving the “right-side variables”, including a column of ones if the intercept is present.

Y

A vector or matrix or data.frame giving the “left-side variables.”

start

Starting values for the computations. The default value of NULL is usually adequate.

method

The computing algorithm used by optim for the maximization. The default "L-BFGS-B" appears to work well.

Value

An object of class powerTransform is returned or, if family="skewPower", an object of class skewpowerTransform that inherits from powerTransform. The object includes the components listed below.

Several methods are available for use with powerTransform objects. The coef method returns the estimated transformation parameters, while coef(object, round=TRUE) returns the transformations rounded to nearby convenient values within 1.96 standard errors of the mle, if any exist. The vcov function returns the estimated covariance matrix of the estimated transformation parameters. A print method prints the estimates, and a summary method provides more information, including likelihood ratio type tests that all power parameters equal one, that all transformation parameters equal zero (log transformations), and for a convenient rounded value not far from the mle. In the case of the skew power family, these tests are based on the profile log-likelihood obtained by maximizing over the start parameter, thus treating the start as a nuisance parameter of lesser interest than the power parameter. testTransform can be called directly to test any other value for $\lambda$ or, for the skew power family, for $\lambda$ and $\gamma$.

There is a plot.powerTransform method for plotting the transformed values, and also a contour.skewpowerTransform method to obtain a contour plot of the two-dimensional log-likelihood for the skew power parameters when the response is univariate. Finally, the boxCox method can be used to plot the univariate log-likelihood for the Box-Cox or Yeo-Johnson power families, or the profile log-likelihood of each of the parameters in the skew power family.

The components of the returned object are

value

The value of the loglikelihood at the mle.

counts

See optim.

convergence

See optim.

message

See optim.

hessian

The hessian matrix.

start

Starting values for the computations.

lambda

The ml estimate for the power parameter.

gamma

The ml estimate for the start parameter for the skew power family.

roundlam

Convenient rounded values for the estimates. These rounded values will often be the desirable transformations.

family

The transformation family

xqr

QR decomposition of the predictor matrix.

y

The responses to be transformed.

x

The predictors.

weights

The weights if weighted least squares.

Details

The function powerTransform is used to estimate normalizing/linearizing/variance stabilizing transformations of a univariate or a multivariate response in a linear regression. For a univariate response, a formula like z~x1+x2+x3 will estimate a transformation for the response z from a family of transformations indexed by one parameter for Box-Cox and Yeo-Johnson transformations, or two parameters for the skew power family. The estimated transformation is the one that makes the residuals from the regression of the transformed z on the predictors as close to normally distributed as possible.
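For the univariate Box-Cox case, the idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the data are simulated, the helper names bc_scaled and neg_loglik are hypothetical, and the profile log-likelihood is taken (up to a constant) to be -(n/2) log(RSS/n) computed from the geometric-mean-scaled transformation.

```r
# Simulated data (an assumption for illustration): log(y) is linear in x,
# so the estimated power should be near 0.
set.seed(1)
n <- 200
x <- runif(n)
y <- exp(1 + 2 * x + rnorm(n, sd = 0.3))

# Box-Cox power scaled by the geometric mean so that the Jacobian is 1.
bc_scaled <- function(u, lambda) {
  gm <- exp(mean(log(u)))
  if (abs(lambda) < 1e-8) gm * log(u)
  else gm^(1 - lambda) * (u^lambda - 1) / lambda
}

# Negative profile log-likelihood, up to an additive constant.
neg_loglik <- function(lambda) {
  z <- bc_scaled(y, lambda)
  rss <- sum(resid(lm(z ~ x))^2)
  (n / 2) * log(rss / n)
}

# Bounded maximization over the default range c(-3, 3) with L-BFGS-B.
fit <- optim(par = 1, fn = neg_loglik, method = "L-BFGS-B",
             lower = -3, upper = 3, hessian = TRUE)
lambda_hat <- fit$par
se <- sqrt(1 / fit$hessian[1, 1])  # approximate standard error
c(lambda = lambda_hat, se = se)
```

The actual functions additionally handle multivariate responses, weights, and the other transformation families.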

For a formula like cbind(y1,y2,y3)~x1+x2+x3, the three variables on the left side are all transformed, generally with different transformations, to make all the residuals as close to normally distributed as possible. This is not the same as three univariate transformations because the variables transformed are allowed to be correlated. cbind(y1,y2,y3)~1 would specify transformations to multivariate normality with no predictors. This generalizes the multivariate power transformations suggested by Velilla (1993) by allowing for different families of transformations, and by allowing for predictors. Cook and Weisberg (1999) and Weisberg (2014) suggest the usefulness of transforming a set of predictors z1, z2, z3 for multivariate normality and for transforming for multivariate normality conditional on levels of a factor, which is equivalent to setting the predictors to be indicator variables for that factor.

Specifying the first argument as a vector, for example powerTransform(ais$LBM), is equivalent to powerTransform(LBM ~ 1, ais). Similarly, powerTransform(cbind(ais$LBM, ais$SSF)), where the first argument is a matrix rather than a formula, is equivalent to specifying the multivariate linear model powerTransform(cbind(LBM, SSF) ~ 1, ais).

Three families of power transformations are available. The Box-Cox power family, family="bcPower", equals $(U^{\lambda}-1)/\lambda$ for $\lambda \neq 0$, and $\log(U)$ if $\lambda = 0$. A scaled version of this transformation is used in computing with all the families to make the Jacobian of the transformation equal to 1.
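As a minimal base-R sketch of the formula above, the following computes the basic power and a scaled version. The names bc and bc_scaled are hypothetical helpers for illustration, not functions from the package. The scaling by the geometric mean is the usual normalized Box-Cox form and is assumed here to be what "scaled" refers to; the last lines check numerically that the log of its Jacobian determinant is 0.

```r
# Illustrative helpers only; these are not the package's bcPower code.
bc <- function(u, lambda) {
  stopifnot(all(u > 0))
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}

# Scaled version (assumed form): multiply by gm(u)^(1 - lambda),
# where gm(u) is the geometric mean of u.
bc_scaled <- function(u, lambda) {
  gm <- exp(mean(log(u)))
  gm^(1 - lambda) * bc(u, lambda)
}

u <- c(0.5, 1, 2, 4, 8)
lambda <- 0.5
bc(u, 1)   # lambda = 1 only shifts the data: u - 1
bc(u, 0)   # lambda = 0 is the log

# The derivative of the scaled transform is gm^(1 - lambda) * u^(lambda - 1);
# the log Jacobian determinant is the sum of the log derivatives.
gm <- exp(mean(log(u)))
log_jacobian <- sum(log(gm^(1 - lambda) * u^(lambda - 1)))
log_jacobian   # zero up to rounding error
```

Because the Jacobian is 1, log-likelihoods computed from the scaled transformation are directly comparable across values of $\lambda$.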

If family="yjPower" then the Yeo-Johnson transformations are used. This is the Box-Cox transformation of $U+1$ for nonnegative values, and the negative of the Box-Cox transformation of $|U|+1$ with parameter $2-\lambda$ for $U$ negative.
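A minimal base-R sketch of this definition follows. The helper name yj is hypothetical and this is not the package's yjPower code; the sign on the negative branch follows Yeo and Johnson (2000).

```r
# Illustrative helper only.
yj <- function(u, lambda) {
  # Basic Box-Cox power, with the log as the limiting case at 0.
  bc <- function(v, lam) if (abs(lam) < 1e-8) log(v) else (v^lam - 1) / lam
  out <- numeric(length(u))
  pos <- u >= 0
  out[pos]  <-  bc(u[pos] + 1, lambda)        # nonnegative values
  out[!pos] <- -bc(-u[!pos] + 1, 2 - lambda)  # negative values
  out
}

u <- c(-2, -0.5, 0, 0.5, 2)
yj(u, 1)   # lambda = 1 leaves the data unchanged
yj(u, 0)   # log(u + 1) for the nonnegative values
```

Unlike the Box-Cox family, this transformation is defined for all real values of the response.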

If family="skewPower" then the skew power family of transformations suggested by Hawkins and Weisberg (2015) is used. This is a two-parameter family that would generally be applied with a response with occasional negative values; see skewPower for the details and examples. This family has a power parameter $\lambda$ and a non-negative start parameter $\gamma$, with $\gamma = 0$ equal to the Box-Cox transformation.
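A minimal base-R sketch of this family is given below, assuming the form in which the Box-Cox power is applied to $0.5\,(U + \sqrt{U^2 + \gamma^2})$; see skewPower for the authoritative definition. The helper names bc and skew_power are hypothetical.

```r
# Basic Box-Cox power, with the log as the limiting case at 0.
bc <- function(v, lam) if (abs(lam) < 1e-8) log(v) else (v^lam - 1) / lam

# Assumed form of the skew power transformation: Box-Cox applied to
# 0.5 * (u + sqrt(u^2 + gamma^2)), which is positive whenever gamma > 0.
skew_power <- function(u, lambda, gamma) {
  stopifnot(gamma >= 0)
  bc(0.5 * (u + sqrt(u^2 + gamma^2)), lambda)
}

u_pos <- c(0.5, 1, 3)
skew_power(u_pos, 0.5, gamma = 0)    # same as bc(u_pos, 0.5)

u_mixed <- c(-1, 0, 2)
skew_power(u_mixed, 0.5, gamma = 1)  # finite despite nonpositive values
```

With $\gamma = 0$ and a strictly positive response the inner quantity reduces to $U$, which is why the family then coincides with Box-Cox.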

The same general methodology can be applied to linear mixed models fit with the lmer function in the lme4 package. A multivariate response is not permitted.

The function testTransform is used to obtain likelihood ratio tests for any specified value for the transformation parameter(s).

References

Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-46.

Cook, R. D. and Weisberg, S. (1999) Applied Regression Including Computing and Graphics. Wiley.

Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage.

Hawkins, D. and Weisberg, S. (2015) Combining the Box-Cox Power and Generalized Log Transformations to Accommodate Negative Responses, submitted for publication.

Velilla, S. (1993) A note on the multivariate Box-Cox transformation to normality. Statistics and Probability Letters, 17, 259-263.

Weisberg, S. (2014) Applied Linear Regression, Fourth Edition, Wiley.

Yeo, I. and Johnson, R. (2000) A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954-959.

Examples

library(car)  # also attaches carData, which provides Wool and Highway1

# Box Cox Method, univariate
summary(p1 <- powerTransform(cycles ~ len + amp + load, Wool))

# fit linear model with transformed response:
coef(p1, round=TRUE)
summary(m1 <- lm(bcPower(cycles, p1$roundlam) ~ len + amp + load, Wool))

# Multivariate Box Cox uses Highway1 data
summary(powerTransform(cbind(len, adt, trks, sigs1) ~ 1, Highway1))

# Multivariate transformation to normality within levels of 'htype'
summary(a3 <- powerTransform(cbind(len, adt, trks, sigs1) ~ htype, Highway1))

# test lambda = (0 0 0 -1)
testTransform(a3, c(0, 0, 0, -1))

# save the rounded transformed values, plot them with a separate
# color for each highway type
transformedY <- bcPower(with(Highway1, cbind(len, adt, trks, sigs1)),
                coef(a3, round=TRUE))
scatterplotMatrix( ~ transformedY|htype, Highway1)
