semEff (version 0.1.0)

stdCoeff: Standardised Coefficients

Description

Calculate fully standardised model coefficients in standard deviation units, adjusted for multicollinearity among predictors.

Usage

stdCoeff(mod, weights = NULL, data = NULL, term.names = NULL,
  cen.x = TRUE, cen.y = TRUE, std.x = TRUE, std.y = TRUE,
  unique.x = TRUE, r.squared = FALSE, ...)

Arguments

mod

A fitted model object of class "lm", "glm", or "merMod", or a list or nested list of such objects.

weights

An optional numeric vector of weights to use for model averaging, or a named list of such vectors. The former should be supplied when mod is a list, and the latter when it is a nested list (with matching list names). If set to "equal", a simple average is calculated instead.

data

An optional dataset used to first re-fit the model(s).

term.names

An optional vector of term names used to extract and/or sort coefficients from the output.

cen.x, cen.y

Logical, whether the intercept and coefficients should be calculated using mean-centred variables.

std.x, std.y

Logical, whether coefficients should be scaled by the standard deviations of variables.

unique.x

Logical, whether coefficients should be adjusted for multicollinearity among predictors.

r.squared

Logical, whether R-squared values should also be returned.

...

Additional arguments to pass to R2.

Value

A numeric vector of the standardised coefficients, or a list or nested list of such vectors.

Details

stdCoeff calculates fully standardised coefficients in standard deviation units for linear, generalised linear, and mixed models. It achieves this by adjusting the 'raw' model coefficients, so no standardisation of input variables is required beforehand. Users can simply specify the model with all variables in their original units and the function will do the rest. However, the user is free to scale and/or centre any input variables should they choose, which should not affect the outcome of standardisation (provided any scaling is by standard deviations). This may be desirable in some cases, such as to increase numerical stability during model fitting when variables are on widely different scales.

If arguments cen.x or cen.y are TRUE, model estimates will be calculated as if all predictors (x) and/or the response variable (y) were mean-centred prior to model fitting. Thus, for an ordinary linear model where centring of both x and y is specified, the intercept will be zero: the mean of the (centred) response. In addition, if cen.x = TRUE and there are interacting terms in the model, all coefficients for lower order terms of the interaction are adjusted using an expression that ensures each main effect or lower order term is estimated at the mean values of the terms it interacts with (zero in a 'centred' model), typically improving the interpretation of coefficients. The expression comprises a weighted sum of all the coefficients that contain the lower order term, with the weight for the term itself being zero and those for 'containing' terms being the product of the means of the other variables involved in that term (i.e. those not in the lower order term itself). For example, for a three-way interaction (x1 * x2 * x3), the expression for the main effect \(\beta_1\) would be:

$$\beta_{1} + \beta_{12} \bar{x}_{2} + \beta_{13} \bar{x}_{3} + \beta_{123} \bar{x}_{2} \bar{x}_{3}$$
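This adjustment can be sketched in base R. The following is illustrative only (it uses the built-in mtcars data rather than the package's Shipley data, and variable names are arbitrary): the adjusted main effect equals the coefficient obtained when the other interacting variables are mean-centred before fitting.

```r
## Sketch: for a three-way interaction x1 * x2 * x3, the adjusted main
## effect b1 + b12*mean(x2) + b13*mean(x3) + b123*mean(x2)*mean(x3)
## equals the coefficient of x1 when x2 and x3 are mean-centred before
## fitting. Built-in mtcars data, for illustration only.
m <- lm(mpg ~ wt * hp * disp, data = mtcars)
b <- coef(m)
b1.adj <- b["wt"] +
  b["wt:hp"] * mean(mtcars$hp) +
  b["wt:disp"] * mean(mtcars$disp) +
  b["wt:hp:disp"] * mean(mtcars$hp) * mean(mtcars$disp)
d <- transform(mtcars, hp = hp - mean(hp), disp = disp - mean(disp))
b1.cen <- coef(lm(mpg ~ wt * hp * disp, data = d))["wt"]
stopifnot(all.equal(unname(b1.adj), unname(b1.cen)))
```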

In addition, if std.x = TRUE or unique.x = TRUE (see below), product terms for interactive effects will be recalculated using mean-centred variables, to ensure that standard deviations and variance inflation factors (VIF) for predictors are calculated correctly (the model must be re-fit for this latter purpose, to recalculate the variance-covariance matrix).

If std.x = TRUE, coefficients are standardised by multiplying by the standard deviations of predictor variables (or terms), while if std.y = TRUE they are divided by the standard deviation of the response. If the model is a GLM, the latter is calculated using the link-transformed response (or its estimate) generated via the function getY. If both arguments are TRUE, the coefficients are regarded as 'fully' standardised in the traditional sense, often referred to as 'betas'.
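For an ordinary linear model, this full standardisation is equivalent to refitting with scaled variables. A minimal sketch, using the built-in mtcars data (not part of the package):

```r
## Sketch: 'full' standardisation of an ordinary linear model. Each slope
## is multiplied by sd(x) and divided by sd(y), matching a model fit to
## variables scaled beforehand. Built-in mtcars data, for illustration.
m <- lm(mpg ~ wt + hp, data = mtcars)
b.std <- coef(m)[-1] * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)
m.sc <- lm(mpg ~ wt + hp, data = data.frame(scale(mtcars)))
stopifnot(all.equal(unname(b.std), unname(coef(m.sc)[-1])))
```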

If unique.x = TRUE (default), coefficients are adjusted for multicollinearity among predictors by dividing by the square root of the VIFs (Dudgeon 2016, Thompson et al. 2017). If they have also been standardised by the standard deviations of x and y, this converts them to semipartial correlations, i.e. the correlation between the unique components of predictors (residualised on other predictors) and the response variable. This measure of effect size is arguably more interpretable and useful than the traditional standardised coefficient, as it is always estimated independently of other predictors and so can more readily be compared both within and across models. Values range from zero (no effect) to +/-1 (perfect relationship), rather than from zero to +/- infinity (as in the case of betas), putting them on the same scale as the bivariate correlation between predictor and response. In the case of GLMs, however, the measure is analogous but not exactly equal to the semipartial correlation, so its values may not always be bound between +/-1 (such cases are likely rare). Crucially, for ordinary linear models, the square of the semipartial correlation equals the increase in R-squared when that variable is added last to the model, directly linking the measure to model fit and 'variance explained'.
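This link between semipartial correlations and the increment in R-squared can be sketched with base R for an ordinary linear model (again using mtcars rather than package data; VIFs are taken as the diagonal of the inverse predictor correlation matrix):

```r
## Sketch: fully standardised coefficients divided by sqrt(VIF) give
## semipartial correlations, whose squares equal the increase in R-squared
## when each predictor is added last. Built-in mtcars data, illustrative.
m <- lm(mpg ~ wt + hp, data = mtcars)
b.std <- coef(m)[-1] * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)
vif <- diag(solve(cor(mtcars[c("wt", "hp")])))  # VIFs via inverse correlation matrix
sp <- b.std / sqrt(vif)  # semipartial correlations
r2 <- summary(m)$r.squared
dr2 <- c(wt = r2 - summary(lm(mpg ~ hp, data = mtcars))$r.squared,
         hp = r2 - summary(lm(mpg ~ wt, data = mtcars))$r.squared)
stopifnot(all.equal(sp^2, dr2))
```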

If r.squared = TRUE, R-squared values are also returned via the R2 function.

Finally, if weights are specified, the function calculates a weighted average of the standardised coefficients across models (Burnham & Anderson 2002).
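The averaging step itself is simply a weighted mean of each coefficient across models. A minimal sketch with hypothetical weights (the package's avgEst function handles this; mtcars used for illustration):

```r
## Sketch: weighted average of a coefficient across two candidate models,
## with weights normalised to sum to 1. The weights here are arbitrary,
## standing in for e.g. Akaike weights. Built-in mtcars data.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
w <- c(0.7, 0.3)
w <- w / sum(w)  # normalise
b.avg <- sum(w * sapply(list(m1, m2), function(m) coef(m)["wt"]))
```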

References

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). New York: Springer-Verlag. Retrieved from https://www.springer.com/gb/book/9780387953649

Dudgeon, P. (2016). A Comparative Investigation of Confidence Intervals for Independent Variables in Linear Regression. Multivariate Behavioral Research, 51(2-3), 139-153. https://doi.org/gfww3f

Thompson, C. G., Kim, R. S., Aloe, A. M., & Becker, B. J. (2017). Extracting the Variance Inflation Factor and Other Multicollinearity Diagnostics from Typical Regression Results. Basic and Applied Social Psychology, 39(2), 81-90. https://doi.org/gfww2w

See Also

coef, VIF, getY, R2, avgEst

Examples

# NOT RUN {
library(lme4)

## Standardised coefficients for SEM (i.e. direct effects)
m <- Shipley.SEM
stdCoeff(m)
stdCoeff(m, std.y = FALSE)  # x-only
stdCoeff(m, std.x = FALSE, std.y = FALSE)  # centred only
stdCoeff(m, cen.x = FALSE, cen.y = FALSE)  # standardised only
stdCoeff(m, r.squared = TRUE)  # add R-squared

## Demonstrate equality with manually-standardised variables (gaussian)
m <- Shipley.Growth[[3]]
d <- data.frame(scale(na.omit(Shipley)))
b1 <- stdCoeff(m, unique.x = FALSE)
b2 <- coef(summary(update(m, data = d)))[, 1]
stopifnot(all.equal(b1, b2))

## Demonstrate equality with increment in R-squared (ordinary linear model)
m <- lm(Growth ~ Date + DD + lat, data = Shipley)
r2 <- summary(m)$r.squared
b1 <- stdCoeff(m)[-1]
bn <- names(b1)
b2 <- sqrt(sapply(bn, function(i) {
  f <- reformulate(bn[!bn %in% i], response = "Growth")  # keep the response
  r2i <- summary(update(m, f))$r.squared
  r2 - r2i
}))
stopifnot(all.equal(abs(b1), b2))  # sqrt() drops the sign, so compare magnitudes

## Model-averaged standardised coefficients
m <- Shipley.Growth  # candidate models
w <- runif(length(m), 0, 1)  # arbitrary weights (for illustration)
stdCoeff(m, w)
# }
