Calculate fully standardised model coefficients in standard deviation units, adjusted for multicollinearity among predictors.

Usage
stdCoeff(
mod,
weights = NULL,
data = NULL,
term.names = NULL,
cen.x = TRUE,
cen.y = TRUE,
std.x = TRUE,
std.y = TRUE,
unique.x = TRUE,
refit.x = TRUE,
r.squared = FALSE,
...
)
Arguments

mod
A fitted model object, or a list or nested list of such objects.

weights
An optional numeric vector of weights to use for model averaging, or a named list of such vectors. The former should be supplied when mod is a list, and the latter when it is a nested list (with matching list names). If set to "equal", a simple average is calculated instead.

data
An optional dataset, used to first re-fit the model(s).

term.names
An optional vector of term names, used to extract and/or sort coefficients from the output.

cen.x, cen.y
Logical, whether the intercept and coefficients should be calculated using mean-centred variables.

std.x, std.y
Logical, whether coefficients should be scaled by the standard deviations of variables.

unique.x
Logical, whether coefficients should be adjusted for multicollinearity among predictors.

refit.x
Logical, whether the model should be re-fit with centred predictors.

r.squared
Logical, whether R-squared values should also be returned.

...
Arguments to R2.
Value

A numeric vector of the standardised coefficients, or a list or nested list of such vectors.
Details

stdCoeff will calculate fully standardised coefficients in standard deviation units for a fitted model or list of models. It achieves this by adjusting the 'raw' model coefficients, so no standardisation of input variables is required beforehand. Users can simply specify the model with all variables in their original units and the function will do the rest. However, the user is free to scale and/or centre any input variables should they choose, which should not affect the outcome of standardisation (provided any scaling is by standard deviations). This may be desirable in some cases, such as to increase numerical stability during model fitting when variables are on widely different scales.
If arguments cen.x or cen.y are TRUE, model estimates will be calculated as if all predictors (x) and/or the response variable (y) were mean-centred prior to model-fitting (including any dummy variables arising from categorical predictors). Thus, for an ordinary linear model where centring of both x and y is specified, the intercept will be zero - the mean (or weighted mean) of the centred y. In addition, if cen.x = TRUE and there are interacting terms in the model, all coefficients for lower order terms of the interaction are adjusted using an expression which ensures that each main effect or lower order term is estimated at the mean values of the terms they interact with (zero in a 'centred' model) - typically improving the interpretation of coefficients. The expression used comprises a weighted sum of all the coefficients that contain the lower order term, with the weight for the term itself being zero and those for 'containing' terms being the product of the means of the other variables involved in that term (i.e. those not in the lower order term itself). For example, for a three-way interaction (x1 * x2 * x3), the expression for the main effect of x1 is: b1 + b12 * mean(x2) + b13 * mean(x3) + b123 * mean(x2) * mean(x3), where each b is the coefficient of the corresponding term or interaction.
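To make the adjustment concrete, here is a minimal base-R sketch (simulated data, not from the package) computing the adjusted main effect of x1 by hand:

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- with(d, x1 + x2 + x3 + x1 * x2 * x3 + rnorm(100))  # simulated response
m <- lm(y ~ x1 * x2 * x3, data = d)
b <- coef(m)
# Weighted sum: zero weight for x1 itself; weights for 'containing' terms
# are the products of the means of the other variables in each term
b1.adj <- b["x1"] +
  b["x1:x2"] * mean(d$x2) +
  b["x1:x3"] * mean(d$x3) +
  b["x1:x2:x3"] * mean(d$x2) * mean(d$x3)

The result should equal the x1 coefficient obtained by re-fitting the model with all three predictors mean-centred.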
In addition, if std.x = TRUE or unique.x = TRUE (see below), product terms for interactive effects will be recalculated using mean-centred variables, to ensure that standard deviations and variance inflation factors (VIF) for predictors are calculated correctly (the model must be re-fit for the latter purpose, to recalculate the variance-covariance matrix).
If std.x = TRUE, coefficients are standardised by multiplying by the standard deviations of predictor variables (or terms), while if std.y = TRUE they are divided by the standard deviation of the response. If the model is a GLM, the latter is calculated using the link-transformed response (or an estimate of it) generated using the function getY. If both arguments are TRUE, the coefficients are regarded as 'fully' standardised in the traditional sense, often referred to as 'betas'.
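As an illustration only (not the package's internal code), full standardisation for an ordinary linear model can be reproduced in base R, here using the built-in mtcars data as a stand-in:

m <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(m)[-1]  # raw slopes, excluding the intercept
beta <- b * sapply(mtcars[names(b)], sd) / sd(mtcars$mpg)  # multiply by sd(x), divide by sd(y)
# should match stdCoeff(m, unique.x = FALSE) for these terms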
If unique.x = TRUE (default), coefficients are adjusted for multicollinearity among predictors by dividing by the square root of the VIFs (Dudgeon 2016, Thompson et al. 2017). If they have also been standardised by the standard deviations of x and y, this converts them to semipartial correlations, i.e. the correlation between the unique components of predictors (residualised on other predictors) and the response variable. This measure of effect size is arguably much more interpretable and useful than the traditional standardised coefficient, as it is always estimated independently of other predictors and so can more readily be compared both within and across models. Values range from zero to +/-1 rather than +/- infinity (as in the case of betas), putting them on the same scale as the bivariate correlation between predictor and response. In the case of GLMs, however, the measure is analogous but not exactly equal to the semipartial correlation, so its values may not always be bound between +/-1 (such cases are likely to be rare). Crucially, for ordinary linear models, the square of the semipartial correlation equals the increase in R-squared when that variable is added last to the model, directly linking the measure to model fit and 'variance explained'. See here for additional arguments in favour of the use of semipartial correlations.
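A minimal sketch of the VIF adjustment (mtcars again as a stand-in), computing the VIF by hand from the R-squared of a predictor regressed on the others:

m <- lm(mpg ~ wt + hp, data = mtcars)
beta.wt <- coef(m)["wt"] * sd(mtcars$wt) / sd(mtcars$mpg)  # fully standardised beta
r2.wt <- summary(lm(wt ~ hp, data = mtcars))$r.squared     # wt regressed on other predictors
vif.wt <- 1 / (1 - r2.wt)                                  # variance inflation factor
sp.wt <- unname(beta.wt / sqrt(vif.wt))                    # semipartial correlation
# sp.wt^2 equals the increase in R-squared when wt is added last to the model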
If refit.x = TRUE, the model will be re-fit with any (newly) centred continuous predictors. This will occur (and will normally be desired) when cen.x and unique.x are TRUE and there are interaction terms in the model, in order to calculate correct VIFs from the variance-covariance matrix. However, re-fitting may not be necessary in some cases, for example where predictors have already been centred (and whose values will not subsequently be resampled during bootstrapping), and disabling this option may save time with larger models and/or bootstrap runs.
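For example (a hypothetical usage, with mtcars as a stand-in), if continuous predictors are centred before fitting, re-fitting can be skipped:

d <- transform(mtcars, wt = wt - mean(wt), hp = hp - mean(hp))  # pre-centred predictors
m <- lm(mpg ~ wt * hp, data = d)
stdCoeff(m, refit.x = FALSE)  # predictors already centred, so skip the re-fit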
If r.squared = TRUE, R-squared values are also returned via the R2 function.
Finally, if weights are specified, the function calculates a weighted average of the standardised coefficients across models (Burnham & Anderson 2002).
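A minimal sketch of the weighted average itself (hypothetical coefficients and weights; stdCoeff handles this internally when weights is supplied):

b <- rbind(m1 = c(x1 = 0.40, x2 = 0.10),   # standardised coefficients, model 1
           m2 = c(x1 = 0.35, x2 = 0.15))   # standardised coefficients, model 2
w <- c(0.7, 0.3)                           # e.g. Akaike weights (sum to 1)
b.avg <- colSums(b * w)                    # weighted mean for each term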
References

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). New York: Springer-Verlag. https://www.springer.com/gb/book/9780387953649

Dudgeon, P. (2016). A Comparative Investigation of Confidence Intervals for Independent Variables in Linear Regression. Multivariate Behavioral Research, 51(2-3), 139-153. https://doi.org/gfww3f

Thompson, C. G., Kim, R. S., Aloe, A. M., & Becker, B. J. (2017). Extracting the Variance Inflation Factor and Other Multicollinearity Diagnostics from Typical Regression Results. Basic and Applied Social Psychology, 39(2), 81-90. https://doi.org/gfww2w
Examples
library(lme4)
## Standardised coefficients for SEM (i.e. direct effects)
m <- Shipley.SEM
stdCoeff(m)
stdCoeff(m, std.y = FALSE) # x-only
stdCoeff(m, std.x = FALSE, std.y = FALSE) # centred only
stdCoeff(m, cen.x = FALSE, cen.y = FALSE) # standardised only
stdCoeff(m, r.squared = TRUE) # add R-squared
## Demonstrate equality with manually-standardised variables (gaussian)
# m <- Shipley.Growth[[3]]
m <- lm(Growth ~ Date + DD + lat, data = Shipley)
d <- data.frame(scale(na.omit(Shipley)))
b1 <- stdCoeff(m, unique.x = FALSE)
b2 <- coef(summary(update(m, data = d)))[, 1]
stopifnot(all.equal(b1, b2))
## Demonstrate equality with increment in R-squared (ordinary linear model)
m <- lm(Growth ~ Date + DD + lat, data = Shipley)
r2 <- summary(m)$r.squared
b1 <- stdCoeff(m)[-1]
bn <- names(b1)
b2 <- sqrt(sapply(bn, function(i) {
f <- reformulate(bn[!bn %in% i])
r2i <- summary(update(m, f))$r.squared
r2 - r2i
}))
stopifnot(all.equal(b1, b2))
## Model-averaged standardised coefficients
m <- Shipley.Growth # candidate models
w <- runif(length(m), 0, 1) # weights
stdCoeff(m, w)