The functions lm_betaselect()
and glm_betaselect()
let users
select which variables are to be
standardized when computing the
standardized solution. They have the
following features:
They automatically skip categorical
predictors (i.e., factor or string
variables).
They do not standardize a product
term, because standardizing it is incorrect. Instead,
they
compute the product term with its
component variables standardized,
if requested.
They standardize the selected
variables before fitting a model.
Therefore, if a model has the term
log(x) and x is one of the
selected variables, the model uses
the logarithm of the standardized
x, instead of the
standardized log(x), which is
difficult to interpret.
They can be used to generate
nonparametric
bootstrap confidence intervals for
the standardized solution. A bootstrap
confidence interval is preferable to
the default confidence interval,
which ignores the standardization,
because it
takes into account the sampling
variance of the standard deviations.
Preliminary support has been found
for using bootstrapping to form
confidence intervals for
coefficients involving standardized
variables in linear regression
(Jones & Waller, 2013).
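The bootstrap point above can be illustrated with a small base R sketch. It is a hand-rolled percentile bootstrap on simulated data, not the code used by lm_betaselect(); the helper std_slope() and all variable names are hypothetical. The key step is that the variables are standardized again within each bootstrap sample, so the interval reflects the sampling variability of the standard deviations.

```r
# Sketch only: a hand-rolled percentile bootstrap, not the package's code.
set.seed(1234)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 10, sd = 2))
dat$y <- 5 + 0.5 * dat$x + rnorm(n)

# Hypothetical helper: standardize x and y, fit the model,
# and return the slope of x.
std_slope <- function(d) {
  d$x <- as.numeric(scale(d$x))
  d$y <- as.numeric(scale(d$y))
  unname(coef(lm(y ~ x, data = d))["x"])
}

est <- std_slope(dat)

# Resample rows with replacement, then standardize again within each
# bootstrap sample, so the standard deviations vary across samples.
n_boot <- 500
boot_est <- replicate(n_boot, {
  i <- sample.int(n, size = n, replace = TRUE)
  std_slope(dat[i, , drop = FALSE])
})

# Percentile bootstrap confidence interval
boot_ci <- quantile(boot_est, probs = c(0.025, 0.975))
est
boot_ci
```

A default interval from confint() on the standardized fit would treat the standard deviations as fixed constants, which is the limitation the bootstrap interval is meant to address.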
Problems With Common Approaches
In some regression programs, users
have limited control over which
variables to standardize when
requesting the so-called "betas".
The solution may be uninterpretable
or misleading in these conditions:
Dummy variables are standardized,
and their coefficients can no longer be interpreted as the
difference between two groups on the
outcome variable.
Product terms (interaction terms)
are standardized, and their coefficients can no longer be
interpreted as the changes in the
effects of focal variables when the
moderators change (Cheung, Cheung,
Lau, Hui, & Vong, 2022).
Variables with meaningful units can
be more difficult to interpret when
they are standardized (e.g., age).
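The product-term problem can be seen in a short base R sketch on simulated data (all names are hypothetical, and this is not the package's code). Fit (a) standardizes the component variables and lets the formula form their product; fit (b) standardizes the product term itself, as some programs do when reporting "betas". The two product-term coefficients differ by the factor sd(x * w) / (sd(x) * sd(w)).

```r
# Sketch only: simulated data, base R, hypothetical variable names.
set.seed(5678)
n <- 300
dat <- data.frame(x = rnorm(n, 5, 2), w = rnorm(n, 3, 1.5))
dat$y <- 1 + 0.4 * dat$x + 0.3 * dat$w + 0.2 * dat$x * dat$w + rnorm(n)

z <- function(v) as.numeric(scale(v))

# (a) Standardize the components and the outcome first;
#     x * w in the formula is then the product of standardized x and w.
dat_a <- data.frame(y = z(dat$y), x = z(dat$x), w = z(dat$w))
fit_a <- lm(y ~ x * w, data = dat_a)

# (b) Standardize the product term itself.
dat_b <- dat_a
dat_b$xw <- z(dat$x * dat$w)
fit_b <- lm(y ~ x + w + xw, data = dat_b)

b_a <- unname(coef(fit_a)["x:w"])
b_b <- unname(coef(fit_b)["xw"])
c(components_standardized = b_a, product_standardized = b_b)

# The ratio of the two coefficients is a function of the scales only.
b_b / b_a
sd(dat$x * dat$w) / (sd(dat$x) * sd(dat$w))
```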
How The Functions Work
The functions standardize the original variables
before they are used in the
model. Therefore, strictly
speaking, they do not standardize
the predictors in the model,
but standardize the input variables
(Gelman et al., 2021).
The requested model is then fitted to
the dataset with selected variables
standardized. For ease of
follow-up analysis, both the results
with selected variables standardized
and the results without
standardization are stored. If
required, the results without
standardization can be retrieved
by raw_output().
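The workflow described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the internal code of lm_betaselect(): the data are simulated, the selection to_std and the list fits are hypothetical names, and a squared term is used instead of log(x) because a standardized variable takes negative values. The same formula is fitted to the original data and to a copy in which only the selected input variables are standardized, and both fits are kept.

```r
# Sketch only: simulated data, base R, hypothetical names.
set.seed(42)
n <- 150
dat <- data.frame(x = rnorm(n, 50, 10), age = round(runif(n, 20, 60)))
dat$y <- 2 + 0.05 * dat$x + 0.002 * (dat$x - 50)^2 +
         0.03 * dat$age + rnorm(n)

# Standardize only the selected input variables; age keeps its unit.
to_std <- c("x", "y")
dat_std <- dat
dat_std[to_std] <- lapply(dat[to_std], function(v) as.numeric(scale(v)))

# Fit the same model to both datasets and store both results.
model <- y ~ x + I(x^2) + age
fits <- list(ustd = lm(model, data = dat),
             std  = lm(model, data = dat_std))

# In the standardized fit, I(x^2) is the square of standardized x,
# not the standardized square of x.
mm <- model.matrix(fits$std)
all.equal(unname(mm[, "I(x^2)"]), as.numeric(scale(dat$x))^2)

coef(fits$std)
coef(fits$ustd)
```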
Methods
The output of lm_betaselect() is
an lm_betaselect-class object,
and the output of glm_betaselect()
is a glm_betaselect-class object.
They have the following methods:
A coef-method for extracting
the coefficients of the model.
(See coef.lm_betaselect()
and coef.glm_betaselect()
for details.)
A vcov-method for extracting the
variance-covariance matrix of the
estimates of the coefficients.
If bootstrapping is requested, it
can return the matrix based on the
bootstrap estimates.
(See vcov.lm_betaselect()
and vcov.glm_betaselect()
for details.)
A confint-method for forming the
confidence intervals of the
estimates of the coefficients.
If bootstrapping is requested, it
can return the bootstrap confidence
intervals.
(See confint.lm_betaselect() and
confint.glm_betaselect()
for details.)
A summary-method for printing the
summary of the results, with additional
information such as the number of
bootstrap samples and which variables
have been standardized.
(See summary.lm_betaselect() and
summary.glm_betaselect()
for details.)
An anova-method for printing the
ANOVA table. It can also be used to
compare two or more outputs of
lm_betaselect() or
glm_betaselect().
(See anova.lm_betaselect()
and anova.glm_betaselect()
for details.)
A predict-method for computing
predicted values. It can be used to
compute the predicted values given
a set of new unstandardized data.
The data will be standardized before
computing the predicted values in
the models with standardization.
(See predict.lm_betaselect() and
predict.glm_betaselect()
for details.)
The default update-method for updating
a call also works for an
lm_betaselect object or
a glm_betaselect object. It can
update the model in the same
way it updates a model fitted by
stats::lm() or stats::glm(),
and also update
the arguments of lm_betaselect()
or glm_betaselect()
such as the variables to be
standardized.
(See stats::update() for details.)
Most other methods for the output
of stats::lm() and stats::glm()
should also work
on an lm_betaselect-class object
or a glm_betaselect-class object,
respectively.
Some of them will give the same
results regardless of the variables
standardized. Examples are
rstandard() and cooks.distance().
Some others should be used
with caution if they make use of
the variance-covariance matrix
of the estimates.
To use the methods for lm objects
or glm objects
on the results without standardization,
simply use raw_output(). For example,
to get the fitted values without
standardization, call
fitted(raw_output(x)), where x
is the output of lm_betaselect()
or glm_betaselect().
The function raw_output() simply extracts
the regression output by stats::lm()
or stats::glm()
on the variables without standardization.
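A minimal usage sketch of the methods described above follows. It assumes the betaselectr package is installed; the argument names to_standardize and do_boot are assumptions based on the package documentation and should be checked against ?lm_betaselect. The data and variable names are simulated and hypothetical.

```r
# Sketch only: simulated data; argument names are assumptions
# to be verified with ?lm_betaselect.
set.seed(2024)
n <- 200
dat <- data.frame(iv = rnorm(n, 10, 3), mod = rnorm(n, 50, 8))
dat$dv <- 3 + 0.4 * dat$iv + 0.1 * dat$mod +
          0.02 * dat$iv * dat$mod + rnorm(n, sd = 4)

if (requireNamespace("betaselectr", quietly = TRUE)) {
  library(betaselectr)
  # Standardize only iv and dv; mod keeps its original unit.
  # Bootstrapping is switched off to keep the sketch fast.
  fit <- lm_betaselect(dv ~ iv * mod,
                       data = dat,
                       to_standardize = c("iv", "dv"),
                       do_boot = FALSE)
  print(coef(fit))
  print(summary(fit))

  # Results without standardization, as an ordinary lm object
  fit_raw <- raw_output(fit)
  print(class(fit_raw))
  print(head(fitted(fit_raw)))
}
```

The guard with requireNamespace() only keeps the sketch from failing when the package is not available; in an interactive session, library(betaselectr) alone would do.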