Perform (robust) mediation analysis via a (fast-and-robust) bootstrap test or Sobel's test.
test_mediation(object, ...)
# S3 method for formula
test_mediation(
formula,
data,
test = c("boot", "sobel"),
alternative = c("twosided", "less", "greater"),
R = 5000,
level = 0.95,
type = c("perc", "bca"),
order = c("first", "second"),
method = c("regression", "covariance"),
robust = TRUE,
family = "gaussian",
contrast = FALSE,
fit_yx = TRUE,
control = NULL,
...
)
# S3 method for default
test_mediation(
object,
x,
y,
m,
covariates = NULL,
test = c("boot", "sobel"),
alternative = c("twosided", "less", "greater"),
R = 5000,
level = 0.95,
type = c("perc", "bca"),
order = c("first", "second"),
method = c("regression", "covariance"),
robust = TRUE,
family = "gaussian",
model = c("parallel", "serial"),
contrast = FALSE,
fit_yx = TRUE,
control = NULL,
...
)
# S3 method for fit_mediation
test_mediation(
object,
test = c("boot", "sobel"),
alternative = c("twosided", "less", "greater"),
R = 5000,
level = 0.95,
type = c("perc", "bca"),
order = c("first", "second"),
...
)
robmed(..., test = "boot", method = "regression", robust = TRUE)
An object inheriting from class "test_mediation" (class "boot_test_mediation" if test = "boot" or "sobel_test_mediation" if test = "sobel") with the following components:
a numeric vector containing the bootstrap point estimates of the effects of the independent variables on the proposed mediator variables (only "boot_test_mediation").
a numeric vector containing the bootstrap point estimates of the direct effects of the proposed mediator variables on the dependent variable (only "boot_test_mediation").
in case of a serial multiple mediator model, a numeric vector containing the bootstrap point estimates of the effects of proposed mediator variables on other mediator variables occurring later in the sequence (only "boot_test_mediation" if applicable).
a numeric vector containing the bootstrap point estimates of the total effects of the independent variables on the dependent variable (only "boot_test_mediation").
a numeric vector containing the bootstrap point estimates of the direct effects of the independent variables on the dependent variable (only "boot_test_mediation").
a numeric vector containing the bootstrap point estimates of the indirect effects (only "boot_test_mediation").
for back-compatibility with versions <0.10.0, the bootstrap point estimates of the indirect effects are also included here (only "boot_test_mediation"). This component is deprecated and may be removed as soon as the next version.
a numeric vector of length two or a matrix of two columns containing the bootstrap confidence intervals for the indirect effects (only "boot_test_mediation").
an object of class "boot" containing the bootstrap replicates (only "boot_test_mediation"). For regression model fits, bootstrap replicates of the coefficients in the individual regression models are stored.
numeric; the standard error of the indirect effect according to Sobel's formula (only "sobel_test_mediation").
numeric; the test statistic for Sobel's test (only "sobel_test_mediation").
numeric; the p-value from Sobel's test (only "sobel_test_mediation").
a character string specifying the alternative hypothesis in the test for the indirect effects.
an integer giving the number of bootstrap replicates (only "boot_test_mediation").
numeric; the confidence level of the bootstrap confidence interval (only "boot_test_mediation").
a character string specifying the type of bootstrap confidence interval (only "boot_test_mediation").
an object inheriting from class "fit_mediation" containing the estimation results of the mediation model on the original data.
object
the first argument will determine the method of the generic function to be dispatched. For the default method, this should be a data frame containing the variables. There is also a method for a mediation model fit as returned by fit_mediation().
...
additional arguments to be passed down. For the bootstrap tests, those can be used to specify arguments of boot(), for example for parallel computing.
formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. Hypothesized mediator variables should be wrapped in a call to m() (see examples), and any optional control variables should be wrapped in a call to covariates().
data
for the formula method, a data frame containing the variables.
test
a character string specifying the test to be performed for the indirect effects. Possible values are "boot" (the default) for the bootstrap, or "sobel" for Sobel's test. Currently, Sobel's test is not implemented for models with multiple indirect effects.
alternative
a character string specifying the alternative hypothesis in the test for the indirect effects. Possible values are "twosided" (the default), "less" or "greater".
R
an integer giving the number of bootstrap replicates. The default is to use 5000 bootstrap replicates.
level
numeric; the confidence level of the confidence interval in the bootstrap test. The default is to compute a 95% confidence interval.
type
a character string specifying the type of confidence interval to be computed in the bootstrap test. Possible values are "perc" (the default) for the percentile bootstrap, or "bca" for the bias-corrected and accelerated (BCa) bootstrap.
order
a character string specifying the order of approximation of the standard error in Sobel's test. Possible values are "first" (the default) for a first-order approximation, and "second" for a second-order approximation.
method
a character string specifying the method of estimation for the mediation model. Possible values are "regression" (the default) to estimate the effects via regressions, or "covariance" to estimate the effects via the covariance matrix. Note that the effects are always estimated via regressions if more than one independent variable or hypothesized mediator is specified, or if control variables are supplied.
robust
a logical indicating whether to perform a robust test (defaults to TRUE). For estimation via regressions (method = "regression"), this can also be a character string, with "MM" specifying the MM-estimator of regression, and "median" specifying median regression.
family
a character string specifying the error distribution to be used in maximum likelihood estimation of regression models. Possible values are "gaussian" for a normal distribution (the default), "skewnormal" for a skew-normal distribution, "student" for Student's t distribution, "skewt" for a skew-t distribution, or "select" to select among these four distributions via BIC (see fit_mediation() for details). This is only relevant if method = "regression" and robust = FALSE.
contrast
a logical indicating whether to compute pairwise contrasts of the indirect effects (defaults to FALSE). This can also be a character string, with "estimates" for computing the pairwise differences of the indirect effects (such that it is tested whether two indirect effects are equal), and "absolute" for computing the pairwise differences of the absolute values of the indirect effects (such that it is tested whether two indirect effects are equal in magnitude). This is only relevant for models with multiple indirect effects, which are currently only implemented for estimation via regressions (method = "regression"). For models with multiple independent variables of interest and multiple hypothesized mediators, contrasts are only computed between indirect effects corresponding to the same independent variable.
fit_yx
a logical indicating whether to fit the regression model y ~ x + covariates to estimate the total effect (the default is TRUE). This is only relevant if method = "regression" and robust = FALSE.
control
a list of tuning parameters for the corresponding robust method. For the robust MM-estimator of regression (method = "regression", and robust = TRUE or robust = "MM"), a list of tuning parameters for lmrob() as generated by MM_reg_control(). For median regression (method = "regression", and robust = "median"), a list of tuning parameters for rq() as generated by median_reg_control(). For winsorized covariance matrix estimation (method = "covariance" and robust = TRUE), a list of tuning parameters for cov_Huber() as generated by cov_control().
x
a character, integer or logical vector specifying the columns of object containing the independent variables of interest.
y
a character string, an integer or a logical vector specifying the column of object containing the dependent variable.
m
a character, integer or logical vector specifying the columns of object containing the hypothesized mediator variables.
covariates
optional; a character, integer or logical vector specifying the columns of object containing additional covariates to be used as control variables.
model
a character string specifying the type of model in case of multiple mediators. Possible values are "parallel" (the default) for the parallel multiple mediator model, or "serial" for the serial multiple mediator model. This is only relevant for models with multiple hypothesized mediators, which are currently only implemented for estimation via regressions (method = "regression").
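The order argument described above distinguishes a first-order from a second-order approximation of the standard error in Sobel's test. The sketch below uses the commonly cited textbook formulas on simulated data with ordinary least squares in base R. Whether robmed computes exactly these expressions is an assumption, and all variable names are illustrative.

```r
# Simulated data for a simple mediation model (illustrative values only)
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)

# Least-squares fits of the mediator and outcome equations
fit_m <- lm(m ~ x)
fit_y <- lm(y ~ m + x)

# Path estimates a and b with their standard errors
a <- coef(fit_m)[["x"]]
b <- coef(fit_y)[["m"]]
se_a <- coef(summary(fit_m))["x", "Std. Error"]
se_b <- coef(summary(fit_y))["m", "Std. Error"]

# First-order (delta method) standard error of the product a * b
se_first <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
# Second-order version adds the product of the two variances
se_second <- sqrt(b^2 * se_a^2 + a^2 * se_b^2 + se_a^2 * se_b^2)

# Test statistic and two-sided p-value from the normal approximation
z <- a * b / se_first
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
```

The second-order standard error is never smaller than the first-order one, so the second-order test is the more conservative of the two.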
The following mediation models are implemented. In the regression equations below, the \(i_j\) are intercepts and the \(e_j\) are random error terms.
Simple mediation model: The mediation model in its simplest form is given by the equations $$M = i_1 + aX + e_1,$$ $$Y = i_2 + bM + cX + e_2,$$ $$Y = i_3 + c'X + e_3,$$ where \(Y\) denotes the dependent variable, \(X\) the independent variable, and \(M\) the hypothesized mediator. The main parameter of interest is the product of coefficients \(ab\), called the indirect effect. The coefficients \(c\) and \(c'\) are called the direct and total effect, respectively.
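As a minimal sketch of the three equations above, the following base R code fits them by ordinary least squares on simulated data. It does not use robmed or any robust estimator, and the data-generating values are made up for illustration.

```r
# Simulated data (illustrative coefficients, not from any real study)
set.seed(42)
n <- 300
X <- rnorm(n)
M <- 0.6 * X + rnorm(n)
Y <- 0.5 * M + 0.3 * X + rnorm(n)

fit_m     <- lm(M ~ X)      # M = i_1 + a X + e_1
fit_y     <- lm(Y ~ M + X)  # Y = i_2 + b M + c X + e_2
fit_total <- lm(Y ~ X)      # Y = i_3 + c' X + e_3

a        <- coef(fit_m)[["X"]]
b        <- coef(fit_y)[["M"]]
direct   <- coef(fit_y)[["X"]]      # c
total    <- coef(fit_total)[["X"]]  # c'
indirect <- a * b                   # ab

# For least-squares fits, the total effect decomposes into
# direct plus indirect effect (up to floating-point error)
all.equal(total, direct + indirect)
```

This decomposition is a property of least squares. With robust estimators, the estimated total effect generally need not equal the sum of the estimated direct and indirect effects.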
Parallel multiple mediator model: The simple mediation model can be extended with multiple mediators \(M_1, \dots, M_k\) in the following way: $$M_1 = i_1 + a_1 X + e_1,$$ $$\vdots$$ $$M_k = i_k + a_k X + e_k,$$ $$Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},$$ $$Y = i_{k+2} + c' X + e_{k+2}.$$ The main parameters of interest are the individual indirect effects \(a_1 b_1, \dots, a_k b_k\).
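The following base R sketch illustrates the parallel model with two mediators, again with ordinary least squares on simulated data rather than robmed's estimators. It also computes the two kinds of pairwise contrasts described for the contrast argument (differences of the indirect effects and differences of their absolute values) as plain point estimates. Variable names and coefficients are illustrative.

```r
# Simulated data with two parallel mediators (illustrative values only)
set.seed(7)
n  <- 300
X  <- rnorm(n)
M1 <- 0.5 * X + rnorm(n)
M2 <- -0.3 * X + rnorm(n)
Y  <- 0.4 * M1 + 0.6 * M2 + 0.2 * X + rnorm(n)

# One regression per mediator gives a_1 and a_2
a <- c(M1 = coef(lm(M1 ~ X))[["X"]],
       M2 = coef(lm(M2 ~ X))[["X"]])

# The outcome regression gives b_1, b_2 and the direct effect c
fit_y  <- lm(Y ~ M1 + M2 + X)
b      <- coef(fit_y)[c("M1", "M2")]
direct <- coef(fit_y)[["X"]]

# Individual indirect effects a_1 b_1 and a_2 b_2
indirect <- a * b

# Point estimates of the pairwise contrasts
contrast_estimates <- indirect[["M1"]] - indirect[["M2"]]
contrast_absolute  <- abs(indirect[["M1"]]) - abs(indirect[["M2"]])

# Total effect c' from the regression of Y on X alone
total <- coef(lm(Y ~ X))[["X"]]
```

With least squares, the total effect equals the direct effect plus the sum of the individual indirect effects.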
Serial multiple mediator model: It differs from the parallel multiple mediator model in that it allows the hypothesized mediators \(M_1, \dots, M_k\) to influence each other in a sequential manner. It is given by the equations $$M_1 = i_1 + a_1 X + e_1,$$ $$M_2 = i_2 + d_{21} M_1 + a_2 X + e_2,$$ $$\vdots$$ $$M_k = i_k + d_{k1} M_1 + \dots + d_{k,k-1} M_{k-1} + a_k X + e_k,$$ $$Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},$$ $$Y = i_{k+2} + c' X + e_{k+2}.$$ The serial multiple mediator model quickly grows in complexity with an increasing number of mediators due to the combinatorial increase in indirect paths through the mediators. It is therefore only implemented for two and three mediators to maintain a focus on easily interpretable models. For two serial mediators, the three indirect effects \(a_1 b_1\), \(a_2 b_2\), and \(a_1 d_{21} b_2\) are the main parameters of interest. For three serial mediators, there are already seven indirect effects: \(a_1 b_1\), \(a_2 b_2\), \(a_3 b_3\), \(a_1 d_{21} b_2\), \(a_1 d_{31} b_3\), \(a_2 d_{32} b_3\), and \(a_1 d_{21} d_{32} b_3\).
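The combinatorial growth can be made concrete with a small enumeration. In the sketch below, the helper serial_paths() is hypothetical and not part of robmed. It lists every indirect path from X to Y as a non-empty subset of the mediators, visited in sequence order, which gives 2^k - 1 paths for k serial mediators.

```r
# Hypothetical helper: enumerate the indirect paths through k serial mediators
serial_paths <- function(k) {
  mediators <- paste0("M", seq_len(k))
  paths <- list()
  for (size in seq_len(k)) {
    # All subsets of the given size, with indices in increasing order
    for (idx in combn(k, size, simplify = FALSE)) {
      path <- paste(c("X", mediators[idx], "Y"), collapse = " -> ")
      paths[[length(paths) + 1L]] <- path
    }
  }
  unlist(paths)
}

paths_2 <- serial_paths(2)
paths_3 <- serial_paths(3)
length(paths_2)  # 3
length(paths_3)  # 7
```

For three mediators, the path "X -> M1 -> M3 -> Y" corresponds to the indirect effect \(a_1 d_{31} b_3\), and "X -> M1 -> M2 -> M3 -> Y" to \(a_1 d_{21} d_{32} b_3\).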
Multiple independent variables to be mediated: The simple mediation model can also be extended by allowing multiple independent variables \(X_1, \dots, X_l\) instead of multiple mediators. It is defined by the equations $$M = i_1 + a_1 X_1 + \dots + a_l X_l + e_1,$$ $$Y = i_2 + b M + c_1 X_1 + \dots + c_l X_l + e_2,$$ $$Y = i_3 + c_1' X_1 + \dots + c_l' X_l + e_3.$$ The indirect effects \(a_1 b, \dots, a_l b\) are the main parameters of interest. Note that an important special case of this model occurs when a categorical independent variable is represented by a group of dummy variables.
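The special case of a categorical independent variable can be sketched in base R as follows. A factor with three levels is represented by two dummy variables, which yields two indirect effects relative to the reference category. The data are simulated, the fits use ordinary least squares rather than robmed, and the level names are made up.

```r
# Simulated three-level factor with "control" as the reference category
set.seed(3)
n <- 300
group <- factor(sample(c("control", "low", "high"), n, replace = TRUE),
                levels = c("control", "low", "high"))

# Dummy variables X_1 (low) and X_2 (high); drop the intercept column
dummies <- model.matrix(~ group)[, -1, drop = FALSE]

# Mediator and outcome generated from the dummies (illustrative values)
M <- 0.5 * dummies[, "grouplow"] + 1.0 * dummies[, "grouphigh"] + rnorm(n)
Y <- 0.4 * M + rnorm(n)

fit_m <- lm(M ~ group)      # M = i_1 + a_1 X_1 + a_2 X_2 + e_1
fit_y <- lm(Y ~ M + group)  # Y = i_2 + b M + c_1 X_1 + c_2 X_2 + e_2

a <- coef(fit_m)[c("grouplow", "grouphigh")]
b <- coef(fit_y)[["M"]]

# Indirect effects a_1 b and a_2 b, one per dummy variable
indirect <- a * b
```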
Control variables: To isolate the effects of the independent variables of interest from other factors, control variables can be added to all regression equations of a mediation model. Note that there is no intrinsic difference between independent variables of interest and control variables in terms of the model or its estimation. The difference is purely conceptual in nature: for the control variables, the estimates of the direct and indirect paths are not of particular interest to the researcher. Control variables can therefore be specified separately from the independent variables of interest. Only for the latter, results for the indirect effects are included in the output.
More complex models: Some of the models described above can be combined. For instance, parallel and serial multiple mediator models support multiple independent variables of interest and control variables.
Andreas Alfons
With method = "regression", and robust = TRUE or robust = "MM", the tests are based on robust regressions with the MM-estimator from lmrob(). The bootstrap test is thereby performed via the fast-and-robust bootstrap. This is the default behavior.
Note that the MM-estimator of regression implemented in lmrob() can be seen as a weighted least squares estimator, where the weights depend on how much an observation deviates from the rest. The trick for the fast-and-robust bootstrap is that on each bootstrap sample, first a weighted least squares estimator is computed (using those robustness weights from the original sample) followed by a linear correction of the coefficients. The purpose of this correction is to account for the additional uncertainty of obtaining the robustness weights.
With method = "regression" and robust = "median", the tests are based on median regressions with rq(). Note that the bootstrap test is performed via the standard bootstrap, as the fast-and-robust bootstrap is not applicable. Unlike the robust regressions described above, median regressions are not robust against outliers in the explanatory variables, and the standard bootstrap can suffer from oversampling of outliers in the bootstrap samples.
With method = "covariance" and robust = TRUE, the tests are based on a Huber M-estimator of location and scatter. For the bootstrap test, the M-estimates are used to first clean the data via a transformation. Then the standard bootstrap is performed with the cleaned data. Note that this covariance-based approach is less robust than the approach based on robust regressions described above. Furthermore, the bootstrap does not account for the variability from cleaning the data.
robmed() is a wrapper function for performing robust mediation analysis via regressions and the fast-and-robust bootstrap.
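The three robust variants described above, and the robmed() wrapper, can be requested as sketched below. The calls use only arguments listed in the usage section and the BSG2014 data from the examples. The value R = 1000 is an arbitrary choice, smaller than the default of 5000, made only to keep the sketch quick.

```r
library("robmed")
data("BSG2014")
seed <- 20150601

# MM-regression with the fast-and-robust bootstrap (the default behavior)
set.seed(seed)
boot_mm <- test_mediation(TeamCommitment ~ m(TaskConflict) + ValueDiversity,
                          data = BSG2014, robust = "MM", R = 1000)

# Median regression with the standard bootstrap
set.seed(seed)
boot_median <- test_mediation(TeamCommitment ~ m(TaskConflict) + ValueDiversity,
                              data = BSG2014, robust = "median", R = 1000)

# Huber M-estimator of location and scatter, standard bootstrap on cleaned data
set.seed(seed)
boot_cov <- test_mediation(TeamCommitment ~ m(TaskConflict) + ValueDiversity,
                           data = BSG2014, method = "covariance",
                           robust = TRUE, R = 1000)

# Wrapper that fixes test = "boot", method = "regression", robust = TRUE
set.seed(seed)
boot_wrapper <- robmed(TeamCommitment ~ m(TaskConflict) + ValueDiversity,
                       data = BSG2014, R = 1000)
```

Comparing summary() output across the three fits is one way to see how sensitive the conclusions are to the choice of robust method.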
Alfons, A., Ates, N.Y. and Groenen, P.J.F. (2022a) A Robust Bootstrap Test for Mediation Analysis. Organizational Research Methods, 25(3), 591--617. doi:10.1177/1094428121999096.
Alfons, A., Ates, N.Y. and Groenen, P.J.F. (2022b) Robust Mediation Analysis: The R Package robmed. Journal of Statistical Software, 103(13), 1--45. doi:10.18637/jss.v103.i13.
Azzalini, A. and Arellano-Valle, R. B. (2013) Maximum Penalized Likelihood Estimation for Skew-Normal and Skew-t Distributions. Journal of Statistical Planning and Inference, 143(2), 419--433. doi:10.1016/j.jspi.2012.06.022.
Preacher, K.J. and Hayes, A.F. (2004) SPSS and SAS Procedures for Estimating Indirect Effects in Simple Mediation Models. Behavior Research Methods, Instruments, & Computers, 36(4), 717--731. doi:10.3758/bf03206553.
Preacher, K.J. and Hayes, A.F. (2008) Asymptotic and Resampling Strategies for Assessing and Comparing Indirect Effects in Multiple Mediator Models. Behavior Research Methods, 40(3), 879--891. doi:10.3758/brm.40.3.879.
Salibian-Barrera, M. and Van Aelst, S. (2008) Robust Model Selection Using Fast and Robust Bootstrap. Computational Statistics & Data Analysis, 52(12), 5121--5135. doi:10.1016/j.csda.2008.05.007.
Salibian-Barrera, M. and Zamar, R. (2002) Bootstrapping Robust Estimates of Regression. The Annals of Statistics, 30(2), 556--582. doi:10.1214/aos/1021379865.
Sobel, M.E. (1982) Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models. Sociological Methodology, 13, 290--312. doi:10.2307/270723.
Yuan, Y. and MacKinnon, D.P. (2014) Robust Mediation Analysis Based on Median Regression. Psychological Methods, 19(1), 1--20. doi:10.1037/a0033820.
Zu, J. and Yuan, K.-H. (2010) Local Influence and Robust Procedures for Mediation Analysis. Multivariate Behavioral Research, 45(1), 1--44. doi:10.1080/00273170903504695.
data("BSG2014")
## seed to be used for the random number generator
seed <- 20150601
## simple mediation
set.seed(seed)
boot_simple <- test_mediation(TeamCommitment ~
                                m(TaskConflict) +
                                ValueDiversity,
                              data = BSG2014)
summary(boot_simple)
# depending on the seed of the random number generator, one
# may get a p value slightly below or above the arbitrary
# 5% threshold
p_value(boot_simple, parm = "indirect")
# The results in Alfons et al. (2022a) were obtained with an
# older version of the random number generator and with BCa
# bootstrap intervals (which are no longer recommended).
# To reproduce those results, uncomment the lines below.
# RNGversion("3.5.3")
# set.seed(seed)
# boot_simple <- test_mediation(TeamCommitment ~
#                                 m(TaskConflict) +
#                                 ValueDiversity,
#                               data = BSG2014,
#                               type = "bca")
# summary(boot_simple)
# \donttest{
## serial multiple mediators
set.seed(seed)
boot_serial <- test_mediation(TeamScore ~
                                serial_m(TaskConflict,
                                         TeamCommitment) +
                                ValueDiversity,
                              data = BSG2014,
                              level = 0.9)
summary(boot_serial)
## parallel multiple mediators and control variables
set.seed(seed)
boot_parallel <- test_mediation(TeamPerformance ~
                                  parallel_m(ProceduralJustice,
                                             InteractionalJustice) +
                                  SharedLeadership +
                                  covariates(AgeDiversity,
                                             GenderDiversity),
                                data = BSG2014)
summary(boot_parallel)
# }
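The examples above all use the formula method. As a hedged sketch, the other two methods from the usage section could be called as follows. The column names are taken from the simple mediation example. That fit_mediation() accepts the same formula and data arguments as test_mediation() is an assumption here, and R = 1000 is an arbitrary choice to keep the run short.

```r
library("robmed")
data("BSG2014")

# Default method: a data frame plus the names of the relevant columns
set.seed(20150601)
boot_default <- test_mediation(BSG2014,
                               x = "ValueDiversity",
                               y = "TeamCommitment",
                               m = "TaskConflict",
                               R = 1000)

# fit_mediation method: fit the model once, then test the stored fit
# (assumes fit_mediation() takes a formula and data like test_mediation())
fit <- fit_mediation(TeamCommitment ~ m(TaskConflict) + ValueDiversity,
                     data = BSG2014)

# Sobel's test on the stored fit with the default first-order standard error
sobel_simple <- test_mediation(fit, test = "sobel")
summary(sobel_simple)
```

Fitting once and testing the stored fit avoids re-estimating the model when several tests are run on the same specification.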