An all-in-one function that receives a model specification, generates datasets, fits a model, does the target test, and returns the test results.
power4test(
object = NULL,
nrep = NULL,
ptable = NULL,
model = NULL,
pop_es = NULL,
standardized = TRUE,
n = NULL,
number_of_indicators = NULL,
reliability = NULL,
x_fun = list(),
e_fun = list(),
process_data = NULL,
fit_model_args = list(),
R = NULL,
ci_type = "mc",
gen_mc_args = list(),
gen_boot_args = list(),
test_fun = NULL,
test_args = list(),
map_names = c(fit = "fit"),
results_fun = NULL,
results_args = list(),
test_name = NULL,
test_note = NULL,
do_the_test = TRUE,
sim_all = NULL,
iseed = NULL,
parallel = FALSE,
progress = TRUE,
ncores = max(1, parallel::detectCores(logical = FALSE) - 1),
es1 = c(n = 0, nil = 0, s = 0.1, m = 0.3, l = 0.5, si = 0.141, mi = 0.361, li = 0.51),
es2 = c(n = 0, nil = 0, s = 0.05, m = 0.1, l = 0.15),
es_ind = c("si", "mi", "li"),
n_std = 1e+05,
std_force_monte_carlo = FALSE
)# S3 method for power4test
print(
x,
what = c("data", "test"),
digits = 3,
digits_descriptive = 2,
data_long = FALSE,
test_long = FALSE,
fit_to_all_args = list(),
...
)
An object of the class power4test,
which is a list with two elements:
sim_all: The output of sim_out().
test_all: A named list of the
output of do_test(). The names
are the values of test_name.
This list can have more than one
test because a call to
power4test() can add new tests
to a power4test object.
The print method of power4test
returns x invisibly. Called for
its side effect.
Optional. If set to a
power4test object, it will be
updated using the value(s) in n,
pop_es, and/or nrep if they changed,
or a new test will be conducted and
added to objet. See the help page
for details. Default is NULL.
The number of replications
to generate the simulated datasets.
Default is NULL. Must be set when
called to create a power4test
object.
The output of
ptable_pop(), which is a
ptable_pop object, representing the
population model. If NULL, the
default, ptable_pop() will be
called to generate the ptable_pop
object using model and pop_es.
The lavaan model
syntax of the population model,
to be used by ptable_pop().
See 'Details' of
on how to specify the model.
Ignored if ptable is
specified.
The character vector or
multiline string to
specify population effect sizes
(population values of parameters). See
the help page on how to specify this
argument.
Ignored if ptable is
specified.
Logical. If
TRUE, the default, variances and
error variances are scaled to ensure
the population variances of the
endogenous variables are close to
one, and hence the effect sizes
(population values) are
standardized effect sizes if the
variances of the continuous exogenous
variables are also equal to one.
The sample size for each dataset. Default is 100.
A named
vector to specify the number of
indicators for each factors. See
the help page on how to set this
argument. Default is NULL and all
variables in the model syntax are
observed variables.
See the help page on how
to use this argument.
A named vector
(for a single-group model) or a
named list of named vectors
(for a multigroup model)
to set the reliability coefficient
of each set of indicators. Default
is NULL.
See the help page on how
to use this argument.
The function(s) used to
generate the exogenous variables or
error terms. If
not supplied, or set to list(), the
default, the variables are generated
from a multivariate normal
distribution. See the help page on how
to use this argument.
The function(s) used to
generate the error terms of indicators,
if any. If
not supplied, or set to list(), the
default, the error terms of indicators
are generated
from a multivariate normal
distribution. Specify in the same
way as x_fun. Refer to the help
page on x_fun on how to use this
argument.
If not NULL, it
must be a named list with these
elements: fun (required), the function
to further processing the simulated
data, such as generating missing data using
functions such as mice::ampute(); args (optional), a
named list of arguments to be passed
to fun, except the one for the
source data; sim_data_name (required) the
name of the argument to receive the
simulated data (e.g., data for
mice::ampute()); processed_data_name
(optional), the name of the data frame
after being processed by fun,
such as the data frame
with missing data in the output of
fun (e.g., "amp" for mice::ampute()),
if omitted, the output of fun should
be the data frame with missing data.
A list of the
arguments to be passed to fit_model()
when fitting the
model.
Should be a named list
with names being the names of the
arguments.
The number of replications
to generate the Monte Carlo or
bootstrapping estimates
for each fit output. No Monte Carlo
nor bootstrapping
estimates will be generated if R
is set to NULL.
The type of
simulation-based confidence
intervals to use. Can be either
"mc" for Monte Carlo method
(the default) or "boot" for
nonparametric bootstrapping method.
Relevant for test functions that
make use of estimates generate by
gen_boot() or gen_mc(), such as
test_indirect_effect().
A list of
arguments to be passed to
manymome::do_mc() when generating
the Monte Carlo estimates.
Should be a named argument
with names being the names of the
arguments. Used only if
ci_type is "mc".
A list of
arguments to be passed to
manymome::do_boot() when generating
the bootstrap estimates.
Should be a named argument
with names being the names of the
arguments. Used only if
ci_type is `"boot".
A function to do the
test. See 'Details' for the requirement
of this function. There are some
built-in test functions in power4mome,
described in 'Details'.
A list of arguments
to be passed to the test_fun
function. Default is list().
A named character
vector specifying how the content of
the element extra in
each replication of sim_all map
to the argument of test_fun.
Default is c(fit = "fit"),
indicating that the element fit
in the element extra is set to
the argument fit of test_fun.
That is, for the first replication,
fit = sim_out[[1]]$extra$fit when
calling test_fun.
The function to be
used to extract the test results.
See Details for the requirements
of this function. Default is NULL,
assuming that the output of
test_fun can be used directly.
A list of
arguments to be passed to the
results_fun function. Default is
list().
String. The name
of the test. Default is NULL,
and the name will be created from
test_fun. Note that if sim_out
is a power4test object and already
has a test of this name stored, it
will be replaced by the new results.
String. An optional
note for the test, stored in the
attribute test_note of the output
of do_test(). Default is NULL.
If TRUE,
do_test() will be called to do the
test specified by test_fun on the
fit output of each dataset.
If set to either a
sim_out object (the output of
sim_out() or a power4test object
(the output of power4test()), the
stored datasets and fit outputs will
be used for doing the test. Setting
object to the output of power4test()
is now the preferred method, but this
argument is kept for backward
compatibility.
The seed for the random
number generator. Default is NULL
and the seed is not changed. This
seed will be set only once, when
calling sim_data().
If TRUE, parallel
processing will be used when calling
other functions, if appropriate.
Default is FALSE.
If TRUE, the progress
of each step will be displayed.
Default is TRUE.
The number of CPU cores to use if parallel processing is used.
Set the
values for each label of the effect
size (population value) for correlations and regression
paths.
Used only if pop_es is a named
vector or a multiline string.
See the help page on how to specify
this argument.
Set the
values for each label of the effect
size (population value) for product term.
Used only if pop_es is a named
vector or a multiline string.
See the help page on how to specify
this argument.
The names of labels denoting the effect size of an indirect effect. They will be used to determine the population values of the component paths along an indirect path.
The sample size used to
determine the error variances by
simulation when std_force_monte_carlo
is TRUE.
Logical.
If FALSE, the default,
standardization is done analytically
if the model has no product terms,
and by simulation if the model has
product terms. That is, error variances
required to ensure implied variances equal
to one are determined by simulation.
If TRUE, simulation
will be used whether the model has
product terms or not. Always fall
back to simulation if
analytical standardization failed.
The object to be printed.
A string vector of
what to print, "data" for
simulated data and "test" for
stored test(s). Default is
c("data", "test").
The numbers of digits displayed after the decimal.
The number of digits displayed after the decimal for the descriptive statistics table.
If TRUE, detailed
results will be printed when printing
the simulated data.
If TRUE, detailed
results will be printed when printing
test(s).
A named list
of arguments to be passed to
lavaan::sem() when the model is
fitted to a sample combined from
Optional arguments to be passed to other print methods
This is the workflow:
If object is an output
of the output
of a previous call to
power4test() with do_the_test
set to FALSE and so has only the
model and the simulated data,
the following steps
will be skipped and go directly to
doing the test.
Call sim_data() to determine
the population model and
generate the datasets, using
arguments such as model and
pop_es.
Call fit_model() to fit a
model to each of the datasets,
which is the population model
by default.
If R is not NULL and
ci_type = "mc", call
gen_mc() to generate Monte
Carlo estimates using
manymome::do_mc(). The estimates
can be used by supported functions
such as test_indirect_effect().
If R is not NULL and
ci_type = "boot", call
gen_boot() to generate
bootstrap estimates using
manymome::do_boot(). The estimates
can be used by supported functions
such as test_indirect_effect().
Merge the results into a
sim_out object by calling
sim_out().
If do_the_test is FALSE,
skip the remaining steps and
return a power4test object,
which contains only the data
generated and optionally the
Monte Carlo or bootstrap
estimates.
If do_the_test is TRUE, do
the test.
do_test() will be called to do
the test in the fit output of
each dataset.
Return a power4test object which
include the output of sim_out
and, if do_the_test is TRUE,
the output of do_test().
This function is to be used when users are interested only in the power of one or several tests on a particular aspect of the model, such as a parameter, given a specific effect sizes and sample sizes.
Detailed description on major arguments can be found in sections below.
NOTE: The technical internal workflow of
of power4test() can be found in
this page: https://sfcheung.github.io/power4mome/articles/power4test_workflow.html.
The function power4test() can also be used to
update a condition when only some
selected aspects is to be changed.
For example,
instead of calling this function with
all the arguments set just to change
the sample size, it can be called
by supplying an existing
power4test object and set only
n to a new sample size. The data
and the tests will be updated
automatically. See the examples for
an illustration.
The function power4test() can also be used to
add a test to the output from a
previous call to power4test().
For example, after simulating the
datasets and doing one test,
the output can be set to object
of power4test(), and set only
test_fun and, optionally,
test_fun_args to do one more test
on the generated datasets. The output
will be the original
object with the results of the new
test added. See the examples for
an illustration.
For power analysis, usually, the
population model (model) is to be
fitted, and there is no need to
set fit_model_args.
If power analysis is to be conducted
for fitting a model that is not the
population model, of if non-default
settings are desired when fitting
a model, then the argument fit_model_args
needed to be set to customize the
call to fit_model().
For example,
users may want to examine the power
of a test when a misspecified model
is fitted, or the power of a test
when MLR is used as the estimator
when calling lavaan::sem().
See the help page of fit_model()
for some examples.
For a single-group model, model
should be a lavaan model syntax
string of the form of the model.
The population values of the model
parameters are to be determined by
pop_es.
If the model has latent factors,
the syntax in model should specify
only the structural model for the
latent factors. There is no
need to specify the measurement
part. Other functions will generate
the measurement part on top of this
model.
For example, this is a simple mediation model:
"m ~ x
y ~ m + x"Whether m, x, and y denote
observed variables or latent factors
are determined by other functions,
such as power4test().
Because the model is the population
model, equality constraints are
irrelevant and the model syntax
specifies only the form of the
model. Therefore, model is
specified as in the case of single
group models.
The argument pop_es is for specifying
the population values of model
parameters. This section describes
how to do this using named vectors.
If pop_es is specified by a named
vector, it must follow the convention
below.
The names of the vectors are
lavaan names for the selected
parameters. For example, m ~ x
denotes the path from x to m.
Alternatively, the names can be
either ".beta." or ".cov.".
Use ".beta." to set the default
values for all regression coefficients.
Use ".cov." to set the default
values for all correlations of
exogenous variables (e.g., predictors).
The names can also be of this form:
".ind.(<path>)", whether <path>
denote path in the model. For
example, ".ind.(x->m->y)" denotes
the path from x through m to
y. Alternatively, the lavaan
symbol ~ can also be used:
".ind.(y~m~x)". This form is used
to set the indirect effect (standardized,
by default) along this path. The
value for this name will override
other settings.
If using lavaan names, can
specify more than one parameter
using +. For example, y ~ m + x
denotes the two paths from m and
x to y.
The value of each element can be
the label for the effect size: n
for nil, s for small, m for
medium, and l for large. The
value for each label is determined
by es1 and es2. See the section
on specifying these two arguments.
The value of pop_es can also be
set to a value, but it must be
quoted as a string, such as "y ~ x" = ".31".
This is an example:
c(".beta." = "s",
"m1 ~ x" = "-m",
"m2 ~ m1" = "l",
"y ~ x:w" = "s")In this example,
All regression coefficients are
set to small (s) by default, unless
specified otherwise.
The path from x to m1 is
set to medium and negative (-m).
The path from m1 to m2 is set
to large (l).
The coefficient of the product
term x:w when predicting y is
set to small (s).
When setting an indirect effect to
a symbol (default: "si", "mi",
"li", with "i" added to differentiate
them from the labels for a direct path),
the corresponding value is used to
determine the population values of
all component paths along the pathway.
All the values are assumed to be equal.
Therefore, ".ind.(x->m->y)" = ".20"
is equivalent to setting m ~ x
and y ~ m to the square root of
.20, such that the corresponding
indirect effect is equal to the
designated value.
This behavior, though restricted, is for quick manipulation of the indirect effect. If different values along a pathway, set the value for each path directly.
Only nonnegative value is supported.
Therefore, ".ind.(x->m->y)" = "-si"
and ".ind.(x->m->y)" = "-.20" will
throw an error.
The argument pop_es also supports multigroup
models.
For pop_es, instead of
named vectors, named list of
named vectors should be used.
The names are the parameters, or
keywords such as .beta. and
.cov., like specifying the
population values for a single
group model.
The elements are character vectors. If it has only one element (e.g., a single string), then it is the the population value for all groups. If it has more than one element (e.g., a vector of three strings), then they are the population values of the groups. For a model of k groups, each vector must have either k elements or one element.
This is an example:
list("m ~ x" = "m",
"y ~ m" = c("s", "m", "l"))In this model, the population value
of the path m ~ x is medium (m) for
all groups, while the population
values for the path y ~ m are
small (s), medium (m), and large (l),
respectively.
When setting the argument pop_es,
instead of using a named vector
or named list for
pop_es, the population values of
model parameters can also be
specified using a multiline string,
as illustrated below, to be parsed
by pop_es_yaml().
This is an example of the multiline string for a single-group model:
y ~ m: l
m ~ x: m
y ~ x: nilThe string must follow this format:
Each line starts with tag:.
tag can be the name of a
parameter, in lavaan model
syntax format.
For example, m ~ x
denotes the path from x to m.
A tag in lavaan model syntax can
specify more than one parameter
using +.
For example, y ~ m + x
denotes the two paths from m and
x to y.
Alternatively, the tag can be
either .beta. or .cov..
Use .beta. to set the default
values for all regression coefficients.
Use .cov. to set the default
values for all correlations of
exogenous variables (e.g., predictors).
After each tag is the value of the population value:
-nil for nil (zero),
s for small,
m for medium, and
l for large.
si, mi, and li for
small, medium, and large a
standardized indirect effect,
respectively.
Note: n cannot be used in this mode.
The
value for each label is determined
by es1 and es2 as described
in ptable_pop().
The value can also be
set to a numeric value, such as
.30 or -.30.
This is another example:
.beta.: s
y ~ m: lIn this example, all regression
coefficients are small, while
the path from m to y is large.
This is an example of the string for a multigroup model:
y ~ m: l
m ~ x:
- nil
- s
y ~ x: nilThe format is similar to that for
a single-group model. If a parameter
has the same value for all groups,
then the line can be specified
as in the case of a single-group
model: tag: value.
If a parameter has different values across groups, then it must be in this format:
A line starts with the tag, followed
by two or more lines. Each line
starts with a hyphen - and the
value for a group.
For example:
m ~ x:
- nil
- sThis denotes that the model has
two groups. The values of the path
from x to m for the two
groups are 0 (nil) and
small (s), respectively.
Another equivalent way to specify
the values are using [], on
the same line of a tag.
For example:
m ~ x: [nil, s]The number of groups is inferred from the number of values for a parameter. Therefore, if a tag has more than one value, each tag must has the same number of value, or only one value.
The tag .beta. and .cov. can
also be used for multigroup models.
Note that using named vectors or named lists is more reliable. However, using a multiline string is more user-friendly. If this method failed, please use named vectors or named list instead.
The multiline string is parsed by yaml::read_yaml().
Therefore, the format requirement
is actually that of YAML. Users
knowledgeable of YAML can use other
equivalent way to specify the string.
The vector es1 is for correlations,
regression coefficients, and
indirect effect, and the
vector es2 is for for standardized
moderation effect, the coefficients
of a product term. These labels
are to be used in interpreting
the specification in pop_es.
The arguments number_of_indicators
and reliability are used to
specify the number of indicators
(e.g., items) for each factor,
and the population reliability
coefficient of each factor,
if the variables in the model
syntax are latent variables.
If a variable in the model is to be
replaced by indicators in the generated
data, set
number_of_indicators to a named
numeric vector. The names are the
variables of variables with
indicators, as appeared in the
model syntax. The value of each
name is the number of indicators.
The
argument reliability should then be
set a named numeric vector (or list,
see the section on multigroup models)
to specify the population reliability
coefficient ("omega") of each set of
indicators. The population standardized factor
loadings are then computed to ensure
that the population reliability
coefficient is of the target value.
These are examples for a single group model:
number of indicator = c(m = 3, x = 4, y = 5)The numbers of indicators for m,
x, and y are 3, 4, and 5,
respectively.
reliability = c(m = .90, x = .80, y = .70)The population reliability
coefficients of m, x, and y are
.90, .80, and .70, respectively.
Multigroup models are supported.
The number of groups is inferred
from pop_es (see the help page
of ptable_pop()), or directly
from ptable.
For a multigroup model, the number of indicators for each variable must be the same across groups.
However, the population reliability
coefficients can be different
across groups. For a multigroup model
of k groups,
with one or more population reliability
coefficients differ across groups,
the argument reliability should be
set to a named list. The names are
the variables to which the population
reliability coefficients are to be
set. The element for each name is
either a single value for the common
reliability coefficient, or a
numeric vector of the reliability
coefficient of each group.
This is an example of reliability
for a model with 2 groups:
reliability = list(x = .80, m = c(.70, .80))The reliability coefficients of x are
.80 in all groups, while the
reliability coefficients of m are
.70 in one group and .80 in another.
If all variables in the model has
the same number of indicators,
number_of_indicators can be set
to one single value.
Similarly, if all sets of indicators
have the same population reliability
in all groups, reliability can also
be set to one single value.
By default, variables and error
terms are generated
from a multivariate normal distribution.
If desired, users can supply the
function used to generate an exogenous
variable and error term by setting x_fun to
a named list.
The names of the list are the variables for which a user function will be used to generate the data.
Each element of the list must also be a list. The first element of this list, can be unnamed, is the function to be used. If other arguments need to be supplied, they should be included as named elements of this list.
For example:
x_fun = list(x = list(power4mome::rexp_rs),
w = list(power4mome::rbinary_rs,
p1 = .70)))The variables x and w will be
generated by user-supplied functions.
For x, the function is
power4mome::rexp_rs. No additional
argument when calling this function.
For w, the function is
power4mome::rbinary_rx. The argument
p1 = .70 will be passed to this
function when generating the values
of w.
If a variable is an endogenous
variable (e.g., being predicted by
another variable in a model), then
x_fun is used to generate its
error term. Its implied population
distribution may still be different
from that generate by x_fun because
the distribution also depends on the
distribution of other variables
predicting it.
These are requirements for the user-functions:
They must return a numeric vector.
They mush has an argument n for
the number of values.
The population mean and standard deviation of the generated values must be 0 and 1, respectively.
The package power4mome has
helper functions for generating
values from some common nonnormal
distributions and then scaling them
to have population mean and standard
deviation equal to 0 and 1 (by default), respectively.
These are some of them:
rbinary_rs().
rexp_rs().
rbeta_rs().
rlnorm_rs().
rpgnorm_rs().
To use x_fun, the variables must
have zero covariances with other
variables in the population. It is
possible to generate nonnormal
multivariate data but we believe this
is rarely needed when estimating
power before having the data.
The function set to test_fun,
the test function, usually
should work
on the output of lavaan::sem(),
lmhelprs::many_lm(), or
stats::lm(), but can also be a
function that works on the output
of the function set to fit_function
when calling fit_model() or
power4test() (see fit_model_args).
The function has two default requirements.
First, it has an argument fit, to
be set to the output of
lavaan::sem() or another output
stored in the element extra$fit of
a replication in the sim_all
object. (This requirement can be
relaxed; see the section on map_names.)
That is, the function definition
should be of this from: FUN(fit, ...). This is the form of all
test_* functions in power4mome.
If other arguments are to be passed
to the test function, they can be set
to test_args as a named list.
Second, the test function must returns an output that (a) can be processed by the results function (see below), or (b) is of the required format for the output of a results function (see the next section). If it already returns an output of the required format, then there is no need to set the results function.
The test results will be extracted
from the output of test_fun by the
function set to results_fun,
the results function. If
the test_fun already returns an
output of the expected format
(see below), then set results_fun
to NULL, the default. The output
of test_fun will be used for
estimating power.
The function set to results_fun
must accept the output of test_fun,
as the first argument, and return a
named list (which can be a data frame)
or a named vector with some
of the following
elements:
est: Optional. The estimate of a
parameter, if applicable.
se: Optional. The standard error
of the estimate, if applicable.
cilo: Optional. The lower limit of the
confidence interval, if applicable.
cihi: Optional. The upper limit of the
confidence interval, if applicable.
sig: Required. If 1, the test is
significant. If 0, the test is not
significant. If the test cannot be
done for any reason, it should be
NA.
The results can then be used to estimate the power or Type I error of the test.
For example, if
the null hypothesis is false, then
the proportion of significant, that
is, the mean of the values of sig
across replications, is the power.
The package power4mome has some ready-to-use
test functions:
test_indirect_effect()
test_cond_indirect()
test_cond_indirect_effects()
test_moderation()
test_index_of_mome()
test_parameters()
Please refer to their help pages for examples.
This argument is for developers using
a test function that has a different
name for the argument of the fit
object ("fit", the default).
If test_fun is set to a function
that works on an output of, say,
lavaan::sem() but the argument name
for the output is not fit, the
mapping can be changed by
map_names.
For example,
lavaan::parameterEstimates()
receives an output of lavaan::sem()
and reports the test results of model
parameters. However, the argument
name for the lavaan output is
object. To instruct do_test() to
do the test correctly when setting
test_fun to
lavaan::parameterEstimates, add
map_names = c(object = "fit"). The
element fit in a replication will
then set to the argument object of
lavaan::parameterEstimates().
The function power4test() is an
all-in-one function for
estimating the power of a test for
a model, given the sample size
and effect sizes (population values
of model parameters).
# Specify the model
model_simple_med <-
"
m ~ x
y ~ m + x
"
# Specify the population values
model_simple_med_es <-
"
m ~ x: m
y ~ m: l
y ~ x: n
"
# Set nrep to a large number in real analysis, such as 400
# Set `parallel` to TRUE for faster, usually much faster, analysis
# Set `progress` to TRUE to display the progress of the analysis
out <- power4test(nrep = 10,
model = model_simple_med,
pop_es = model_simple_med_es,
n = 100,
test_fun = test_parameters,
test_args = list(pars = "m~x"),
iseed = 1234,
parallel = FALSE,
progress = TRUE)
print(out,
test_long = TRUE)
# Change the sample size
out1 <- power4test(out,
n = 200,
iseed = 1234,
parallel = FALSE,
progress = TRUE)
print(out1,
test_long = TRUE)
# Add one more test
out2 <- power4test(out,
test_fun = test_parameters,
test_args = list(pars = "y~x"),
parallel = FALSE,
progress = TRUE)
print(out2,
test_long = TRUE)
Run the code above in your browser using DataLab