This function performs the simulation procedure in order to get
the p values that will eventually serve for power calculations (via
pow). The observation values ("sample") to be tested are
simulated via the given fun_obs function, and the significance
testing is performed via the given fun_test function. The numbers of
observations per look (for a sequential design) are specified in
n_obs.
sim(
  fun_obs,
  n_obs,
  fun_test,
  n_iter = 45000,
  adjust_n = 1,
  seed = 8,
  pair = NULL,
  ignore_suffix = FALSE,
  prog_bar = FALSE,
  hush = FALSE
)
Returns a data.frame (with class "possa_sim_df")
that includes the columns .iter (the iterations of the simulation
procedure numbered from 1 to n_iter), .look (the
interim "looks" numbered from 1 to the maximum number of looks,
including the final one), and the information returned by the
fun_test function for H0 and H1 outcomes (mainly p values; but also
other, optional information, if any) and the corresponding observation
numbers, as well as the total observation number per each look under a
dedicated .n_total column. When this data frame is printed to the
console (via POSSA's print()
method), the head (first few lines) of the data is shown; if any varying factors are included, summary information per factor combination is shown as well.
fun_obs
A function that creates the observations (i.e.,
the "sample"; all values for the dependent variable(s)). The respective
maximum observation number(s), given in n_obs, will be passed to the
fun_obs. For this, the returned value must be a named list, where the
names correspond exactly to the arguments in fun_test. In case of
sequential testing, the observations returned by fun_obs will be
reduced to the specified (smaller) number(s) of observations for each given
interim "look" (as a simulation for what would happen if collection was
stopped at that given look), to be used in fun_test. Optionally, the
fun_obs can be passed additional arguments (via a
list); see Details.
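As a minimal sketch of the naming requirement, the following base-R check confirms that the list returned by a sampling function carries exactly the argument names of the testing function. The names sample_obs, sample_test, v1, v2_h0, and v2_h1 are hypothetical and chosen only for illustration; the check itself is not part of POSSA.

```r
# Hypothetical sampling function: takes the maximum observation number
# and returns a named list of samples.
sample_obs = function(n) {
  list(
    v1 = rnorm(n, mean = 0, sd = 1),
    v2_h0 = rnorm(n, mean = 0, sd = 1),
    v2_h1 = rnorm(n, mean = 0.5, sd = 1)
  )
}

# Hypothetical testing function: its argument names mirror the list names.
sample_test = function(v1, v2_h0, v2_h1) {
  c(
    p_h0 = t.test(v1, v2_h0)$p.value,
    p_h1 = t.test(v1, v2_h1)$p.value
  )
}

# The list names and the argument names should form the same set.
names_match = setequal(names(sample_obs(10)), names(formals(sample_test)))
print(names_match)
```

Running such a check before a long simulation can catch a misspelled sample name early.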
n_obs
A numeric vector or a named list of numeric vectors. Specifies
the numbers of observations (i.e., sample sizes) that are to be generated
by fun_obs and then tested in fun_test. If a single vector is
given, this will be used for all observation number arguments in the
fun_obs and for the sample size adjustments for the arguments in the
fun_test functions. Otherwise, if a named list of numeric vectors is
given, the names must correspond exactly to the argument names in
fun_obs and fun_test, so that the respective numeric vectors
are used for each given sample variable. For convenience, in case of a
"_h" suffix, the variable will be divided into names with
"_h0" and "_h1" suffixes for fun_test (but not for
fun_obs); see Details.
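The "_h" convenience can be pictured with a small base-R sketch. The helper expand_h below is hypothetical: it only mimics the name expansion described here and is not POSSA's internal code.

```r
# Hypothetical helper: copies every "_h" entry of n_obs into
# "_h0" and "_h1" entries, leaving other entries unchanged.
expand_h = function(n_obs) {
  out = list()
  for (nm in names(n_obs)) {
    if (endsWith(nm, "_h")) {
      out[[paste0(nm, "0")]] = n_obs[[nm]]
      out[[paste0(nm, "1")]] = n_obs[[nm]]
    } else {
      out[[nm]] = n_obs[[nm]]
    }
  }
  out
}

# Three looks (30, 60, 90 observations) for each sample variable.
n_obs = list(var_y = c(30, 60, 90), var_x_h = c(30, 60, 90))
expanded = expand_h(n_obs)
print(names(expanded))
```

In this sketch, "var_x_h" yields the two names that the testing function would expect, while "var_y" passes through as-is.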
fun_test
The function for significance testing. The list of samples
returned by fun_obs (with observation numbers specified in
n_obs) will be passed into this fun_test function as
arguments, to be used in the given statistical significance tests in this
function. To correctly calculate the sample sizes in
POSSA::pow, the argument names for the sample that
varies depending on whether the null (H0) or the alternative (H1) hypothesis is
true should be indicated with "_h0" and "_h1" suffixes,
respectively, with a common root (so, e.g., "var_x_h0" and
"var_x_h1"). Then, in the resulting data.frame, their
sample size (which must always be identical) will be automatically merged
into a single column with a trimmed "_h" suffix (e.g.,
"var_x_h"). (Otherwise, the sample sizes of both H0 and H1 would be
calculated toward the total expected sample in either case, which is of
course incorrect. There are internal checks to prevent this, but the
intended total sample size can also be double-checked in the returned
data.frame's .n_total column.) Within-subject
observations, i.e., multiple observations per group, should be specified
with "GRP" prefix for a single group (e.g., simply "GRP", or
"GRP_mytest") and, for multiple groups, "grp_" prefix with a
following group name (e.g., "grp_1" or "grp_alpha"); the
numbers of multiple observations in each group can then be specified in
fun_obs via their group name (since the respective numbers of
observations should always be the same anyway); see Examples. To be
recognized by the POSSA::pow function, the
fun_test must return a named vector including a pair (or pairs) of p
values for H0 and H1 outcomes, where each p value's name must be specified
with a "p_" prefix and a "_h0" suffix for H0 outcome or a
"_h1" suffix for H1 outcome (e.g., p_h0, p_h1;
p_ttest_h0, p_ttest_h1). The simulated outcomes (per
iteration) for each of these p values will be separately stored in a
dedicated column of the data.frame returned by the sim
function. Optionally, the fun_test can return other miscellaneous
outcomes too, such as effect sizes or confidence interval limits; these will
then be stored in dedicated columns in the resulting
data.frame.
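The return format can be sketched as follows, assuming a simple two-sample comparison. The function name test_with_extras and the extra outcome names (mean_diff_h0, mean_diff_h1) are hypothetical choices for illustration; only the "p_" prefix and the "_h0"/"_h1" suffixes of the p values follow the convention described here.

```r
# Hypothetical testing function returning a pair of p values
# plus optional extra outcomes in one named vector.
test_with_extras = function(x, y_h0, y_h1) {
  t0 = t.test(x, y_h0, var.equal = TRUE)
  t1 = t.test(x, y_h1, var.equal = TRUE)
  c(
    p_h0 = t0$p.value,
    p_h1 = t1$p.value,
    # optional extra outcomes: raw mean differences
    mean_diff_h0 = mean(x) - mean(y_h0),
    mean_diff_h1 = mean(x) - mean(y_h1)
  )
}

set.seed(1)
res = test_with_extras(rnorm(30), rnorm(30), rnorm(30, mean = 0.5))
print(names(res))
```

Each element of the returned vector would then end up in its own column of the simulation results.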
n_iter
Number of iterations (default: 45000).
adjust_n
Adjust total number of observations via simple multiplication.
Might be useful in some specific cases, e.g. if for some reason multiple p
values are derived from the same sample without specifying grouping
(GRP or grp_ in fun_test), which would then lead to
incorrect (too many, multiplied) totals; for example, in case of four
observations obtained from the same sample, the value 1/4 could be
given. (The default value is 1.)
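The arithmetic behind the 1/4 example can be written out as a short sketch. The variable names and the participant count of 50 are assumptions made for illustration only.

```r
# Hypothetical scenario: 50 participants, four observations each,
# passed as four separate samples without GRP/grp_ grouping.
n_participants = 50
n_measures = 4

# Without grouping, each sample would be counted separately.
naive_total = n_participants * n_measures

# Multiplying by 1/4 restores the intended total.
adjust_n = 1 / n_measures
adjusted_total = naive_total * adjust_n
print(adjusted_total)
```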
seed
Number for set.seed; 8 by default. Set to
NULL for random seed.
pair
Logical or NULL. If NULL (default), the algorithm
assumes that paired samples are included among the observations in case of any
grouping via the fun_test parameters ("GRP"/"grp"), and
no paired samples otherwise. When paired samples are included, within each
look, the same vector indexes are used to remove elements from the given
observations. In general, this should not substantially affect the outcomes
of independent samples (assuming that their order is truly independent), but
this depends on how the random samples are generated in the fun_obs
function. To be safe and avoid any potential bias, it is best to avoid this
paired sampling mechanism when no paired samples are included. To override
the default, set to TRUE for paired samples scenario (paired
sampling), or to FALSE for no paired samples scenario (random
subsampling of each sample). (Might be useful for testing or some very
specific procedures, e.g. where grouping is not indicated despite paired
samples.)
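The difference between the two subsampling modes can be illustrated with a base-R sketch. This is an illustration of the idea only, not POSSA's internal code; the variable names and the look size are hypothetical.

```r
set.seed(8)
n_max = 10   # maximum observation number
n_look = 6   # hypothetical interim look size

before = rnorm(n_max)
after = before + 1  # each 'after' value is tied to its 'before' value

# Paired subsampling: one shared index vector keeps the pairs together.
keep = sample(n_max, n_look)
paired_diff = after[keep] - before[keep]

# Independent subsampling: separate index vectors, so the pairing is lost.
indep_before = before[sample(n_max, n_look)]
indep_after = after[sample(n_max, n_look)]

# With a shared index vector, every within-pair difference is still 1.
print(isTRUE(all.equal(paired_diff, rep(1, n_look))))
```

With independent index vectors, the element-wise differences generally no longer reflect the within-pair relation, which is why paired data calls for the shared-index approach.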
ignore_suffix
Logical or NULL, FALSE by default. Set to NULL to give warnings instead of errors for
internally detected consistency problems with the _h0/_h1
suffixes in the fun_test function arguments. Set to TRUE to
completely ignore these (neither error nor warning). (Might be useful for
testing or some very specific procedures.)
prog_bar
Logical, FALSE by default. If TRUE, shows
progress bar.
hush
Logical, FALSE by default. If TRUE, prevents
printing any details to console.
To specify a variable that differs depending on whether the null hypothesis
("H0") or the alternative hypothesis ("H1") is true, a pair of samples are
needed for fun_test, for which the argument names should have an
identical root and "_h0" and "_h1" endings, such as
"var_x_h0" (for sample in case of H0) and "var_x_h1" (for sample
in case of H1). Then, since the observation number for this pair will always
be the same, as a convenience, parameters with "_h0" and "_h1"
endings specifically can be specified together in n_obs with the last
"0"/"1" character dropped, hence ending with "_h". So, for example,
"var_x_h = c(30, 60, 90)" will be automatically adjusted to specify the
observation numbers for both "var_x_h0" and "var_x_h1". In that
case, fun_obs must have a single argument "var_x_h", while
fun_test must have both full names as arguments ("var_x_h0" and
"var_x_h1").
Optionally, fun_obs can be provided in list format for
the convenience of exploring varying factors (e.g., different effect sizes,
correlations) at once, without writing a dedicated fun_obs function for
each combination, and each time separately running the simulation and the
power calculation. In this case, the first element of the list must be the
actual function, which contains certain parameters for
specifying varying factors, while the rest of the elements should contain the
various argument values for these parameters of the function as named elements
of the list (e.g., list(my_function, factor1=c(1, 2, 3), factor2=c(0,
5))), with the name corresponding to the parameter name in the function, and
the varying values (numbers or strings). When so specified, a separate
simulation procedure will be run for each combination of the given factors
(or, if only one factor is given, for each element of that factor). The
POSSA::pow function will be able to automatically
detect (by default) the factors generated this way in the present
POSSA::sim function, in order to calculate power
separately for each factor combination.
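The list format and the resulting factor combinations can be sketched in base R. The function sample_fun and the factor names effect and sd_val are hypothetical; expand.grid is used here only to enumerate the combinations and is not necessarily how POSSA does it internally.

```r
# Hypothetical sampling function with two varying factors
# ('effect' and 'sd_val') as extra parameters.
sample_fun = function(n, effect, sd_val) {
  list(
    x = rnorm(n, mean = 0, sd = sd_val),
    y_h0 = rnorm(n, mean = 0, sd = sd_val),
    y_h1 = rnorm(n, mean = effect, sd = sd_val)
  )
}

# List format: the function first, then named vectors of factor values.
fun_obs_list = list(sample_fun, effect = c(0.2, 0.5, 0.8), sd_val = c(1, 2))

# Enumerate all factor combinations (3 effects x 2 SDs).
combos = expand.grid(fun_obs_list[-1])
print(combos)

# Draw one sample set for the first combination.
one = sample_fun(10, combos$effect[1], combos$sd_val[1])
```

A list like fun_obs_list could then be given as the fun_obs argument, so that one simulation is run per row of the enumerated combinations.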
pow
# below is a (very) minimal example
# for more, see the vignettes via https://github.com/gasparl/possa#usage
# load the package
library('POSSA')
# create sampling function
customSample = function(sampleSize) {
  list(
    sample1 = rnorm(sampleSize, mean = 0, sd = 10),
    sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
    sample2_h1 = rnorm(sampleSize, mean = 5, sd = 10)
  )
}
# create testing function
customTest = function(sample1, sample2_h0, sample2_h1) {
  c(
    # one-sided tests: is the mean of sample1 less than that of sample2?
    p_h0 = t.test(sample1, sample2_h0, alternative = 'less', var.equal = TRUE)$p.value,
    p_h1 = t.test(sample1, sample2_h1, alternative = 'less', var.equal = TRUE)$p.value
  )
}
# run simulation
dfPvals = sim(
  fun_obs = customSample,
  n_obs = 80,
  fun_test = customTest,
  n_iter = 1000
)
# get power info
pow(dfPvals)