powerComp: Comparison of the power of TSS under different models

Description

powerComp - Compare the power of a set of TSS to identify trees generated under different alternative models given a null model.

powerComp_RegAcc - Compare the power of a set of TSS to identify trees generated under different alternative models given a the region(s) of acceptance.

Usage

powerComp(
  tss,
  null_model = "yule",
  alt_models,
  n,
  distribs = "exact_if_possible",
  N_null = 10000L,
  N_alt = 1000L,
  test_type = "two-tailed",
  correction = "small-sample",
  sig_lvl = 0.05
)
powerComp_RegAcc(
  tss,
  accept_regions,
  null_model,
  alt_models,
  n,
  distribs = "exact_if_possible",
  N_null = 10000L,
  N_alt = 1000L,
  test_type = "two-tailed",
  correction = "small-sample",
  sig_lvl = 0.05
)

Value

powerComp Returns an object of class 'poweRbal_data' which is a list containing the following objects:

power: Numeric matrix containing the power values (one row per TSS and one column per alternative model).
accept_regions: Numeric matrix containing the information on the region of acceptance (one row per TSS and four columns).
CIradius: Numeric matrix containing the confidence interval radii (one row per TSS and one column per alternative model).
actual_sample_sizes: Numeric vector containing the actual sample sizes under each alternative model as some models do not always successfully generate trees.
other input data.

powerComp_RegAcc Returns an object of class 'poweRbal_data' similar to powerComp.

Arguments

tss

Vector containing the names (as character) of the tree shape statistics that should be compared. You may either use the short names provided in tssInfo to use the already included TSS, or use the name of a list object containing similar information as the entries in tssInfo. Example:
Use "new_tss" as the name for the list object new_tss containing at least the function new_tss$func = function(tree){...}, and optionally also the information new_tss$short, new_tss$simple, new_tss$name, new_tss$type, new_tss$only_binary, and new_tss$safe_n.

null_model

The null model that is to be used to determine the power of the tree shape statistics. In general, it must be a function that produces rooted binary trees in phylo format.
If the respective model is included in this package, then specify the model and its parameters by using a character or list. Available are all options listed under parameter tm in the documentation of function genTrees (type ?genTrees).
If you want to include your own tree model, then use the name of a list object containing the function (dependent on one parameter n). Example:
Use "new_tm" for the list object
new_tm <- list(func = function(n, Ntrees){...}).

alt_models

List containing the alternative models that are to be used to determine the power of the tree shape statistics. Functions that produce rooted binary trees in phylo format. The information of each single model must be in the format described for null_model.

n

Integer value that specifies the desired number of leaves, i.e., vertices with in-degree 1 and out-degree 0.

distribs

Determines how the distributions (and with that the bounds of the critical region) are computed. Available are:

"exact_if_possible" (default): Tries to compute the exact distribution under the null model if possible. Currently, this is only implemented for null_model = "yule", "pda", or "etm", and n<=20. In all other cases the distribution is approximated by sampling N_null many trees under the null model as in the option "sampled" below.
"sampled": N_null many trees are sampled under the null model to approximate the distribution.

N_null

Sample size (integer >=10) if distributions are sampled (default = 10000L).

N_alt

Sample size (integer >=10) for the alternative models to estimate the power (default = 1000L).

test_type

Determines the method. Available are:

"two-tailed" (default): The lower and upper bound of the region of acceptance are determined based on the (empirical) distribution function such that P(TSS < lower bound) <= sig_lvl/2 and P(TSS > upper bound) <= sig_lvl/2. See parameter correction for specifying how conservative the test should be: the null hypothesis can either be rejected only if the values are strictly outside of this region of acceptance (can be too conservative) or it can also be rejected (with certain probabilities) if the value equals the lower or upper bound.
"two-tailed-unbiased": Experimental - Use with caution!
The region of acceptance is optimized to yield an unbiased test, i.e., a test that identifies non-null models with a probability of at least sig_lvl. The region of acceptance is determined similar to the default method. However, it need not be symmetrical, i.e., not necessarily cutting off sig_lvl/2 on both sides. Also see parameter correction for specifying how conservative the test should be.

correction

Specifies the desired correction method. Available are:

"small-sample" (default): This method tries to ensure that the critical region, i.e. the range of values for which the null hypothesis is rejected, is as close to sig_lvl as possible (compared with "none" below, which can be too conservative). The idea is that the null hypothesis is also rejected with certain probabilities if the value matches the value of a quantile.
"none": No correction method is applied. With that the test might be slightly too conservative as the null hypothesis is maintained if the values is >= the lower and <= the upper quantile.

sig_lvl

Level of significance (default = 0.05, must be >0 and <1).

accept_regions

Numeric matrix (one row per TSS) with two or four columns: The first two columns contain the interval limits of the region of acceptance, i.e., we reject the null hypothesis for values strictly outside of this interval. The third and fourth columns contain the probabilities to reject the null hypothesis if values equal the lower or upper bound, respectively. If the last two columns are missing they are interpreted as zeroes. See return value of getAccRegion().

References

S. J. Kersting, K. Wicke, and M. Fischer. Tree balance in phylogenetic models. arXiv:2406.05185, 2024.

Examples

Run this code

powerComp(tss = c("Sackin", "Colless", "B1I"),
          alt_models = list(list("aldous",-1), "pda", "etm"), n = 10L,
          distribs = "sampled", N_null = 40L, N_alt = 20L)
powerComp_RegAcc(tss = c("Sackin", "Colless", "B1I"),
          accept_regions = getAccRegion(tss = c("Sackin", "Colless", "B1I"),
                                        n = 6L, null_model = "etm",
                                        N_null = 20L, distribs = "sampled"),
          null_model = "etm", distribs = "sampled",
          alt_models = list(list("aldous",-1), "pda", "yule"), n = 6L,
          N_null = 20L, N_alt = 20L)

Run the code above in your browser using DataLab