This function is a wrapper for the hhg.univariate.ind.combined.test function, with parameters optimized for a large number of observations.
The function first calls hhg.univariate.ind.stat to compute the vector of test statistics. Each test statistic is a sum of log-likelihood scores over All Derived Partitions (ADP) of the data (computed as explained in Heller et al. (2014)).
For the 'ADP-EQP-ML' variant, the base test statistics are:
\(S_{2X2}, S_{2X3}, S_{3X2}, ..., S_{mmax X mmax}\).
For the 'ADP-EQP' variant, only the sums over symmetric tables (same number of cells on both axes) are considered:
\(S_{2X2}, S_{3X3}, S_{4X4}, ..., S_{mmax X mmax}\).
Other variants are described in hhg.univariate.ind.combined.test. The above variants are the ones to be used for a large number of observations (n>100).
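The difference between the two variants can be made concrete by listing the table sizes involved. The following base R sketch is an illustration only: it assumes mmin applies to both axes, uses a small mmax for readability, and does not reproduce the ordering or internals of the package.

```r
# Illustration only: enumerate the (m, l) table sizes behind each variant.
mmin <- 2
mmax <- 4  # small value chosen for readability

# 'ADP-EQP-ML': every m x l combination with m, l in mmin..mmax
ml_sizes  <- expand.grid(m = mmin:mmax, l = mmin:mmax)
ml_labels <- paste0("S_", ml_sizes$m, "X", ml_sizes$l)

# 'ADP-EQP': symmetric m x m tables only
mm_labels <- paste0("S_", mmin:mmax, "X", mmin:mmax)

print(ml_labels)
print(mm_labels)
```

The symmetric statistics are a subset of the 'ADP-EQP-ML' statistics, so 'ADP-EQP' combines fewer base statistics for the same mmax.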
Test functions are capable of handling large datasets by attempting a split only every \(N/nr.atoms\) observations. An atom is a sequence of observations which cannot be split when performing a partition of the data (i.e., setting nr.atoms, the number of sequences which cannot be split, sets the number of equidistant partition points). For the above variants, 'EQP' stands for equipartition over atoms. Brill (2016) suggests a minimum of 40 atoms, with an increase of up to 60 for alternatives which are more difficult to detect, at the expense of computation time (algorithm complexity is O(nr.atoms^4)). Very few alternatives require over 80 atoms.
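As a rough illustration of how nr.atoms restricts the candidate split points, the base R sketch below places one candidate split after every N/nr.atoms ranked observations. The sample size is hypothetical, and the exact placement used inside the package may differ; only the O(nr.atoms^4) cost figure comes from the text above.

```r
# Illustration only: equidistant candidate split points between atoms.
N <- 1000        # sample size (hypothetical)
nr.atoms <- 40   # minimum suggested in Brill (2016)

# One candidate split after every N / nr.atoms observations (in rank order)
split_points <- floor(seq_len(nr.atoms - 1) * N / nr.atoms)
print(head(split_points))

# Relative cost of moving from 40 to 60 atoms under O(nr.atoms^4)
cost_ratio <- (60 / 40)^4
print(cost_ratio)
```

Under the stated complexity, raising nr.atoms from 40 to 60 multiplies the cost by roughly five, which is why the larger value is suggested only for alternatives that are hard to detect.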
The vector of \(S_{mXl}\) statistics is then combined according to the method suggested in Heller et al. (2014). The default combining type is the minimum p-value, so the test statistic is the minimum p-value over the range of partition sizes m from mmin to mmax, where the p-value for a fixed partition size m is defined by the aggregation type and score type. The combination is done over the statistics computed by hhg.univariate.ind.stat. The second combination method is a Fisher type statistic, \(-\sum_{m=mmin}^{mmax} \log(p_m)\). The returned result may include the test statistic for the MinP combination, the Fisher combination, or both (see comb.type).
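The two combination rules can be sketched in a few lines of base R. The p-values below are made up, and the stand-in null sample is drawn independently for simplicity; in the actual test the null distribution of the combined statistic comes from the null table, where the per-m p-values are dependent.

```r
# Illustration only: combine per-partition-size p-values p_m, m = mmin..mmax.
p_m <- c(0.20, 0.03, 0.08, 0.15)   # made-up p-values

minp_stat   <- min(p_m)            # MinP: small values are evidence against independence
fisher_stat <- -sum(log(p_m))      # Fisher: large values are evidence against independence

# Stand-in null sample of p-value vectors (independent uniforms, an assumption)
set.seed(1)
B <- 1000
null_p <- matrix(runif(B * length(p_m)), nrow = B)
null_minp   <- apply(null_p, 1, min)
null_fisher <- -rowSums(log(null_p))

# p-values of the combined statistics against the stand-in null
minp_pvalue   <- (1 + sum(null_minp <= minp_stat)) / (1 + B)
fisher_pvalue <- (1 + sum(null_fisher >= fisher_stat)) / (1 + B)
print(c(MinP = minp_pvalue, Fisher = fisher_pvalue))
```

The combined statistic is not itself a p-value, so it has to be compared with its own null distribution, as done in the last step of the sketch.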
If the argument NullTable is supplied with a proper null table (constructed using Fast.independence.test.nulltable or hhg.univariate.ind.nulltable, for the data sample size), test parameters are taken from NullTable (mmax, mmin, variant, score.type, nr.atoms, ...). If NullTable is left NULL, a null table is generated by a call to Fast.independence.test.nulltable using the arguments supplied to this function. The null table is generated with nr.perm repetitions and is stored in the returned object, under generated_null_table. When testing multiple hypotheses, one may generate only one null table (using this function or Fast.independence.test.nulltable) and use it many times, thus substantially reducing computation time. Generated null tables hold the distribution of statistics for both combination types (comb.type=='MinP' and comb.type=='Fisher').
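Reusing a single null table across several tests might look like the sketch below. It assumes the HHG package is installed and that the sample size is the first argument of Fast.independence.test.nulltable and the two data vectors are the first arguments of Fast.independence.test; the data are simulated for the example, and the block is skipped when the package is unavailable.

```r
# Sketch only: requires the HHG package; skipped if it is not installed.
set.seed(1)
n  <- 500
x1 <- rnorm(n); y1 <- x1^2 + rnorm(n)   # dependent pair (made-up data)
x2 <- rnorm(n); y2 <- rnorm(n)          # independent pair (made-up data)

if (requireNamespace("HHG", quietly = TRUE)) {
  # Build the null table once for sample size n ...
  nt <- HHG::Fast.independence.test.nulltable(n, nr.perm = 200)
  # ... and reuse it for every hypothesis with the same sample size.
  res1 <- HHG::Fast.independence.test(x1, y1, NullTable = nt)
  res2 <- HHG::Fast.independence.test(x2, y2, NullTable = nt)
  print(res1)
  print(res2)
}
```

Because the null table depends on the sample size and test parameters but not on the data values, both calls can share it as long as n is the same.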
Null tables may be compressed, using the compress argument. For each of the partition sizes (i.e., m or mXm), the null distribution is held at a compress.p0 resolution up to the compress.p percentile. Beyond that value, the distribution is held at a finer resolution defined by compress.p1 (since higher values are attained when a relation exists in the data, this is required for computing the p-value accurately).
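The idea of a two-resolution null distribution can be illustrated in base R as follows. This is a sketch of the concept, not the package's storage format: the three compression values are illustrative choices, and the chi-squared sample is only a stand-in for a simulated null distribution.

```r
# Illustration only: keep a null distribution on a two-resolution percentile grid.
compress.p0 <- 0.001    # coarse step (illustrative value)
compress.p  <- 0.99     # percentile where the resolution changes (illustrative value)
compress.p1 <- 0.0001   # fine step (illustrative value)

coarse <- seq(0, compress.p, by = compress.p0)
fine   <- seq(compress.p, 1, by = compress.p1)
probs  <- c(coarse, fine[-1])   # drop the duplicated compress.p

set.seed(1)
null_stats <- rchisq(1e5, df = 3)   # stand-in null distribution
compressed <- quantile(null_stats, probs = probs, names = FALSE)

# Only the grid of quantiles is stored, not all simulated values
print(length(compressed))
```

The finer grid in the upper tail keeps small p-values accurate, while the coarse grid below compress.p keeps the stored table small.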