Usage

hhg.test(Dx, Dy, ties = T, w.sum = 0, w.max = 2, nr.perm = 10000,
is.sequential = F, seq.total.nr.tests = 1,
seq.alpha.hyp = NULL, seq.alpha0 = NULL, seq.beta0 = NULL, seq.eps = NULL,
nr.threads = 0, tables.wanted = F, perm.stats.wanted = F)
Arguments

Dx: a symmetric matrix of doubles, where element [i, j] is a norm-based distance between the i'th and j'th x samples.

Dy: same as Dx, but for distances between y's (the user may choose any norm when computing Dx, Dy).

ties: a boolean specifying whether ties in Dx and/or Dy exist and are to be properly handled (requires more computation).

w.sum: minimum expected frequency taken into account when computing the sum.chisq statistic (must be non-negative; the contribution of tables having cells with smaller values will be truncated to zero).

w.max: minimum expected frequency taken into account when computing the max.chisq statistic (must be non-negative; the contribution of tables having cells with smaller values will be truncated to zero).

nr.perm: number of permutations from which to estimate the permutation p-values.

is.sequential: boolean flag specifying whether Wald's sequential test should be used; otherwise (the default), a simple Monte-Carlo test of nr.perm permutations is performed. When this argument is TRUE, either seq.total.nr.tests or the set seq.alpha.hyp, seq.alpha0, seq.beta0, seq.eps must be supplied.

seq.total.nr.tests: the total number of hypotheses simultaneously tested; when supplied, it is used to derive default values for seq.alpha.hyp, seq.alpha0, seq.beta0, and seq.eps.

seq.alpha.hyp: the significance level of the hypothesis tested.

seq.alpha0: the significance level of the sequential test's side null hypothesis p > seq.alpha.hyp.

seq.beta0: one minus the power of the sequential test against the side alternative p < seq.alpha.hyp.

seq.eps: a relative margin around seq.alpha.hyp that defines the p-value regions for the side null p > seq.alpha.hyp * (1 + seq.eps) and side alternative p < seq.alpha.hyp * (1 - seq.eps).

nr.threads: number of threads to use for the computation.

tables.wanted: boolean flag determining whether the 2x2 contingency tables themselves should be returned in extras.hhg.tbls.

perm.stats.wanted: boolean flag determining whether the statistics computed for every permutation should be returned in extras.perm.stats.
Value

Four statistics described in the original HHG paper are returned:

sum.chisq - sum of the Pearson chi-squared statistics from the 2x2 contingency tables considered.

sum.lr - sum of the likelihood ratio ("G statistic") values from the 2x2 tables.

max.chisq - maximum Pearson chi-squared statistic from any of the 2x2 tables.

max.lr - maximum G statistic from any of the 2x2 tables.

If nr.perm > 0, then estimated permutation p-values are returned as:

perm.pval.hhg.sc - permutation p-value for the sum.chisq statistic.

perm.pval.hhg.sl - permutation p-value for the sum.lr statistic.

perm.pval.hhg.mc - permutation p-value for the max.chisq statistic.

perm.pval.hhg.ml - permutation p-value for the max.lr statistic.
In order to give information that may help localize where in the support of the distributions of x and y there is departure from independence, if tables.wanted is TRUE, the 2x2 tables themselves are provided in:

extras.hhg.tbls - an n^2 by 4 matrix, whose columns are A11, A12, A21, A22 as denoted in the original HHG paper. Row r of the matrix corresponds to S_ij in the same paper, where i = 1 + floor((r - 1) / n) and j = 1 + ((r - 1) %% n). Since S_ij is never computed for i == j, rows (0:(n - 1)) * n + (1:n) contain NAs on purpose.
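As a minimal sketch of this row mapping (the helper function table.row is hypothetical, introduced here only for illustration; only the formulas come from the text above):

```r
## Hypothetical helper: row r of extras.hhg.tbls holding the 2x2 table
## A11, A12, A21, A22 for the pair (i, j), per the mapping documented above.
table.row <- function(i, j, n) (i - 1) * n + j

## Inverse mapping, as given above:
## i = 1 + floor((r - 1) / n), j = 1 + ((r - 1) %% n)
n <- 50
r <- table.row(3, 7, n)        # r = (3 - 1) * 50 + 7 = 107
i <- 1 + floor((r - 1) / n)    # recovers i = 3
j <- 1 + ((r - 1) %% n)        # recovers j = 7

## Diagonal rows (i == j) contain NAs by design, e.g. (assuming a result
## object `hhg` computed with tables.wanted = TRUE):
## hhg$extras.hhg.tbls[table.row(3, 3, n), ]   # all NA
```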
Finally, as a means of estimating the null distributions of the computed statistics, if perm.stats.wanted is TRUE, the statistics computed for every permutation of the data performed during testing are returned in:

extras.perm.stats - a data.frame with one variable per statistic and one sample per permutation.

Details

If is.sequential is TRUE
, Wald's sequential test is implemented as suggested by Fay et al. (2007) in order to reduce the O(nr.perm * n^2 * log(n))
computational complexity of the permutation test to a more manageable size. Especially when faced with high multiplicity, say M
simultaneous tests, the necessary number of iterations may be quite large. For example, if it is desired to control the family-wise error rate (FWER) at level alpha
using the Bonferroni correction, one needs a p-value of alpha / M
to establish significance. This seems to suggest that the minimum number of permutations required is nr.perm = M / alpha
. However, if it becomes clear after a smaller number of permutations that the null cannot be rejected, no additional permutations are needed, and the p-value can be conservatively estimated as 1. Often, only a handful of hypotheses in a family are expected to be non-null. In this case the number of permutations for testing all hypotheses using Wald's procedure is expected to be much lower than the full M^2 / alpha
.
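To illustrate this arithmetic in base R (the values of M and alpha below are hypothetical, chosen only for the example):

```r
## Hypothetical multiplicity scenario: M simultaneous tests, FWER level alpha
M <- 1000
alpha <- 0.05

## Bonferroni per-hypothesis threshold, and the implied minimum number of
## permutations needed to resolve a p-value that small
p.threshold <- alpha / M     # 5e-05
min.nr.perm <- M / alpha     # 20000 permutations per test

## Worst-case total work over the whole family, versus what Wald's
## sequential test needs when most hypotheses are null and stop early
full.total <- M^2 / alpha    # 2e+07 permutations overall
```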
The target significance level of the sequential test is specified in the argument seq.alpha.hyp
. It depends on the number of hypotheses M
, and the type of multiplicity correction wanted. For the Bonferroni correction, the threshold is alpha / M
. For the less conservative procedure of Benjamini & Hochberg (1995), it is M1 * q / M
, where q
is the desired false discovery rate (FDR), and M1
is the (unknown) number of true non-null hypotheses. Although M1
is unknown, the investigator can sometimes estimate it conservatively (e.g., if at most 0.02 of the hypotheses are expected to be non-null, set M1 = 0.02 * M
).

Examples

## 1. The test of independence
## 1.1. A non-null univariate example
library(HHG)
## Generate some data from the Circle example
n = 50
X = hhg.example.datagen(n, 'Circle')
plot(X[1,], X[2,])
## Compute distance matrices, on which the HHG test will be based
Dx = as.matrix(dist((X[1,]), diag = TRUE, upper = TRUE))
Dy = as.matrix(dist((X[2,]), diag = TRUE, upper = TRUE))
## Compute HHG statistics, and p-values using 1000 random permutations
hhg = hhg.test(Dx, Dy, nr.perm = 1000)
## Print the sum-chisq statistic and its permutation p-value
hhg$sum.chisq
hhg$perm.pval.hhg.sc
## 1.2. A null univariate example
n = 50
X = hhg.example.datagen(n, '4indclouds')
Dx = as.matrix(dist((X[1,]), diag = TRUE, upper = TRUE))
Dy = as.matrix(dist((X[2,]), diag = TRUE, upper = TRUE))
hhg = hhg.test(Dx, Dy, nr.perm = 1000)
hhg$sum.chisq
hhg$perm.pval.hhg.sc
hhg$sum.lr
hhg$perm.pval.hhg.sl
## 1.3. A multivariate example
library(MASS)
n = 50
p = 5
x = t(mvrnorm(n, rep(0, p), diag(1, p)))
y = log(x ^ 2)
Dx = as.matrix(dist((t(x)), diag = TRUE, upper = TRUE))
Dy = as.matrix(dist((t(y)), diag = TRUE, upper = TRUE))
hhg = hhg.test(Dx, Dy, nr.perm = 1000)
hhg$sum.chisq
hhg$perm.pval.hhg.sc
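## 1.4. Using Wald's sequential permutation test (a sketch, not part of the
## original examples; argument names follow the Usage section above)
## Reusing Dx and Dy from example 1.3. Here only a single hypothesis is
## tested, so seq.total.nr.tests = 1.
hhg.seq = hhg.test(Dx, Dy, nr.perm = 1000, is.sequential = TRUE,
                   seq.total.nr.tests = 1)
hhg.seq$sum.chisq
hhg.seq$perm.pval.hhg.sc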