Performs a multiscale test of independence between multivariate vectors. See the package vignettes for further examples.
MultiFIT(xy, x = NULL, y = NULL, p_star = NULL, R_max = NULL,
R_star = 1, rank.transform = TRUE, ranking.approximation = FALSE,
M = 10, apply.stopping.rule = FALSE, alpha = 0.05,
test.method = "Fisher", correct = TRUE, min.tbl.tot = 25L,
min.row.tot = 10L, min.col.tot = 10L, p.adjust.methods = c("H",
"Hcorrected"), compute.all.holm = TRUE, return.all.pvs = TRUE,
verbose = FALSE)
xy: A list whose first element corresponds to the matrix x as below, and whose second element corresponds to the matrix y as below. If xy is not specified, x and y need to be assigned.
x: A matrix, number of columns = dimension of random vector, number of rows = number of observations.
y: A matrix, number of columns = dimension of random vector, number of rows = number of observations.
p_star: Numeric, cuboids associated with tests whose p-value is below p_star will be halved and further tested.
R_max: A positive integer (or Inf), the maximal number of resolutions to scan (the algorithm will stop at a lower resolution if none of the tables in it meet the criteria specified by min.tbl.tot, min.row.tot and min.col.tot).
R_star: A positive integer. If set to an integer between 0 and R_max, all tests up to and including resolution R_star will be performed (the algorithm will stop at a lower resolution than requested if none of the tables in it meet the criteria specified by min.tbl.tot, min.row.tot and min.col.tot). For higher resolutions, only the children of tests with a p-value lower than p_star will be considered.
rank.transform: Logical, if TRUE, a marginal rank transform is performed on all margins of x and y. If FALSE, all margins are scaled to the 0-1 range. When FALSE, the average and top statistics of the negative logarithm of the p-values are only computed for the univariate case.
ranking.approximation: Logical, if FALSE, select only tests with p-values more extreme than p_star to halve and further test. FWER control not guaranteed. If TRUE, choose at each resolution the M tests with the most extreme p-values to further halve and test.
M: A positive integer (or Inf), the number of top ranking tests to continue to split at each resolution. FWER control is not guaranteed for this method.
apply.stopping.rule: Logical. If TRUE, an adjusted p-value is computed for each resolution, and scanning stops early once one falls below alpha.
alpha: Numeric. Threshold below which resolution-specific p-values trigger early stopping.
test.method: String, choose "Fisher" for Fisher's exact test (slowest), "chi.sq" for the Chi-squared test, "LR" for the likelihood-ratio test and "norm.approx" for approximating the hypergeometric distribution with a normal distribution (fastest).
correct: Logical, if TRUE, compute mid-p corrected p-values for Fisher's exact test, Yates corrected p-values for the Chi-squared test, or Williams corrected p-values for the likelihood-ratio test.
min.tbl.tot: Non-negative integer, the minimal number of observations per table below which a p-value for a given table will not be computed.
min.row.tot: Non-negative integer, the minimal number of observations for row totals in the 2x2 contingency tables below which a contingency table will not be tested.
min.col.tot: Non-negative integer, the minimal number of observations for column totals in the 2x2 contingency tables below which a contingency table will not be tested.
p.adjust.methods: String, choose between "H" for Holm and "Hcorrected" for Holm with the correction as specified in correct.
compute.all.holm: Logical, if FALSE, only the global p-value is computed (may be a little faster when many tests are performed). If TRUE, adjusted p-values are computed for all tests.
return.all.pvs: Logical, if TRUE, a data frame with all p-values is returned (not applicable when the stopping rule is applied).
verbose: Logical.
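The two margin treatments described for rank.transform can be illustrated with a minimal base R sketch. The helper names rank_margins and scale_margins are hypothetical, the data are made up, and the rescaling of ranks by the number of observations is an assumption; MultiFIT's internal implementation may differ.

```r
set.seed(1)
# Hypothetical 5 x 2 data matrix (rows = observations, columns = margins)
x <- cbind(rnorm(5), runif(5))

# Marginal rank transform: replace each column by its ranks,
# rescaled here to (0, 1] (the rescaling is an assumption)
rank_margins <- function(m) {
  apply(m, 2, function(col) rank(col) / length(col))
}

# Alternative: min-max scaling of each column to [0, 1]
scale_margins <- function(m) {
  apply(m, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

ranked <- rank_margins(x)
scaled <- scale_margins(x)
print(ranked)
print(scaled)
```

The rank transform depends only on the ordering of the observations within each margin, whereas min-max scaling preserves the relative spacing of the values.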
p.values.holistic, a named numerical vector containing the holistic p-values for the global null hypothesis (i.e. x independent of y).
p.values.resolution.specific, a named numerical vector containing the resolution-specific p-values for the global null hypothesis (i.e. x independent of y).
res.by.res.pvs, a data frame that contains the raw and Bonferroni-adjusted resolution-specific p-values.
all.pvs, a data frame that contains all p-values and adjusted p-values that are computed. Returned if return.all.pvs is TRUE.
all, a nested list. Each entry is named and contains data about a resolution that was tested. Each resolution is itself a list with the following elements:
cuboids, a summary of all tested cuboids in a resolution.
tables, a summary of all 2x2 contingency tables in a resolution.
pv, a numerical vector containing the p-values from the tests of independence on the 2x2 contingency tables in tables that meet the criteria defined by min.tbl.tot, min.row.tot and min.col.tot. The length of pv is equal to the number of rows of tables.
pv.correct, similar to pv above; corrected p-values are computed and returned when correct is TRUE.
rank.tests, a logical vector that indicates whether or not a test was ranked among the top M tests in a resolution. The length of rank.tests is equal to the number of rows of tables.
parent.cuboids, an integer vector indicating which cuboids in a resolution are associated with the ranked tests and will be further halved in the next higher resolution.
parent.tests, a logical vector of the same length as the number of rows of tables, indicating whether or not a test was chosen as a parent test (the same test may have multiple children).
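To make the pv element more concrete, here is a minimal base R sketch of testing a single 2x2 table subject to the documented size thresholds. The table counts and the helper table_is_testable are hypothetical, and the calls use the stats package rather than MultiFIT's own routines, so this only illustrates the idea (it does not reproduce the mid-p or Williams corrections).

```r
# Hypothetical 2x2 table of counts; not produced by MultiFIT itself
tbl <- matrix(c(30, 10, 12, 28), nrow = 2, byrow = TRUE)

# Hypothetical helper mirroring the documented thresholds: a table is
# tested only if its total, row totals and column totals are large enough
table_is_testable <- function(tbl, min.tbl.tot = 25L,
                              min.row.tot = 10L, min.col.tot = 10L) {
  sum(tbl) >= min.tbl.tot &&
    all(rowSums(tbl) >= min.row.tot) &&
    all(colSums(tbl) >= min.col.tot)
}

if (table_is_testable(tbl)) {
  # Fisher's exact test (no mid-p correction in base R)
  p_fisher <- fisher.test(tbl)$p.value
  # Chi-squared test with and without the Yates continuity correction
  p_chisq_yates <- chisq.test(tbl, correct = TRUE)$p.value
  p_chisq_raw <- chisq.test(tbl, correct = FALSE)$p.value
  print(c(fisher = p_fisher, yates = p_chisq_yates, raw = p_chisq_raw))
}
```

Tables that fail the size check are simply skipped in this sketch, which matches the description of p-values not being computed below the thresholds.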
# NOT RUN {
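library(MultiFIT)

set.seed(1)
n <- 300
Dx <- Dy <- 2
# Allocate the n x Dx and n x Dy data matrices
x <- matrix(0, nrow = n, ncol = Dx)
y <- matrix(0, nrow = n, ncol = Dy)
x[, 1] <- rnorm(n)
x[, 2] <- runif(n)
y[, 1] <- rnorm(n)
# y[, 2] depends on x[, 2] through a sine wave plus Gaussian noise
y[, 2] <- sin(5 * pi * x[, 2]) + 1 / 5 * rnorm(n)
# Run the test of independence, then pass the fit to MultiSummary
fit <- MultiFIT(x = x, y = y, verbose = TRUE)
w <- MultiSummary(x = x, y = y, fit = fit, alpha = 0.0001)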
# }
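The "H" option of p.adjust.methods refers to Holm's method. As a rough illustration of that adjustment on its own, the sketch below applies base R's p.adjust to a hypothetical vector of raw p-values. The values are made up, and taking the smallest adjusted p-value as a global p-value is an assumption about one common convention, not a statement of how MultiFIT computes its holistic p-values.

```r
# Hypothetical raw p-values; not output from MultiFIT
raw_pvs <- c(0.001, 0.02, 0.03, 0.5)

# Holm step-down adjustment from the stats package
holm_pvs <- p.adjust(raw_pvs, method = "holm")

# Smallest adjusted p-value, one common way to form a global p-value
# (an assumption here, not necessarily MultiFIT's definition)
global_pv <- min(holm_pvs)

print(holm_pvs)
print(global_pv)
```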