Learn R Programming

GraphRankTest (version 0.1)

RISE: Rank-based Two-Sample Tests on Similarity / Graph-Induced Rank Matrices

Description

RISE constructs a nonnegative, symmetric rank/graph matrix \(R\) from two samples \(X\) and \(Y\) (or from a pre-computed similarity matrix \(S\)), then computes a Hotelling-type quadratic statistic with an asymptotic chi-square p-value. Optionally, a permutation p-value is returned.

Usage

RISE(
  X = NULL,
  Y = NULL,
  S = NULL,
  sample1ID = NULL,
  sample2ID = NULL,
  k = 10,
  rank.type = "RgNN",
  perm = 0
)

Value

A list with components:

  • test.statistic: quadratic form \(T_R\).

  • pval.approx: asymptotic p-value (chi-square, df = 2).

  • pval.perm: permutation p-value (present only if perm > 0).

Arguments

X

Numeric matrix of size \(m \times p\) (first sample). Optional if S is supplied.

Y

Numeric matrix of size \(n \times p\) (second sample). Optional if S is supplied.

S

Numeric similarity matrix of size \(N \times N\) with \(N=m+n\) (larger values indicate greater similarity). If X and Y are provided, S is constructed internally as -dist(rbind(X, Y)).

sample1ID

Integer indices (length \(m\)) for sample \(X\) in S. Ignored if X and Y are given.

sample2ID

Integer indices (length \(n\)) for sample \(Y\) in S. Ignored if X and Y are given.

k

Positive integer tuning parameter. For "RgNN"/"RoNN", it is the neighborhood size in the k-nearest-neighbor graph (k-NNG); for "RgMST"/"RoMST", it controls the number of minimum-spanning-tree layers (k-MST); for "RoMDP", it specifies the number of rounds of minimum-distance non-bipartite matching (k-MDP).

rank.type

Character, one of c("RgNN","RoNN","RgMST","RoMST","RoMDP"). Prefix "Rg" denotes graph-induced ranks; prefix "Ro" denotes overall ranks obtained by ordering all selected edges. See the references for precise definitions.

perm

Integer, number of permutations for a permutation p-value (default 0).

Details

From \(S\) (or from \(X\), \(Y\)), the procedure constructs a symmetric matrix \(R\) with zero diagonal using one of the supported graph/ranking schemes. It then forms the within-group edge sums \(U_x = \sum_{i,j \in X} R_{ij}\) and \(U_y = \sum_{i,j \in Y} R_{ij}\). The expectation vector and covariance matrix of \((U_x, U_y)\) are derived under the permutation null distribution. The test statistic is $$T = ( U_x - \mu_x, U_y - \mu_y ) \Sigma^{-1} \begin{pmatrix} U_x - \mu_x \\ U_y - \mu_y \end{pmatrix},$$ where \(\mu_x, \mu_y\) are the expected values and \(\Sigma\) is the covariance matrix. Under the null hypothesis, \(T\) is asymptotically chi-square distributed with 2 degrees of freedom.

References

Zhou, D. and Chen, H. (2023). A new ranking scheme for modern data and its application to two-sample hypothesis testing. In Proceedings of the 36th Annual Conference on Learning Theory (COLT 2023), PMLR, pp. 3615–3668.

See Also

rTests.base, Cov.asy

Examples

Run this code
set.seed(1)
X <- matrix(rnorm(50*100, mean = 0), nrow=50)
Y <- matrix(rnorm(50*100, mean = 0.3), nrow=50)
# RgNN: graph-induced ranks from the k-nearest-neighbor graph
out.RgNN <- RISE(X = X, Y = Y, k = 10, rank.type = "RgNN", perm = 1000)
out.RgNN

# RoNN: overall ranks obtained by ordering edges from the k-NN graph
out.RoNN <- RISE(X = X, Y = Y, k = 10, rank.type = "RoNN", perm = 1000)
out.RoNN

# RgMST: graph-induced ranks from layered minimum spanning trees
# \donttest{
out.RgMST <- RISE(X = X, Y = Y, k = 10, rank.type = "RgMST", perm = 1000)
out.RgMST
# }

# RoMST: overall ranks obtained by ordering edges in the MST
# \donttest{
out.RoMST <- RISE(X = X, Y = Y, k = 10, rank.type = "RoMST", perm = 1000)
out.RoMST
# }

# RoMDP: overall ranks obtained by ordering edges from minimum-distance pairings
# \donttest{
out.RoMDP <- RISE(X = X, Y = Y, k = 10, rank.type = "RoMDP", perm = 1000)
out.RoMDP
# }

Run the code above in your browser using DataLab