Spectral framework to detect network QTLs affecting co-expression networks. This is the main function for the snQTL test.
Given a list of expression data matrices from samples with different genotypes, we test whether there are significant differences among the three co-expression networks. Statistically, we consider the hypothesis test:
$$H_0: N_A = N_B = N_H,$$
where \(A,B,H\) refer to the different genotypes and \(N\) denotes the adjacency matrix of the corresponding co-expression network.
We provide four options for the test statistic, each built from sparse matrix/tensor eigenvalues. We perform a permutation test to obtain the empirical p-value for the hypothesis test.
NOTE: This function also applies to the generalized case of comparing multiple (K > 3) biological networks. Instead of separating the samples by genotype, users can split the samples into K groups based on other metrics of interest, e.g., locations or treatments. The generalized hypothesis testing problem becomes $$H_0: N_1 = \cdots = N_K,$$ where \(N_k\) refers to the correlation-based network corresponding to group k. For consistency, we stick with the original genotype-based setting in this help document. See the GitHub manual for details and examples of the generalization: https://github.com/Marchhu36/snQTL.
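The permutation step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `toy_stat` is a hypothetical stand-in statistic (not one of the four snQTL statistics), and the p-value convention `mean(perm >= obs)` is one common choice, not necessarily the one used inside snQTL_test_corrnet(). The sketch works for any number of groups K.

```r
set.seed(3)
p <- 4
# Toy data: K = 3 groups, samples in rows, genes in columns
groups <- list(matrix(rnorm(15 * p), nrow = 15),
               matrix(rnorm(20 * p), nrow = 20),
               matrix(rnorm(25 * p), nrow = 25))

# Hypothetical stand-in statistic: largest absolute entry over all pairwise
# differences of the group correlation matrices (NOT an snQTL statistic)
toy_stat <- function(mats) {
  nets <- lapply(mats, cor)
  pairs <- combn(length(nets), 2)
  max(apply(pairs, 2, function(ij) max(abs(nets[[ij[1]]] - nets[[ij[2]]]))))
}

# Pool all samples and record which group each row came from
pooled <- do.call(rbind, groups)
labels <- rep(seq_along(groups), times = vapply(groups, nrow, integer(1)))

obs <- toy_stat(groups)            # statistic on the non-permuted data
npermute <- 50
perm <- vapply(seq_len(npermute), function(b) {
  set.seed(b)                      # one seed per permutation
  shuffled <- sample(labels)       # reshuffle group labels, keeping group sizes
  toy_stat(lapply(seq_along(groups),
                  function(k) pooled[shuffled == k, , drop = FALSE]))
}, numeric(1))

# Empirical p-value: share of permuted statistics at least as large as observed
emp_p <- mean(perm >= obs)
```

Using one seed per permutation mirrors the role of the seeds argument, whose length equals npermute.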
snQTL_test_corrnet(
  exp_list,
  method = c("sum", "sum_square", "max", "tensor"),
  npermute = 100,
  seeds = 1:100,
  stats_seed = NULL,
  rho = 1000,
  sumabs = 0.2,
  niter = 20,
  trace = FALSE,
  adj.beta = -1,
  tensor_iter = 20,
  tensor_tol = 10^(-3),
  trans = FALSE,
  location = NULL
)
a list containing the following:
character, recall of the choice of test statistics
list, test result for non-permuted data, including the recall of method choices, test statistics, and decomposition components
list, test results for each permuted dataset, including the recall of method choices, test statistics, and decomposition components
number, the empirical p-value from the permutation test (accessed as emp_p_value in the example)
exp_list: list, expression data matrices from samples with different genotypes; the matrices have dimensions n1-by-p, n2-by-p, and n3-by-p, respectively; see "details"
method: character, the choice of test statistic; see "details"
npermute: number, the number of permutations used to obtain the empirical p-value
seeds: vector, the random seeds for the permutations; its length equals npermute
stats_seed: number, the random seed for computing the test statistic on the non-permuted data
rho: number, a large positive constant added to the diagonal elements to ensure positive definiteness in the symmetric matrix spectral decomposition
sumabs: number, the sparsity level of the matrix/tensor eigenvector; sumabs takes a value between \(1/\sqrt{p}\) and 1, where \(p\) is the dimension; sumabs \(\times \sqrt{p}\) is the upper bound on the L1 norm of the leading matrix/tensor eigenvector (see symmPMD())
niter: integer, the number of iterations to use in the PMD algorithm (see symmPMD())
trace: logical, whether to trace the progress of the PMD algorithm (see symmPMD())
adj.beta: number, the power transformation applied to the correlation matrices (see getDiffMatrix()); in particular, when adj.beta=0 the correlation matrix is used, and when adj.beta<0 the covariance matrix is used
tensor_iter: integer, the maximum number of iterations in the SSTD algorithm (see max_iter in SSTD())
tensor_tol: number, a small positive tolerance on the error difference used to declare SSTD convergence (see tol in SSTD())
trans: logical, whether to consider only the trans-correlation (between genes from two different chromosomes or regions); see "details"
location: vector, the (chromosome) locations of the genes; required if trans = TRUE
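As a small worked illustration of two numeric arguments, the sketch below evaluates the admissible range for sumabs and the implied L1 bound for p = 200, and shows how adding rho to the diagonal shifts every eigenvalue of a symmetric matrix by rho. The 2-by-2 matrix `D` is a made-up toy input, and base R's eigen() is used here only for illustration; it is not the sparse PMD routine used by the package.

```r
p <- 200
sumabs <- 0.2

lower_limit <- 1 / sqrt(p)        # smallest admissible sumabs, about 0.0707
l1_bound    <- sumabs * sqrt(p)   # upper bound on the L1 norm, about 2.83

# Toy symmetric matrix with eigenvalues 0.5 and -0.5 (so it is indefinite)
D <- matrix(c(0, -0.5, -0.5, 0), nrow = 2)
rho <- 1000

# Adding rho to the diagonal shifts both eigenvalues by rho,
# which makes the shifted matrix positive definite
ev_original <- eigen(D, symmetric = TRUE, only.values = TRUE)$values
ev_shifted  <- eigen(D + rho * diag(2), symmetric = TRUE, only.values = TRUE)$values
```

Because the shift is the same for every eigenvalue, the ordering of the eigenvalues and the eigenvectors themselves are unchanged.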
In exp_list, the data matrices are usually ordered by the marker's genotypes AA, BB, and AB.
The expression data are usually normalized. We use the expression data to build the Pearson correlation co-expression networks.
Given the list of co-expression networks, we generate the pairwise differential networks $$D_{AB} = N_A - N_B, \quad D_{AH} = N_H - N_A, \quad D_{BH} = N_H - N_B.$$ We use the pairwise differential networks to compute the snQTL test statistics.
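The construction of the differential networks can be sketched in base R as below. The toy matrices and the object names (`exp_A`, `N_A`, `D_AB`, and so on) are illustrative assumptions; inside the package this step is handled by getDiffMatrix(), whose exact behavior (for example the adj.beta transformation) is not reproduced here.

```r
set.seed(1)
p <- 5
# Toy expression matrices (samples in rows, genes in columns), one per genotype
exp_A <- matrix(rnorm(30 * p), nrow = 30)
exp_B <- matrix(rnorm(40 * p), nrow = 40)
exp_H <- matrix(rnorm(50 * p), nrow = 50)

# Pearson correlation networks N_A, N_B, N_H (each p-by-p)
N_A <- cor(exp_A)
N_B <- cor(exp_B)
N_H <- cor(exp_H)

# Pairwise differential networks, with the signs used in the formula above
D_AB <- N_A - N_B
D_AH <- N_H - N_A
D_BH <- N_H - N_B

# Sanity checks: symmetric, (numerically) zero diagonal, and D_AB = D_BH - D_AH
sym_ok  <- isSymmetric(D_AB)
diag_ok <- max(abs(diag(D_AB))) < 1e-12
rel_ok  <- isTRUE(all.equal(D_AB, D_BH - D_AH))
```

The last check reflects that only two of the three differential networks are linearly independent under this sign convention.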
We provide four test statistics, selected by method:
sum, the sum of the sparse leading matrix eigenvalues (sLMEs) of all pairwise differential networks:
$$\mathrm{Stat}_{\mathrm{sum}} = \lambda(D_{AB}) + \lambda(D_{AH}) + \lambda(D_{BH}),$$
where \(\lambda\) denotes the sLME operation with the sparsity level set by sumabs.
sum_square, the sum of the squared sLMEs:
$$\mathrm{Stat}_{\mathrm{sumsquare}} = \lambda^2(D_{AB}) + \lambda^2(D_{AH}) + \lambda^2(D_{BH}).$$
max, the maximum of the sLMEs:
$$\mathrm{Stat}_{\mathrm{max}} = \max(\lambda(D_{AB}), \lambda(D_{AH}), \lambda(D_{BH})).$$
tensor, the sparse leading tensor eigenvalue (sLTE) of the differential tensor:
$$\mathrm{Stat}_{\mathrm{tensor}} = \Lambda(\mathcal{D}),$$
where \(\Lambda\) denotes the sLTE operation with the sparsity level set by sumabs,
and \(\mathcal{D}\) is the differential tensor formed by stacking the three pairwise differential networks.
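To make the three matrix-based statistics concrete, the sketch below combines leading eigenvalues of three toy differential networks. It is a simplified illustration: `lead_eig` is a hypothetical helper that returns the ordinary (non-sparse) largest eigenvalue via base R's eigen(), used as a stand-in for the sLME computed by symmPMD(), and the tensor statistic is not covered.

```r
# Stand-in for the sLME: largest eigenvalue of a symmetric matrix, no sparsity
lead_eig <- function(D) {
  max(eigen(D, symmetric = TRUE, only.values = TRUE)$values)
}

# Toy 2-by-2 differential networks; matrix(c(0, a, a, 0)) has eigenvalues +/- a
offdiag <- function(a) matrix(c(0, a, a, 0), nrow = 2)
D_AB <- offdiag(0.3)
D_AH <- offdiag(-0.5)
D_BH <- D_AB + D_AH          # consistent with D_AB = D_BH - D_AH

lambdas <- c(AB = lead_eig(D_AB), AH = lead_eig(D_AH), BH = lead_eig(D_BH))

stat_sum        <- sum(lambdas)     # 0.3 + 0.5 + 0.2 = 1
stat_sum_square <- sum(lambdas^2)   # 0.09 + 0.25 + 0.04 = 0.38
stat_max        <- max(lambdas)     # 0.5
```

In this toy case the max statistic is driven entirely by D_AH, while sum and sum_square also pick up the weaker signals in the other two pairs.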
Additionally, if trans = TRUE, we only consider the trans-correlation between genes from two different chromosomes or regions in the co-expression networks.
The entries of the correlation matrices are set to \(N_{ij} = 0\) if gene i and gene j are from the same chromosome or region.
The gene location information is required if trans = TRUE.
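The trans-only masking can be sketched in base R as follows. The variable names (`same_region`, `N_trans`) and the toy data are assumptions for illustration; in particular, this sketch also zeroes the diagonal, since each gene shares a region with itself, and the package may treat the diagonal differently.

```r
# Toy chromosome/region labels for 5 genes
location <- c(1, 1, 2, 2, 3)

set.seed(2)
N <- cor(matrix(rnorm(20 * 5), nrow = 20))   # toy 5-by-5 correlation network

# TRUE where gene i and gene j share a chromosome/region (includes the diagonal)
same_region <- outer(location, location, "==")

# Keep only trans-correlations: zero out entries within the same region
N_trans <- N
N_trans[same_region] <- 0
```

With the location vector from the example (four regions of sizes 20, 50, 100, and 30), the same two lines would zero four diagonal blocks of the 200-by-200 network.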
Hu, J., Weber, J. N., Fuess, L. E., Steinel, N. C., Bolnick, D. I., & Wang, M. (2025). A spectral framework to map QTLs affecting joint differential networks of gene co-expression. PLOS Computational Biology, 21(4), e1012953.
### artificial example
library(snQTL)  # assumes the snQTL package is installed
n1 = 50
n2 = 60
n3 = 100
p = 200
location = c(rep(1,20), rep(2, 50), rep(3, 100), rep(4, 30))
## expression data from null
set.seed(0416) # random seeds for example data
exp1 = matrix(rnorm(n1*p, mean = 0, sd = 1), nrow = n1)
exp2 = matrix(rnorm(n2*p, mean = 0, sd = 1), nrow = n2)
exp3 = matrix(rnorm(n3*p, mean = 0, sd = 1), nrow = n3)
exp_list = list(exp1, exp2, exp3)
result = snQTL_test_corrnet(exp_list = exp_list, method = 'tensor',
                            npermute = 30, seeds = 1:30, stats_seed = 0416,
                            trans = TRUE, location = location)
result$emp_p_value