sscore: Distance-based Kernel Score Test

Description

This function tests whether a metabolite-set is differentially expressed using a stratified kernel-based score test.

Usage

sscore(x, y, lower, upper, m)

Arguments

x

numeric matrix of metabolite abundance levels, with one column per subject.

y

0/1 response indicating whether a subject belongs to the case group or the control group.

lower

lower bound of the kernel parameter.

upper

upper bound of the kernel parameter.

m

number of grid points selected in the interval [lower, upper].

Value

A p-value for testing whether the metabolite-set is differentially expressed.

Details

Let x be a $p \times n$ matrix in which each column is a subject, and let y be an $n \times 1$ 0/1 vector of group labels. This function tests whether this $p$-metabolite set is differentially expressed between the two groups (more details can be found in Zhan et al. (2015)). It works in the following way.

When the kernel parameter $\rho$ is known, a score test can be applied. First, fit the null logistic model $\mathrm{logit}(\mathrm{pr}(y=1))=\beta_0$ to obtain the estimate $\hat{\beta}_0$ of $\beta_0$, and let $\hat{\mu}_0=\mathrm{invlogit}(\hat{\beta}_0)$. Second, calculate the $n \times n$ kernel matrix $K(\rho)$ with entries $K(\rho)_{ij} = k(x_i,x_j,\rho)$, where $x_i$ is the $i$th column of x and $k(\cdot)$ is the stratified kernel function skernel. Third, calculate the test statistic $$Q(\rho)=(y-\hat{\mu}_0)^T K(\rho) (y-\hat{\mu}_0).$$ A standardized version $S(\rho)$ of $Q(\rho)$ is $S(\rho)= [Q(\rho)-\mu_{Q}]/\sigma_{Q}$, where $\mu_{Q}$ and $\sigma_{Q}$ denote the mean and standard deviation of $Q(\rho)$ under the null hypothesis. More details can be found in Liu et al. (2008).

When the kernel parameter $\rho$ is not known, suppose it takes values in [lower, upper]. Davies (1977) and Davies (1987) proposed a test based on the process $\{S(\rho), \rho \in [lower,upper]\}$. This test has a rejection region of the form $\{\sup_{lower \leq \rho \leq upper} S(\rho)> c\}$. Using this test, an upper bound for the p-value is given by $$\Phi(-M)+V \exp(-\frac{1}{2}M^2)/\sqrt{8\pi},$$ where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $M$ is the maximum of $S(\rho)$ over the range of $\rho$, $V=|S(\rho_1)-S(lower)|+|S(\rho_2)-S(\rho_1)|+\cdots+|S(upper)-S(\rho_m)|$ is the total variation of $S(\rho)$ over the interval [lower, upper], and $\rho_1,\ldots,\rho_m$ are the $m$ grid points in the interval [lower, upper].
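The two computational steps described above, the score statistic for a fixed kernel matrix and the Davies upper bound over a grid, can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: score_stat and davies_bound are hypothetical helper names, the kernel matrix K is assumed to be supplied by the caller, the standardization of $Q(\rho)$ into $S(\rho)$ is left out, and capping the bound at 1 is a choice made here rather than something the documentation states.

```r
# Hypothetical helper (not part of the package): score statistic Q for a
# given n x n kernel matrix K and a 0/1 response vector y.
score_stat <- function(y, K) {
  # Null logistic model with an intercept only.
  fit <- glm(y ~ 1, family = binomial())
  mu0 <- fitted(fit)            # fitted probability, identical for every subject
  r <- y - mu0                  # residuals under the null model
  as.numeric(t(r) %*% K %*% r)  # Q = (y - mu0)^T K (y - mu0)
}

# Hypothetical helper: Davies upper bound for the p-value, given the
# standardized statistics S evaluated at lower, rho_1, ..., rho_m, upper
# (in increasing order of rho).
davies_bound <- function(S) {
  M <- max(S)                   # maximum of S over the grid
  V <- sum(abs(diff(S)))        # total variation of S over the grid
  p <- pnorm(-M) + V * exp(-M^2 / 2) / sqrt(8 * pi)
  min(p, 1)                     # assumption: cap the bound at 1
}

# Toy inputs, for illustration only.
Q_toy <- score_stat(c(0, 0, 1, 1), diag(4))
p_toy <- davies_bound(c(0.5, 1.2, 2.0, 1.1))
```

Because the bound adds a total-variation term to the usual normal tail probability, a finer grid (larger m) can only increase V, so the resulting p-value bound is conservative.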

References

Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 64, 247-254.

Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74, 33-43.

Liu, D., Ghosh, D., & Lin, X. (2008). Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics, 9(1), 292.

Zhan, X., Patterson, A. D., & Ghosh, D. (2015). Kernel approaches for differential expression analysis of mass spectrometry-based metabolomics data. BMC Bioinformatics, 16(1), 77.

Examples

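data(hcc)
## This metabolite-set contains the first three metabolites in the hcc dataset.
x <- hcc[1:3, 3:57]
## Group labels: 35 subjects in group 0 followed by 20 subjects in group 1.
y <- c(rep(0, 35), rep(1, 20))
## Search the kernel parameter over [10^-3, 10^3] using 10 grid points.
sscore(x, y, 10^-3, 10^3, 10)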
