MetaDE.rawdata: Identify differentially expressed genes by integrating multiple studies (datasets)

Description

MetaDE.rawdata identifies differentially expressed genes by integrating multiple studies (datasets).

Usage

MetaDE.rawdata(x, ind.method = c("modt", "regt", "pairedt", "F",
                 "pearsonr", "spearmanr", "logrank"), meta.method =
                 c("maxP", "maxP.OC", "minP", "minP.OC", "Fisher",
                 "Fisher.OC", "AW", "AW.OC", "roP", "roP.OC",
                 "Stouffer", "Stouffer.OC", "SR", "PR", "minMCC",
                 "FEM", "REM", "rankProd"), paired = NULL, miss.tol =
                 0.3, rth = NULL, nperm = NULL, ind.tail = "abs",
                 asymptotic = FALSE, ...)

Arguments

x

a list of studies. Each study is a list with components:

x: the gene expression matrix.
y: the outcome variable. For a binary outcome, 0 refers to "normal" and 1 to "diseased". For a multiple class outcome, the first leve

ind.method

a character vector specifying the statistical test used in each study to test for association between the variables and the labels (i.e. whether genes are differentially expressed in that study). See "Details".

ind.tail

a character string specifying the alternative hypothesis, must be one of "abs" (default), "low" or "high".

meta.method

a character string specifying the meta-analysis method used to combine the p-values or effect sizes. See "Details".

paired

a vector of logical values specifying whether the design of each study is paired. If the ith study has a paired design, the corresponding element of paired should be TRUE, otherwise FALSE.

miss.tol

The maximum proportion of missing data allowed in any gene (default 0.3, i.e. 30 percent).

rth

The option for the roP and roP.OC methods: rth selects the rth smallest p-value across studies.
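
To make the role of rth concrete, here is a minimal base-R sketch of an rth ordered p-value statistic for a single gene. The p-values are hypothetical, and the Beta reference distribution is the standard result for order statistics of independent uniform p-values; it is an assumption that this matches what the package computes internally.

```r
# Hypothetical p-values for one gene across K = 4 studies
p   <- c(0.01, 0.20, 0.03, 0.60)
K   <- length(p)
rth <- 2

# roP-style statistic: the rth smallest p-value
rop.stat <- sort(p)[rth]

# Under the null, the rth order statistic of K independent Uniform(0,1)
# p-values follows a Beta(rth, K - rth + 1) distribution
rop.pval <- pbeta(rop.stat, shape1 = rth, shape2 = K - rth + 1)
```

With rth = 1 the statistic reduces to the minimum p-value, and with rth = K to the maximum.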

nperm

The number of permutations. If nperm is NULL, the results will be based on the asymptotic distribution.
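
As a rough illustration of what a permutation-based p-value means, the base-R sketch below permutes the sample labels of one gene and compares the observed statistic with the permuted ones. The data, the choice of statistic (absolute difference in means), and the variable names are all hypothetical; the package's own permutation scheme may differ, for example in how it pools null statistics.

```r
set.seed(1)
# Hypothetical expression values for one gene: 5 controls, then 5 cases
expr  <- c(rnorm(5), rnorm(5, mean = 2))
label <- rep(0:1, each = 5)
nperm <- 200

# Statistic: absolute difference in group means
stat <- function(e, l) abs(mean(e[l == 1]) - mean(e[l == 0]))
obs  <- stat(expr, label)

# Recompute the statistic under random relabelling of the samples
perm <- replicate(nperm, stat(expr, sample(label)))

# Permutation p-value (the +1 terms keep it strictly above zero)
perm.pval <- (sum(perm >= obs) + 1) / (nperm + 1)
```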

asymptotic

A logical value specifying whether parametric methods are used to calculate the p-values in the meta-analysis. The default is FALSE.

...

Additional arguments.

Value

A list with components:

meta.analysis
a list of the results of meta-analysis with components:
- meta.stat: the statistics for the chosen meta-analysis method.
- pval: the p-value for the above statistic. It is calculated from permutation.
- FDR: the p-values corrected by Benjamini-Hochberg.
- AW.weight: the optimal weight assigned to each dataset/study for each gene if the 'AW' or 'AW.OC' method was chosen.

ind.stat
the statistics calculated from individual analysis. This is returned for every meta.method except "REM", "FEM", "minMCC" and "rankProd".

ind.p
the p-value matrix calculated from individual analysis. This is returned for every meta.method except "REM", "FEM", "minMCC" and "rankProd".

ind.ES
the effect size matrix calculated from individual analysis. This is returned only for meta.method "REM" and "FEM".

ind.Var
the corresponding variance matrix calculated from individual analysis. This is returned only for meta.method "REM" and "FEM".

raw.data
the raw data of your input, i.e. x. This part will be used for plotting.
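
The FDR component is described as a Benjamini-Hochberg correction of the p-values. A minimal sketch of that correction with base R's p.adjust is shown here; the p-values are hypothetical and the 0.1 cutoff is only an example threshold, not a package default.

```r
# Hypothetical meta-analysis p-values for five genes
pval <- c(0.001, 0.02, 0.03, 0.40, 0.90)

# Benjamini-Hochberg adjusted p-values
FDR <- p.adjust(pval, method = "BH")

# Indices of genes passing an example FDR cutoff of 0.1
sig <- which(FDR < 0.1)
```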

Details

The available statistical tests for the argument ind.method are:

"regt": Two-sample t-statistics (unequal variances).
"modt": Two-sample t-statistics with the variance modified by adding a fudge parameter. In our algorithm, we choose the penalized t-statistics used in Efron et al. (2001) and Tusher et al. (2001). The fudge parameter s0 is chosen to be the median variability estimator in the genome.
"pairedt": Paired t-statistics for the design of paired samples.
"pearsonr": Pearson's correlation. It is usually chosen for a quantitative outcome.
"spearmanr": Spearman's correlation. It is usually chosen for a quantitative outcome.
"F": The test is based on F-statistics. It is usually chosen when there are 2 or more classes.

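The difference between the "regt" and "modt" options can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the data are simulated, the per-gene variability s is taken to be the pooled standard error of the mean difference, and s0 is its median across genes, following the description above.

```r
set.seed(1)
# Hypothetical study: 20 genes x 10 samples, 5 controls then 5 cases
expr  <- cbind(matrix(rnorm(5 * 20), 20, 5), matrix(rnorm(5 * 20, 2), 20, 5))
label <- rep(0:1, each = 5)

# "regt"-style: Welch two-sample t-statistic per gene (unequal variances)
regt <- apply(expr, 1, function(g)
  unname(t.test(g[label == 1], g[label == 0], var.equal = FALSE)$statistic))

# "modt"-style: penalized t-statistic, difference / (s + s0)
n1 <- sum(label == 1)
n0 <- sum(label == 0)
diff <- rowMeans(expr[, label == 1]) - rowMeans(expr[, label == 0])
# Pooled standard error of the difference, per gene (assumed form of s)
ss <- apply(expr[, label == 1], 1, var) * (n1 - 1) +
      apply(expr[, label == 0], 1, var) * (n0 - 1)
s  <- sqrt(ss / (n1 + n0 - 2) * (1 / n1 + 1 / n0))
s0 <- median(s)   # fudge parameter: median of the per-gene variability
modt <- diff / (s + s0)
```

Adding s0 to the denominator shrinks the statistic for genes with very small variance, which would otherwise produce large t-values from small differences.
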
The options for the argument meta.method are listed below:

"maxP": the maximum of p-values method.
"maxP.OC": the maximum of p-values with one-sided correction.
"minP": the minimum of p-values from "test" across studies.
"minP.OC": the minimum of p-values with one-sided correction.
"Fisher": Fisher's method (Fisher, 1932), the summation of -log(p-value) across studies.
"Fisher.OC": Fisher's method with one-sided correction (Fisher, 1932), the summation of -log(p-value) across studies.
"AW": Adaptively-weighted method (Li and Tseng, 2011).
"AW.OC": Adaptively-weighted method with one-sided correction (Li and Tseng, 2011).
"SR": the naive sum of the ranks method.
"PR": the naive product of the ranks method.
"minMCC": the minMCC method.
"FEM": the fixed-effect model method.
"REM": the random-effect model method.
"roP": the rth p-value method.
"roP.OC": the rth p-value method with one-sided correction.
"rankProd": the rank product method.

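Three of the p-value combination rules listed above have simple closed forms, sketched here in base R on a hypothetical p-value matrix. The reference distributions used (chi-square for Fisher with the conventional factor of -2, Beta for maxP and minP) are the textbook results for independent uniform p-values; whether the package uses exactly these scalings internally is an assumption.

```r
# Hypothetical p-value matrix: 3 genes (rows) x 4 studies (columns)
p <- rbind(g1 = c(0.01, 0.02, 0.03, 0.04),
           g2 = c(0.001, 0.50, 0.70, 0.90),
           g3 = c(0.30, 0.40, 0.60, 0.80))
K <- ncol(p)

# Fisher: -2 * sum(log p) follows chi-square with 2K degrees of freedom
fisher.stat <- -2 * rowSums(log(p))
fisher.pval <- pchisq(fisher.stat, df = 2 * K, lower.tail = FALSE)

# maxP: the largest p-value follows Beta(K, 1) under the null
maxp.stat <- apply(p, 1, max)
maxp.pval <- pbeta(maxp.stat, K, 1)

# minP: the smallest p-value follows Beta(1, K) under the null
minp.stat <- apply(p, 1, min)
minp.pval <- pbeta(minp.stat, 1, K)
```

In this toy input, g1 is small in every study and scores well under maxP, while g2 is small in only one study and scores well under minP but not under maxP.
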
For the argument miss.tol, the default is 0.3 (30 percent). In individual analysis, for genes with less than miss.tol*100 percent missing values, the missing values are imputed using the KNN method in the package impute; genes with miss.tol*100 percent or more missing values are ignored in further analysis. In meta-analysis, for genes with less than miss.tol*100 percent missing, the p-values are calculated if asymptotic is TRUE.
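
The filtering part of the miss.tol rule can be sketched in base R as below. The matrix is hypothetical, the variable names are illustrative, and the KNN imputation step for the retained genes is deliberately left out.

```r
set.seed(1)
miss.tol <- 0.3

# Hypothetical 6-gene x 10-sample expression matrix
expr <- matrix(rnorm(60), 6, 10)
expr[1, 1:2] <- NA   # 20 percent missing: below miss.tol, kept
expr[2, 1:4] <- NA   # 40 percent missing: dropped
expr[3, 1:5] <- NA   # 50 percent missing: dropped

# Fraction of missing values per gene
miss.frac <- rowMeans(is.na(expr))

# Keep genes whose missing fraction is strictly below miss.tol
keep     <- miss.frac < miss.tol
filtered <- expr[keep, , drop = FALSE]
```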

References

Jia Li and George C. Tseng. (2011) An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Annals of Applied Statistics. 5:994-1019.

Shuya Lu, Jia Li, Chi Song, Kui Shen and George C Tseng. (2010) Biomarker Detection in the Integration of Multiple Multi-class Genomic Studies. Bioinformatics. 26:333-340. (PMID: 19965884; PMCID: PMC2815659)

Examples

#---example 1: Meta analysis of differentially expressed genes between two classes----------#
library(MetaDE)

# here I generate two pseudo datasets (20 genes; 5 normal and 5 diseased samples each)
label1<-rep(0:1,each=5)
label2<-rep(0:1,each=5)
exp1<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(5*20,2),20,5))
exp2<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(5*20,1.5),20,5))

#the input has to be arranged in lists, each study with components x and y
x<-list(list(x=exp1,y=label1),list(x=exp2,y=label2))

#here I used the modt test for individual study and used Fisher's method to combine results
#from multiple studies.
meta.res1<-MetaDE.rawdata(x=x,ind.method=c('modt','modt'),meta.method='Fisher',nperm=20)

#------example 2: genes associated with survival-----------#
# here I generate two pseudo datasets
# study 1 has 10 samples, study 2 has 9 samples
exp1<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(5*20,2),20,5))
time1<-c(4,3,1,1,2,2,3,10,5,4)
event1<-c(1,1,1,0,1,1,0,0,0,1)
# the second block has 4 columns, so it needs 4*20 values
exp2<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(4*20,1.5),20,4))
time2<-c(4,30,1,10,2,12,3,10,50)
event2<-c(0,1,1,0,0,1,0,1,0)

#again,the input has to be arranged in lists
test2 <-list(list(x=exp1,y=time1,censoring.status=event1),list(x=exp2,y=time2,censoring.status=event2))

#here I used the log-rank test for individual study and used Fisher's method to combine results
#from multiple studies.
meta.res2<-MetaDE.rawdata(x=test2,ind.method=c('logrank','logrank'),meta.method='Fisher',nperm=20)

#------example 3: Fixed effect model for two studies, with the design of each study given by paired-----------#
label1<-rep(0:1,each=5)
label2<-rep(0:1,each=5)
exp1<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(5*20,2),20,5))
exp2<-cbind(matrix(rnorm(5*20),20,5),matrix(rnorm(5*20,1.5),20,5))
x<-list(list(x=exp1,y=label1),list(x=exp2,y=label2))
# paired=rep(FALSE,2) declares both studies as unpaired; use TRUE for a study with a paired design
test<-MetaDE.rawdata(x,nperm=1000,meta.method="FEM",paired=rep(FALSE,2))
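
For intuition about the fixed-effect model used in example 3, the base-R sketch below combines per-study effect sizes by inverse-variance weighting. The matrices are hypothetical stand-ins that borrow the names ind.ES and ind.Var from the Value section, and the normal-theory p-value is the textbook form; this is not a reproduction of the package's internals.

```r
# Hypothetical per-study effect sizes and variances: 3 genes x 2 studies
ind.ES  <- cbind(c(1.2, 0.1, -0.8), c(0.9, -0.2, -1.1))
ind.Var <- cbind(c(0.10, 0.12, 0.09), c(0.20, 0.15, 0.11))

# Inverse-variance weights
w <- 1 / ind.Var

# Pooled effect size and its variance, per gene
mu.hat <- rowSums(w * ind.ES) / rowSums(w)
var.mu <- 1 / rowSums(w)

# z-statistic and two-sided normal p-value
z    <- mu.hat / sqrt(var.mu)
pval <- 2 * pnorm(-abs(z))
```

Studies with smaller variance receive more weight, so the pooled estimate always lies between the individual study estimates.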
