metaAnalysis: Meta-analysis of binary and continuous variables

Description

This is a meta-analysis complement to functions standardScreeningBinaryTrait and standardScreeningNumericTrait. Given expression (or other) data from multiple independent data sets, and the corresponding clinical traits or outcomes, the function calculates multiple screening statistics in each data set, then calculates meta-analysis Z scores, p-values, and optionally q-values (False Discovery Rates). Three different ways of calculating the meta-analysis Z scores are provided: the Stouffer method, weighted Stouffer method, and using user-specified weights.

Usage

metaAnalysis(multiExpr, multiTrait, 
             binary = NULL, 
             metaAnalysisWeights = NULL, 
             corFnc = cor, corOptions = list(use = "p"), 
             getQvalues = FALSE, 
             getAreaUnderROC = FALSE,
             useRankPvalue = TRUE,
             rankPvalueOptions = list(),
             setNames = NULL, 
             kruskalTest = FALSE, var.equal = FALSE, 
             metaKruskal = kruskalTest, na.action = "na.exclude")

Arguments

multiExpr

Expression data (or other data) in multi-set format (see checkSets). A vector of lists; in each list there must be a component named data whose content is a matrix or dataframe or array of dimension 2.

multiTrait

Trait or outcome data in multi-set format. Only one trait is allowed; consequently, the data component of each component list can be either a vector or a data frame (matrix, array of dimension 2).

binary

Logical: is the trait binary (TRUE) or continuous (FALSE)? If not given, the decision will be made based on the content of multiTrait.

metaAnalysisWeights

Optional specification of set weights for meta-analysis. If given, must be a vector of non-negative weights, one entry for each set contained in multiExpr.

corFnc

Correlation function to be used for screening. Should be either the default cor or its robust alternative, bicor.

corOptions

A named list giving extra arguments to be passed to the correlation function.

getQvalues

Logical: should q-values (FDRs) be calculated?

getAreaUnderROC

Logical: should the area under the ROC curve be calculated? Caution: enabling this calculation will slow the function down considerably for large data sets.

useRankPvalue

Logical: should the rankPvalue function be used to obtain alternative meta-analysis statistics?

rankPvalueOptions

Additional options for function rankPvalue. These include na.last (default "keep"), ties.method (default "average"), calculateQvalue (default copied from input getQvalues), and pValueMethod (default "all"). See the help file for rankPvalue for full details.

setNames

Optional specification of set names (labels). These are used to label the corresponding components of the output. If not given, they will be taken from the names attribute of multiExpr. If names(multiExpr) is NULL, generic names of the form Set_1, Set_2, ... will be used.

kruskalTest

Logical: should the Kruskal-Wallis test be performed in addition to the t-test? Only applies to binary traits.

var.equal

Logical: should the t-test assume equal variance in both groups? If TRUE, the function will warn the user that the returned test statistics will be different from the results of the standard t.test function.

metaKruskal

Logical: should the meta-analysis be based on the results of Kruskal test (TRUE) or Student t-test (FALSE)?

na.action

Specification of what should happen to missing values in t.test.
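The multi-set format expected by multiExpr and multiTrait can be illustrated with a small sketch in base R. The simulated data, set sizes, and gene names below are hypothetical; the only structural requirements taken from the argument descriptions are one component per set and a component named data inside each. Samples are assumed to be in rows and genes in columns, with one trait value per sample.

```r
# Hypothetical simulated input in multi-set format: a list with one component
# per data set, each holding its data in a component named `data`.
set.seed(1)
nGenes <- 50
nSamples <- c(20, 35)                  # samples per set (hypothetical)
geneNames <- paste0("Gene", seq_len(nGenes))

multiExpr <- lapply(nSamples, function(n) {
  x <- matrix(rnorm(n * nGenes), nrow = n, ncol = nGenes)
  colnames(x) <- geneNames             # samples in rows, genes in columns
  list(data = x)
})
names(multiExpr) <- c("Set_1", "Set_2")

# One binary trait per set; each `data` component is a vector with one entry
# per sample (row) of the corresponding expression set.
multiTrait <- lapply(nSamples, function(n) list(data = rep(c(0, 1), length.out = n)))
names(multiTrait) <- names(multiExpr)

# With WGCNA attached, these could then be passed as, e.g.,
# metaAnalysis(multiExpr, multiTrait, binary = TRUE)
```

Because names(multiExpr) is set here, the per-set output columns would carry the labels Set_1 and Set_2 even without an explicit setNames argument.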

Value

Data frame with the following components:

Identifier of the input genes (or other variables)

Z.equalWeights

Meta-analysis Z statistics obtained using Stouffer's method with equal weights

p.equalWeights

p-values corresponding to Z.equalWeights

q.equalWeights

q-values corresponding to p.equalWeights, only present if getQvalues is TRUE.

Z.RootDoFWeights

Meta-analysis Z statistics obtained using Stouffer's method with weights given by the square root of the number of (non-missing) samples in each data set

p.RootDoFWeights

p-values corresponding to Z.RootDoFWeights

q.RootDoFWeights

q-values corresponding to p.RootDoFWeights, only present if getQvalues is TRUE.

Z.DoFWeights

Meta-analysis Z statistics obtained using Stouffer's method with weights given by the number of (non-missing) samples in each data set

p.DoFWeights

p-values corresponding to Z.DoFWeights

q.DoFWeights

q-values corresponding to p.DoFWeights, only present if getQvalues is TRUE.

Z.userWeights

Meta-analysis Z statistics obtained using Stouffer's method with user-defined weights. Only present if input metaAnalysisWeights is given.

p.userWeights

p-values corresponding to Z.userWeights

q.userWeights

q-values corresponding to p.userWeights, only present if getQvalues is TRUE.
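The four families of Z columns above differ only in the weights used in Stouffer's combination. A minimal base-R sketch, assuming a genes x sets matrix of per-set Z statistics and a matching matrix of non-missing sample counts (the helper stoufferZ and all numbers are hypothetical, not the package's code), could look like this; q-values are omitted.

```r
# Weighted Stouffer combination for a genes x sets matrix of Z statistics.
# `W` has the same dimensions as `Z`, so weights may differ by gene (e.g. when
# the number of non-missing samples differs between genes).
stoufferZ <- function(Z, W) rowSums(Z * W) / sqrt(rowSums(W^2))

Z <- cbind(Set_1 = c(2.0, -1.0, 0.5), Set_2 = c(1.5, -0.5, 0.0))
nPresent <- cbind(Set_1 = c(20, 20, 18), Set_2 = c(35, 35, 30))
userWeights <- c(1, 0.5)               # hypothetical user weights, one per set
Wuser <- matrix(userWeights, nrow = nrow(Z), ncol = ncol(Z), byrow = TRUE)

res <- data.frame(
  Z.equalWeights   = stoufferZ(Z, matrix(1, nrow(Z), ncol(Z))),
  Z.RootDoFWeights = stoufferZ(Z, sqrt(nPresent)),
  Z.DoFWeights     = stoufferZ(Z, nPresent),
  Z.userWeights    = stoufferZ(Z, Wuser)
)

# Two-sided p-values for each combined Z statistic
for (w in c("equalWeights", "RootDoFWeights", "DoFWeights", "userWeights")) {
  res[[paste0("p.", w)]] <- 2 * pnorm(abs(res[[paste0("Z.", w)]]), lower.tail = FALSE)
}
res
```

Dividing by the square root of the summed squared weights keeps each combined statistic standard normal under the null hypothesis, provided the per-set Z statistics are independent and standard normal.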

The next set of columns is present only if input useRankPvalue is TRUE; these columns contain the output of the function rankPvalue, computed with the same column (set) weights as the above meta-analysis. Depending on the input options calculateQvalue and pValueMethod in rankPvalueOptions, some columns may be missing. The following columns are calculated using equal weights for each data set.

pValueExtremeRank.equalWeights

This is the minimum of pValueLowRank and pValueHighRank, i.e. min(pValueLowRank, pValueHighRank).

pValueLowRank.equalWeights

Asymptotic p-value for observing a consistently low value across the columns of datS based on the rank method.

pValueHighRank.equalWeights

Asymptotic p-value for observing a consistently high value across the columns of datS based on the rank method.

pValueExtremeScale.equalWeights

This is the minimum of pValueLowScale and pValueHighScale, i.e. min(pValueLowScale, pValueHighScale).

pValueLowScale.equalWeights

Asymptotic p-value for observing a consistently low value across the columns of datS based on the Scale method.

pValueHighScale.equalWeights

Asymptotic p-value for observing a consistently high value across the columns of datS based on the Scale method.

qValueExtremeRank.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueExtremeRank

qValueLowRank.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueLowRank

qValueHighRank.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueHighRank

qValueExtremeScale.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueExtremeScale

qValueLowScale.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueLowScale

qValueHighScale.equalWeights

local false discovery rate (q-value) corresponding to the p-value pValueHighScale

...

Analogous columns are calculated by weighting each input set using the square root of the number of samples, the number of samples, and user weights (if given). The corresponding column names carry the suffixes RootDoFWeights, DoFWeights, and userWeights.
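To convey the idea behind the rank-based "consistently low/high" p-values, here is a hypothetical sketch; it is NOT the rankPvalue implementation, and the helper name rankConsistencyP is invented for illustration. Each column (set) is converted to mid-rank percentiles, and the weighted mean percentile of each gene is compared with its null distribution (mean 1/2, variance approximately sum(w^2) / (12 sum(w)^2) for independent uniform percentiles) using a normal approximation.

```r
# Hypothetical rank-based consistency test; NOT the rankPvalue code.
# datS: genes x sets matrix; w: one weight per column (set).
rankConsistencyP <- function(datS, w = rep(1, ncol(datS))) {
  # Mid-rank percentiles (rank - 0.5) / n have mean exactly 1/2 under the null
  pr <- apply(datS, 2, function(x) (rank(x, ties.method = "average") - 0.5) / length(x))
  meanRank <- drop(pr %*% w) / sum(w)
  # Approximate null SD of the weighted mean of independent uniform percentiles
  sdNull <- sqrt(sum(w^2) / (12 * sum(w)^2))
  z <- (meanRank - 0.5) / sdNull
  pLow  <- pnorm(z)                        # consistently low values
  pHigh <- pnorm(z, lower.tail = FALSE)    # consistently high values
  data.frame(pValueLowRank = pLow, pValueHighRank = pHigh,
             pValueExtremeRank = pmin(pLow, pHigh))
}

# Usage: gene 1 is the lowest value in all three sets
set.seed(2)
datS <- matrix(rnorm(300), nrow = 100, ncol = 3)
datS[1, ] <- -10
rp <- rankConsistencyP(datS)
head(rp, 3)
```

Passing sqrt(sample sizes), sample sizes, or user weights as w mirrors how the RootDoFWeights, DoFWeights, and userWeights variants differ from the equal-weights columns.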

The following columns contain results returned by standardScreeningBinaryTrait or standardScreeningNumericTrait (depending on whether the input trait is binary or continuous).

For binary traits, the following information is returned for each set:

corPearson.Set_1, corPearson.Set_2,...

Pearson correlation with a binary numeric version of the input variable. The numeric variable equals 1 for level 1 and 2 for level 2. The levels are given by levels(factor(y)).

t.Student.Set_1, t.Student.Set_2, ...

Student t-test statistic

pvalueStudent.Set_1, pvalueStudent.Set_2, ...

two-sided Student t-test p-value.

qvalueStudent.Set_1, qvalueStudent.Set_2, ...

(if input getQvalues is TRUE) q-value (local false discovery rate) based on the Student t-test p-value (Storey et al 2004).

foldChange.Set_1, foldChange.Set_2, ...

a (signed) ratio of mean values. If the mean in the first group (corresponding to level 1) is larger than that of the second group, it equals meanFirstGroup/meanSecondGroup. But if the mean of the second group is larger than that of the first group it equals -meanSecondGroup/meanFirstGroup (notice the minus sign).

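meanFirstGroup.Set_1, meanFirstGroup.Set_2, ...

means of columns in input datExpr across samples in the first group.

meanSecondGroup.Set_1, meanSecondGroup.Set_2, ...

means of columns in input datExpr across samples in the second group.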

SE.FirstGroup.Set_1, SE.FirstGroup.Set_2, ...

standard errors of columns in input datExpr across samples in the first group. Recall that SE(x)=sqrt(var(x)/n) where n is the number of non-missing values of x.

SE.SecondGroup.Set_1, SE.SecondGroup.Set_2, ...

standard errors of columns in input datExpr across samples in the second group.

areaUnderROC.Set_1, areaUnderROC.Set_2, ...

the area under the ROC curve, also known as the concordance index or C.index. This is a measure of discriminatory power. The measure lies between 0 and 1, where 0.5 indicates no discriminatory power and 0 indicates that the "opposite" predictor has perfect discriminatory power. To compute it we use the function rcorr.cens with outx=TRUE (from Frank Harrell's package Hmisc).

nPresentSamples.Set_1, nPresentSamples.Set_2, ...

number of samples with finite measurements for each gene.
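The per-set summaries for a binary trait can be reproduced for a single gene with a short base-R sketch. The helper binarySummary is hypothetical and is not the standardScreeningBinaryTrait code; it follows the column descriptions above, and computes the area under the ROC curve through the Mann-Whitney formulation (probability that a second-group value exceeds a first-group value, ties counting 1/2) as a stand-in for rcorr.cens, assuming higher values are taken to predict the second group.

```r
# Hypothetical per-gene summary for a binary trait y with exactly two levels.
binarySummary <- function(x, y) {
  g <- factor(y)
  ok <- is.finite(x) & !is.na(g)       # keep samples with finite measurements
  x <- x[ok]; g <- g[ok]
  x1 <- x[g == levels(g)[1]]           # first group (level 1)
  x2 <- x[g == levels(g)[2]]           # second group (level 2)
  m1 <- mean(x1); m2 <- mean(x2)
  # Signed ratio of means, as described for foldChange
  foldChange <- if (m1 > m2) m1 / m2 else -m2 / m1
  # Mann-Whitney estimate of the area under the ROC curve
  auc <- mean(outer(x2, x1, ">") + 0.5 * outer(x2, x1, "=="))
  c(meanFirstGroup = m1, meanSecondGroup = m2,
    SE.FirstGroup = sqrt(var(x1) / length(x1)),
    SE.SecondGroup = sqrt(var(x2) / length(x2)),
    foldChange = foldChange, areaUnderROC = auc,
    nPresentSamples = length(x))
}

s <- binarySummary(c(1, 2, 3, 10, 11, 12), c(0, 0, 0, 1, 1, 1))
s
```

In this toy example every second-group value exceeds every first-group value, so the area under the ROC curve is 1, and the second-group mean being larger gives the negative fold change -11/2.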

If input kruskalTest is TRUE, the following columns further summarize results of Kruskal-Wallis test:

stat.Kruskal.Set_1, stat.Kruskal.Set_2, ...

Kruskal-Wallis test statistic.

stat.Kruskal.signed.Set_1, stat.Kruskal.signed.Set_2,...

(Warning: experimental) Kruskal-Wallis test statistic including a sign that indicates whether the average rank is higher in second group (positive) or first group (negative).

pvaluekruskal.Set_1, pvaluekruskal.Set_2, ...

Kruskal-Wallis test p-value.

qkruskal.Set_1, qkruskal.Set_2, ...

q-values corresponding to the Kruskal-Wallis test p-value (if input getQvalues is TRUE).

Z.Set_1, Z.Set_2, ...

Z statistics obtained from pvalueStudent.Set_1, pvalueStudent.Set_2, ... or from pvaluekruskal.Set_1, pvaluekruskal.Set_2, ..., depending on input metaKruskal.
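A common way to turn a two-sided p-value into a signed Z statistic is Z = sign(statistic) * qnorm(p/2, lower.tail = FALSE). The sketch below applies it to both the t-test and the Kruskal-Wallis test; it assumes this conversion and a sign that is positive when the second group has higher values (matching the convention of stat.Kruskal.signed), which may differ in detail from what metaAnalysis does internally. The helper signedZ and the data are hypothetical.

```r
# Hypothetical conversion of a two-sided p-value into a signed Z statistic.
signedZ <- function(p, stat) sign(stat) * qnorm(p / 2, lower.tail = FALSE)

x <- c(5.1, 4.8, 5.6, 6.9, 7.2, 6.5)
y <- c(1, 1, 1, 2, 2, 2)

# Welch t-test (var.equal = FALSE); positive statistic if group 2 is higher
tt <- t.test(x[y == 2], x[y == 1], var.equal = FALSE)
z <- unname(signedZ(tt$p.value, tt$statistic))

# Kruskal-Wallis alternative (metaKruskal = TRUE); the sign follows the
# difference in average ranks between the second and the first group
kt <- kruskal.test(x, factor(y))
r <- rank(x)
zK <- signedZ(kt$p.value, mean(r[y == 2]) - mean(r[y == 1]))
c(zStudent = z, zKruskal = zK)
```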

For numeric traits, the following columns are returned:

cor.Set_1, cor.Set_2, ...

correlations of all genes with the trait

Z.Set_1, Z.Set_2, ...

Fisher Z statistics corresponding to the correlations

pvalueStudent.Set_1, pvalueStudent.Set_2, ...

Student p-values of the correlations

qvalueStudent.Set_1, qvalueStudent.Set_2, ...

(if input getQvalues is TRUE) q-values of the correlations calculated from the p-values

AreaUnderROC.Set_1, AreaUnderROC.Set_2, ...

area under the ROC

nPresentSamples.Set_1, nPresentSamples.Set_2, ...

number of samples present for the calculation of each association.
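For a continuous trait, the per-set correlation, Student p-value, and Fisher Z statistic of a single gene can be sketched with the textbook formulas below. This is an illustration under those assumptions (in particular the sqrt(n - 3) scaling of the Fisher transform); WGCNA's own helpers may differ in details such as the degrees-of-freedom term. The data are hypothetical.

```r
# Hypothetical data for one gene and a continuous trait
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 6.1, 5.5, 3.3)
trait <- c(10, 14, 9, 20, 17, 25, 22, 12)

ok <- is.finite(x) & is.finite(trait)
n <- sum(ok)                                # nPresentSamples
r <- cor(x[ok], trait[ok])                  # cor (Pearson)

# Student t statistic for H0: rho = 0 and its two-sided p-value
tStat <- sqrt(n - 2) * r / sqrt(1 - r^2)
pStudent <- 2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)

# Fisher transformation scaled by sqrt(n - 3): approximately N(0, 1) under H0
zFisher <- 0.5 * log((1 + r) / (1 - r)) * sqrt(n - 3)
c(cor = r, pvalueStudent = pStudent, Z = zFisher)
```

The Student p-value computed this way agrees with cor.test(x, trait)$p.value, which uses the same t statistic with n - 2 degrees of freedom.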

Details

The Stouffer method combines Z statistics by taking the mean of the input Z statistics and multiplying it by sqrt(n), where n is the number of input data sets. We refer to this method as Stouffer.equalWeights. In general, a better (i.e., more powerful) way of combining Z statistics is to weight them by the number of degrees of freedom in each data set (which approximately equals the number of samples in that set). We refer to this method as weightedStouffer. Finally, the user can also specify custom weights, for example if a data set needs to be downweighted due to technical concerns; however, custom weights should be chosen carefully to avoid possible selection biases.
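The weighted and unweighted versions described above can be written as one formula. Here Z_k is the Z statistic from data set k, n is the number of data sets, N_k (a symbol introduced only for this illustration) is the number of non-missing samples in set k, and w_k is the weight of set k. Assuming independent, standard normal Z_k under the null hypothesis, the denominator makes the combined statistic standard normal as well; equal weights reduce it to sqrt(n) times the mean Z.

```latex
Z_{\mathrm{meta}} = \frac{\sum_{k=1}^{n} w_k Z_k}{\sqrt{\sum_{k=1}^{n} w_k^{2}}},
\qquad
w_k = 1 \;\Rightarrow\; Z_{\mathrm{meta}} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} Z_k = \sqrt{n}\,\bar{Z},
\qquad
w_k \in \{\sqrt{N_k},\; N_k\} \text{ for the RootDoF and DoF weights.}
```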

References

For Stouffer's method, see

Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A. & Williams, R.M. Jr. 1949. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.

A discussion of weighted Stouffer's method can be found in