DoDART: Main function of DART

Description

This is the main function implementing DART. Given a data matrix and a model (pathway) signature it will construct the relevance correlation network for the genes in the signature over the data, evaluate the consistency of the correlative patterns with those predicted by the model signature, filter out the noise and finally obtain estimates of pathway activity for each individual sample. Specifically, it will call and run the following functions:

(1) BuildRN: This function builds a relevance correlation network of the model pathway signature in the data set in which the pathway activity estimate is desired. We point that this step is totally unsupervised and does not use and phenotypic information of the samples.

(2) EvalConsNet: This function evaluates the consistency of the inferred network with the prior information of the model pathway signature. The up/down regulatory pattern given by the model signature implies predictions about the directionality of the gene-gene correlations in the independent data set. For instance, if gene "A" is upregulated and gene "B" is downregulated, then assuming that the model signature has any relevance in the independent data set, we would expect genes "A" and "B" to be anti-correlated. Thus, a consistency score can be computed. Only if the consistency score is higher than the score expected by random chance is it recommended that the model signature be used to infer pathway activity.

(3) PruneNet: This function obtains the pruned, i.e consistent, network, in which any edge represents a significant correlation in gene expression whose directionality agrees with that predicted by the prior information. This is the denoising step of the algorithm. The function returns the whole pruned network and its maximally connected component.

(4) PredActScore: Given the adjacency matrix of the maximally connected consistent subnetwork and given the regulatory weights of the corresponding model pathway signature, this function estimates a pathway activation score in each sample. This function can also be used to infer pathway activity in another independent data set using the inferred subnetwork.

Before performing the pruning step, DoDART will check whether the relevance correlation network is significantly consistent with the predictions from the model signature. Significance is assessed by first computing a consistency score (in effect, the fraction of edges in the relevance network which are consistent with the model prediction) and subsequently by 1000 random permutations to obtain an empirical null distribution for the consistency score. Model signatures whose consistency scores have empirical P-values less than 0.001 are deemed consistent. If the consistency score is not significant, the function will issue a warning and it is not recommended to use the signature to predict pathway activity.

Usage

DoDART(data.m, sign.v, fdr)

Arguments

data.m

Data matrix (numeric). Rows label features, columns label samples. It is assumed that number of features is much larger than number of samples. Rownames of data.m must be valid unique gene (probe) identifiers.

sign.v

Model pathway signature vector (numeric). Elements represent the regulatory weights (i.e if up or downregulated). Names of sign.v are the gene identifiers, which must match the gene (probe) identifiers of the rows of data.m

fdr

Desired false discovery rate (numeric) which determines the allowed number of false positives in the relevance network. The default value is 0.000001. Since typically model signatures may contain on the order of 100 genes, this amounts to constructing a relevance network on the order of 10000 edges (pairwise correlations). Using a Bonferroni correction, this leads to a P-value threshold of approx. 1e-6 to 1e-7. This parameter is tunable, so choosing a number of different thresholds may be advisable.

Value

netcons: A vector summarising the properties of the correlation network and the consistency with the model pathway signature: nG is the number of genes in the signature, nE is the number of edges of the relevance network generated by function BuildRN, fE is the ratio of the number of edges in the relevance network to the maximum possible number, fconsE is the fraction of edges whose sign (i.e sign of correlation) is the same as the directionality predicted by the model signature, Pval(consist) is a p-value reflecting the significance of fconsE, and is estimated as the fraction of randomisations that yielded an average connectivity larger than the observed one.
adj: Adjacency matrix of maximally connected consistent relevance network.
sign: Model pathway signature vector of genes found in data set and in maximally connected component.
score: Predicted activation scores of the model signature in the samples of data set data.m.
degree: Degrees/connectivities of the genes in the DART network.
consist: Module consistency result, output of EvalConsNet.

References

Jiao Y, Lawler K, Patel GS, Purushotham A, Jones AF, Grigoriadis A, Ng T, Teschendorff AE. (2011) Denoising algorithm based on relevance network topology improves molecular pathway activity inference. BMC Bioinformatics 12:403.

Teschendorff AE, Gomez S, Arenas A, El-Ashry D, Schmidt M, et al. (2010) Improved prognostic classification of breast cancer defined by antagonistic activation patterns of immune response pathway modules. BMC Cancer 10:604.

Examples

Run this code

  
### Example
### load in example data:
data(dataDART);
### dataDART$data: mRNA expression data of 67 ER negative breast cancer samples.
### dataDART$pheno: 51 basals and 16 HER2+ (ERBB2+).
### dataDART$phenoMAINZ: 24 basals and 8 HER2+ (ERBB2+).
### dataDART$sign: perturbation signature of ERBB2 activation.


### Using DoDART
dart.o <- DoDART(dataDART$data,dataDART$sign,fdr=0.000001);
### check that activation is higher in HER2+ compared to basals
boxplot(dart.o$score ~ dataDART$pheno);
pv <- wilcox.test(dart.o$score ~ dataDART$pheno)$p.value;
text(x=1.5,y=3.8,labels=paste("P=",pv,sep=""));

Run the code above in your browser using DataLab