SourceSet (version 0.1.1)

infoSource: Get summary statistics on graphs and variables

Description

The infoSource function summarizes the results of a source set analysis at two levels: individual graphs and individual variables.

Usage

infoSource(sourceObj, map.name.variable = NULL, method = "fdr")

Arguments

sourceObj

a SourceSetObj object, i.e. the output of the sourceSet function

map.name.variable

a named list of custom labels for the genes. Each list element must contain exactly one value (the new label), and the name of each element must match one of the gene names given as input to the sourceSet function (the column names of its data argument). If a gene has no mapped label, its original name is used

method

correction method for p-values calculated on graphs. The adjustment methods allowed are: fdr (default), holm, hochberg, hommel, bonferroni, BH, BY or none. For more details refer to p.adjust.
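The two optional arguments can be sketched as follows. The gene identifiers, labels and p-values are hypothetical and chosen only for illustration; the commented call reuses the results.all object from the Examples section and assumes the SourceSet package is installed.

```r
# Hypothetical gene identifiers (names) and labels (values).
# Names must match column names of the data passed to sourceSet();
# each element holds exactly one label.
gene.labels <- list("1234" = "GENE_A", "5678" = "GENE_B")
stopifnot(all(lengths(gene.labels) == 1))

# Made-up p-values, used only to compare correction methods from p.adjust
p.raw  <- c(0.001, 0.012, 0.030, 0.200)
p.fdr  <- p.adjust(p.raw, method = "fdr")
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# In p.adjust, "fdr" is an alias of "BH" (Benjamini-Hochberg)
same.as.bh <- identical(p.fdr, p.adjust(p.raw, method = "BH"))

# With a SourceSetObj available (see Examples), the call would look like:
# info <- infoSource(sourceObj = results.all,
#                    map.name.variable = gene.labels,
#                    method = "bonferroni")
```

Since "fdr" and "BH" select the same procedure, the choice between them does not change the adj.pvalue column.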

Value

The function helps the user identify interesting variables by returning a list with two elements:

  • graph: a data frame that summarizes the results of the individual input graphs, composed as follows:

    • n.primary: number of genes belonging to the source set;

    • n.secondary: number of genes belonging to the secondary set;

    • n.graph: number of genes within the graph;

    • n.cluster: number of connected components of the graph;

    • primary.impact: relative size of the estimated source set. This index quantifies the proportion of the graph impacted by primary dysregulation;

    • total.impact: relative size of the set of genes impacted by dysregulation. This index quantifies the proportion of the graph impacted by either primary or secondary dysregulation;

    • adj.pvalue: multiplicity-adjusted p-value for the hypothesis of equality of the two distributions associated with the given graph

  • variable: a data frame that summarizes the results of the individual variables, composed as follows:

    • n.primary: number of input graphs in which the gene appears in the associated source set;

    • n.secondary: number of input graphs in which the gene appears in the associated secondary set;

    • n.graph: number of pathways in which the gene is annotated;

    • specificity: percentage of input graphs containing the given gene with respect to the total number of input graphs;

    • primary.impact: percentage of input graphs, such that the given gene belongs to their estimated source set, with respect to the total number of input graphs in which the gene appears;

    • total.impact: percentage of input graphs, such that the given gene is affected by some form of dysregulation in the considered graph, with respect to the total number of input graphs in which the gene appears;

    • relevance: percentage of the input graphs such that the given variable belongs to their estimated source set, with respect to the total number of input graphs. It is a general measure of the importance of the gene based on the chosen pathways;

    • score: a number ranging from 0 (low significance) to +Inf (maximal significance), computed as the combination of the p-values of all components (of all the input graphs) containing the given variable
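The count-based indices described above can be illustrated on a toy example. This is a minimal sketch, not the package's implementation: the membership table memb, its status column and the helper summarise.by are hypothetical, the primary and secondary sets are assumed to be disjoint, and the scales (proportions for graphs, percentages for variables) are assumptions read from the wording above. n.cluster, adj.pvalue and score are not reproduced, since they depend on the graph structure and on the test results.

```r
# Toy membership table: one row per (graph, gene) pair.
# 'status' is a hypothetical encoding: "primary", "secondary" or "none".
memb <- data.frame(
  graph  = c("g1", "g1", "g1", "g2", "g2", "g3", "g3"),
  gene   = c("A",  "B",  "C",  "A",  "C",  "A",  "D"),
  status = c("primary", "secondary", "none",
             "primary", "none",
             "secondary", "primary"),
  stringsAsFactors = FALSE
)
n.total <- length(unique(memb$graph))  # total number of input graphs

# Count primary / secondary / all memberships for each level of 'key'
summarise.by <- function(key) {
  data.frame(
    n.primary   = as.vector(tapply(memb$status == "primary", key, sum)),
    n.secondary = as.vector(tapply(memb$status == "secondary", key, sum)),
    n.graph     = as.vector(tapply(memb$status, key, length)),
    row.names   = sort(unique(key))
  )
}

# Graph-level indices (assumed here to be proportions of the graph size)
graph.tab <- summarise.by(memb$graph)
graph.tab$primary.impact <- graph.tab$n.primary / graph.tab$n.graph
graph.tab$total.impact   <- (graph.tab$n.primary + graph.tab$n.secondary) /
  graph.tab$n.graph

# Variable-level indices (percentages)
var.tab <- summarise.by(memb$gene)
var.tab$specificity    <- 100 * var.tab$n.graph / n.total
var.tab$primary.impact <- 100 * var.tab$n.primary / var.tab$n.graph
var.tab$total.impact   <- 100 * (var.tab$n.primary + var.tab$n.secondary) /
  var.tab$n.graph
var.tab$relevance      <- 100 * var.tab$n.primary / n.total

graph.tab
var.tab
```

Under these assumptions, relevance equals specificity multiplied by primary.impact and divided by 100, which shows how it combines how often a gene appears with how often it falls in a source set.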

Examples

# NOT RUN {
## Load the SourceSetObj obtained from the source set analysis of ALL dataset

# see vignette for more details
print(load(file=system.file("extdata","ALLsourceresult.RData",package = "SourceSet")))
class(results.all)

info.all<-infoSource(sourceObj = results.all)
## results of individual input graphs
info.all$graph

## results of individual variables
# ..that appear in more than one graph and with relevance>0
info.all.genes<-info.all$variable[info.all$variable$n.graph>1 & info.all$variable$relevance>0,]
# ..ordered by relevance
ind.ord<-order(info.all.genes$relevance,decreasing = TRUE)
info.all.genes[ind.ord,]
# }
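The same filter-then-sort pattern can rank variables by score rather than relevance. The sketch below uses a toy data frame standing in for info.all$variable; its gene names and values are made up, and only the column names are taken from the Value section.

```r
# Toy stand-in for info.all$variable (values are made up for illustration)
toy.variable <- data.frame(
  n.graph   = c(3, 1, 2, 4),
  relevance = c(66.7, 0, 50, 25),
  score     = c(4.2, 0.3, 7.9, 1.1),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

# Keep variables that appear in more than one graph and have relevance > 0
keep    <- toy.variable$n.graph > 1 & toy.variable$relevance > 0
toy.sel <- toy.variable[keep, ]

# Order the selected variables by decreasing score
ranked <- toy.sel[order(toy.sel$score, decreasing = TRUE), ]
ranked
```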
