PCSF: Prize-collecting Steiner Forest (PCSF)

Description

PCSF returns a subnetwork obtained by solving the PCSF on the given interaction network.

Usage

PCSF(ppi, terminals, w = 2, b = 1, mu = 5e-04, dummies)

Value

The final subnetwork obtained by the PCSF. It return an igraph object with the node prize and edge cost attributes.

Arguments

ppi: An interaction network, an igraph object.
terminals: A list of terminal genes with prizes to be analyzed in the PCSF context. A named numeric vector, where terminal genes are named same as in the interaction network and numeric values correspond to the importance of the gene within the study.
w: A numeric value for tuning the number of trees in the output. A default value is 2.
b: A numeric value for tuning the node prizes. A default value is 1.
mu: A numeric value for a hub penalization. A default value is 0.0005.
dummies: A list of nodes that are to connected to the root of the tree. If missing the root will be connected to all terminals.

Author

Murodzhon Akhmedov

Details

The PCSF is a well-know problem in graph theory. Given an undirected graph G = (V, E), where the vertices are labeled with prizes $p_{v}$ and the edges are labeled with costs $c_{e} > 0$, the goal is to identify a subnetwork G' = (V', E') with a forest structure. The target is to minimize the total edge costs in E', the total node prizes left out of V', and the number of trees in G'. This is equivalent to minimization of the following objective function:

$$F(G')= Minimize \sum_{ e \in E'} c_{e} + \beta*\sum_{v \not\in V'} p_v + \omega*k$$ where, k is the number of trees in the forest, and it is regulated by parameter $\omega$. The parameter $\beta$ is used to tune the prizes of nodes.

This optimization problem nicely maps onto the problem of finding differentially enriched subnetworks in the cell protein-protein interaction (PPI) network. The vertices of interaction network correspond to genes or proteins, and edges represent the interactions among them. We can assign prizes to vertices based on measurements of differential expression, copy number, or mutation, and costs to edges based on confidence scores for those intra-cellular interactions from experimental observation, yielding a proper input to the PCSF problem. Vertices that are assigned a prize are referred to terminal nodes, whereas the vertices which are not observed in patient data are not assigned a prize and are called Steiner nodes. After scoring the interactome, the PCSF is used to detect a relevant subnetwork (forest), which corresponds to a portion of the interactome, where many genes are highly correlated in terms of their functions and may regulate the differentially active biological process of interest. The PCSF aims to identify neighborhoods in interaction networks potentially belonging to the key dysregulated pathways of a disease. In order to avoid a bias towards the hub nodes of PPI networks to appear in solution of PCSF, we penalize the prizes of Steiner nodes according to their degree distribution in PPI, and it is regulated by parameter $\mu$:

$$p'_{v} = p_{v} - \mu*degree(v)$$

The parameter $\mu$ also affects the total number of Steiner nodes in the solution. Higher the value of $\mu$ smaller the number of Steiners in the subnetwork, and vice-versa. Based on our previous analysis the recommended range of $\mu$ for biological networks is between 1e-4 and 5e-2, and users can choose the values resulting subnetworks with vertex sets that have desirable Steiner/terminal node ratio and average Steiner/terminal in-degree ratio in the template interaction network.

References

Akhmedov M., LeNail A., Bertoni F., Kwee I., Fraenkel E., and Montemanni R. (2017) A Fast Prize-Collecting Steiner Forest Algorithm for Functional Analyses in Biological Networks. Lecture Notes in Computer Science, to appear.