crosstalkr

R package for the analysis of biological networks.

Crosstalkr provides a unified toolkit for drug target and disease subnetwork identification. Crosstalkr enables users to download and leverage high-quality protein-protein interaction networks from online repositories. Users can then filter these large networks into manageable subnetworks. Finally, users can perform in-silico repression experiments to assess the relative importance of nodes in their network.

PPI ingestion and customization

Crosstalkr allows direct access to the STRINGDB and Biogrid PPI resources. Thanks to integration with stringdb, users can evaluate biological networks from 1540 different species. For a list of supported species, call:

crosstalkr::supported_species()

Users can also take the intersection or union of the stringdb and biogrid PPIs.
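To illustrate what taking the intersection or union of two PPIs means, here is a minimal sketch using igraph directly on two toy edge lists. The protein names (P1 to P5) and the edges are hypothetical placeholders, not database content, and this does not show how crosstalkr itself performs the operation.

```r
# Two hypothetical interaction tables standing in for two PPI resources.
ppi_a <- data.frame(from = c("P1", "P2", "P1"), to = c("P2", "P3", "P4"))
ppi_b <- data.frame(from = c("P1", "P2", "P4"), to = c("P2", "P3", "P5"))

g_a <- igraph::graph_from_data_frame(ppi_a, directed = FALSE)
g_b <- igraph::graph_from_data_frame(ppi_b, directed = FALSE)

# Intersection keeps only edges reported by both resources;
# union keeps edges reported by either one. Vertices are matched by name.
g_int <- igraph::intersection(g_a, g_b)
g_uni <- igraph::union(g_a, g_b)

n_shared <- igraph::ecount(g_int)
n_total <- igraph::ecount(g_uni)
```

The intersection trades coverage for confidence, while the union does the opposite.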

Graph Filtering Methods

Crosstalkr facilitates graph reduction based on node value ranks. Node values can be provided by the user (as in gfilter.value). Users can also specify any method found in the igraph package that generates node values (e.g. igraph::degree or igraph::betweenness). We also provide a custom method for node ranking that we developed in our lab, termed network potential (gfilter.np).

Crosstalkr provides a general implementation of random walk with restarts on graph-structured data. We also provide user-friendly implementations of the common use case of applying random walk with restarts to identify subnetworks of biological protein-protein interaction databases (adapted from the method described here - https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000639). Given a user-defined set of seed proteins, the main compute_crosstalk function will compute affinity scores for all other proteins in the network. It will then compute a null distribution using a permutation test and compare the computed affinity scores to the null distribution to identify proteins with a statistically significant association with the user-defined seed proteins.
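As a rough illustration of the idea behind random walk with restarts, the base R sketch below iterates the walk on a small toy graph and then keeps the highest-affinity nodes. The helper rwr_sketch, its parameter names, and the values chosen for them are assumptions made for this example; it is not the package's own implementation.

```r
# Toy undirected path graph A-B-C-D-E as a symmetric adjacency matrix (hypothetical data).
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, nrow = 5, ncol = 5, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "E"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Column-normalize so that every column sums to 1 (transition matrix).
w <- sweep(adj, 2, colSums(adj), "/")

# rwr_sketch is a hypothetical helper, not a crosstalkr function.
# gamma is the restart probability; iteration stops once the L1 change
# between successive probability vectors drops below eps.
rwr_sketch <- function(w, seeds, gamma = 0.6, eps = 1e-10, tmax = 1000) {
  p0 <- setNames(numeric(nrow(w)), rownames(w))
  p0[seeds] <- 1 / length(seeds)
  p <- p0
  for (step in seq_len(tmax)) {
    p_next <- (1 - gamma) * as.vector(w %*% p) + gamma * p0
    converged <- sum(abs(p_next - p)) < eps
    p <- p_next
    if (converged) break
  }
  p
}

affinity <- rwr_sketch(w, seeds = "A")

# Rank nodes by affinity and keep the top three as a candidate subnetwork.
top_nodes <- names(sort(affinity, decreasing = TRUE))[1:3]
```

Because the walker returns to the seed with probability gamma at every step, affinity decays with distance from the seed, which is what makes the top-ranked nodes a local subnetwork.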

Node ranking via in-silico repression

In silico repression is implemented by the node_repression function. Users must specify a state function that scores nodes. Each node in v_rm will be systematically removed from the network. The provided state function will then be applied to recalculate the network state, and the difference in total state value (the sum over all nodes) will be computed.
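The procedure described above can be sketched with igraph on a toy graph. Here repression_sketch and state_fun are hypothetical names, and node degree stands in for the state function purely for illustration; the package's node_repression function may differ in its arguments and defaults.

```r
# Toy network (hypothetical edges).
edges <- data.frame(
  from = c("A", "A", "A", "B", "D"),
  to   = c("B", "C", "D", "C", "E")
)
g <- igraph::graph_from_data_frame(edges, directed = FALSE)

# Hypothetical state function: score each node by its degree.
state_fun <- function(graph) igraph::degree(graph)

# Remove each node in v_rm in turn, recompute the state, and report the
# change in total state (sum over all remaining nodes) relative to baseline.
repression_sketch <- function(graph, v_rm, state_fun) {
  baseline <- sum(state_fun(graph))
  vapply(v_rm, function(v) {
    g_rm <- igraph::delete_vertices(graph, v)
    sum(state_fun(g_rm)) - baseline
  }, numeric(1))
}

delta <- repression_sketch(g, v_rm = igraph::V(g)$name, state_fun = state_fun)

# Nodes whose removal changes total state the most come first.
ranking <- sort(delta)
```

With degree as the state function, removing a node of degree d lowers the total by 2d, so the hub A ranks first; a different state function would give a different ranking.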

See https://journals.plos.org/ploscompbiol/article/comments?id=10.1371/journal.pcbi.1008755 for more details on in-silico repression.

Use

To install, use the following code:

install.packages("crosstalkr")

For the latest development version:

install.packages("remotes") #can skip if already installed 
remotes::install_github("https://github.com/DavisWeaver/crosstalkr")

Given a user-provided set of seeds, crosstalkr will identify an enriched subgraph in which all nodes have a high affinity for the provided seeds.

crosstalkr is optimized for use with the human cell signaling network. For example, running the code below will return a dataframe containing the user-provided seeds as well as all other proteins in the human protein-protein interaction network with a statistically significant association to these genes.

compute_crosstalk(c("EGFR", "KRAS"))
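To make "statistically significant association" more concrete, the sketch below compares observed affinity scores with a null distribution using an empirical p-value and a Bonferroni correction. All numbers are simulated placeholders, and the test statistic is an assumption chosen for illustration; the statistic that compute_crosstalk uses internally may differ.

```r
set.seed(1)

# Hypothetical observed affinity scores for three proteins.
observed <- c(P1 = 0.30, P2 = 0.05, P3 = 0.02)

# Hypothetical null: affinity scores obtained from 1000 random seed sets.
n_perm <- 1000
null_scores <- matrix(
  runif(n_perm * length(observed), min = 0, max = 0.1),
  nrow = n_perm,
  dimnames = list(NULL, names(observed))
)

# Empirical p-value: share of null scores at least as large as the observed one.
p_value <- vapply(names(observed), function(p) {
  (1 + sum(null_scores[, p] >= observed[[p]])) / (1 + n_perm)
}, numeric(1))

# Correct for testing several proteins at once.
p_adj <- p.adjust(p_value, method = "bonferroni")
significant <- names(p_adj)[p_adj < 0.05]
```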

Users can use any other kind of graph-structured data, provided they are stored in an igraph object. For example:

g <- igraph::sample_gnp(n = 1000, p = 10/1000)
compute_crosstalk(c(1,3,5,8,10), g = g, use_ppi = FALSE)

We also provide rudimentary plotting functions to allow users to quickly assess the identified subnetworks:

ct_df <- compute_crosstalk(c("EGFR", "KRAS"))
plot_ct(ct_df)

A more detailed overview of the available functionality is provided in the introductory vignette (under development).

vignette(package = "crosstalkr")

To cite crosstalkr, please use the bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2023.03.07.531526v1

Contact

Please feel free to submit issues on the GitHub repository (https://github.com/DavisWeaver/crosstalkr). You can also contact me at davis.weaver@case.edu if you have any questions.

Monthly Downloads: 608
Version: 1.0.4
License: GPL (>= 3)
Maintainer: Davis Weaver
Last Published: September 29th, 2023

Functions in crosstalkr (1.0.4)

get_topn: Helper function for compute_null_dnp - returns the top n genes by dnp for each sample.
compute_dnp: Main function to compute delta np for every gene in a given dataframe - assumes compute_np has already been run for a given dataset.
calc_dnp_i: Helper function to calculate dnp for one sample.
experiment_breakout: Helper function to split experiment names into constituent parts.
combine_null: .combine function for compute_null foreach looping structure.
fcalc_np_all: Function to calculate the network potential for vertices v.
compute_crosstalk: Identify proteins with a statistically significant relationship to user-provided seeds.
tidy_expression: Helper function to convert expression matrix to tidy dataframe (if not already).
supported_species: Returns a dataframe with information on supported species.
is_entrez: Determine if a character vector contains entrez gene_ids.
prep_biogrid: Prepare biogrid for use in analyses.
is_ensembl: Determine if a character vector contains ensembl gene_ids.
gfilter: Generic function to filter either an igraph object or a PPI network.
load_ppi: Helper function to load requested PPI w/ parameters.
ppi_union: Function to allow users to choose the union of stringdb and biogrid. Only works with the human PPI. min_score parameter only applies to stringdb.
compute_null_dnp: Function to compute null distribution of dnp.
crosstalk_subgraph: Helper function to generate subgraph from crosstalk_df output of compute_crosstalk.
match_seeds: Identify random sets of seeds with similar degree distribution to parent seed proteins.
node_repression: Function to eliminate a node from a network g and calculate the change in some measure of network state.
get_neighbors: Function to get graph neighbors (along with their expression values) for a given gene in a given network g.
get_random_graph: Helper function for compute_null_dnp - returns a graph with randomly permuted edges.
sparseRWR: Perform random walk with restarts on a sparse matrix.
norm_colsum: Function to normalize adjacency matrix by dividing each value by the colsum.
ensembl_type: Determine if ensembl id is a protein, gene, or transcript_id.
check_crosstalk: Check to make sure incoming object is a valid crosstalk df.
gfilter.value: Method to filter graph based on user-provided value.
dist_calc: Internal function that computes the mean/stdev for each gene from a wide-format data frame.
compute_np: Main function to compute np from a user-provided expression matrix.
prep_stringdb: Prepare stringdb for use in analyses.
gfilter.np: Method to filter graph based on network potential values.
final_dist_calc: Internal function that computes the mean/stdev for each gene from a wide-format data frame.
to_taxon_id: Helper to convert user inputs to ncbi reference taxonomy.
final_combine: Final .combine function to run in compute_null_dnp foreach looping structure.
ppi_intersection: Function to allow users to choose the intersection of stringdb and biogrid. Only works with the human PPI. min_score parameter only applies to stringdb.
plot_ct: Plot subnetwork identified using the compute_crosstalk function.
calc_np_all: Function to calculate the network potential for each protein in a user-provided vector - cpp internal version.
bootstrap_null: Bootstrap null distribution for RWR.
add_value: Attach a generic user-provided value to graph.
calc_np_i: Helper function to calculate np for one sample.
add_expression: Attach expression values from user-provided expression vector to graph.
as_gene_symbol: Convert from most other representations of gene name to gene.symbol.
calc_np_all_legacy: Function to calculate the network potential for each protein in a user-provided vector.
calc_np: Calculate network potential for one node.
crosstalkr: A package for the identification of functionally relevant subnetworks from high-dimensional omics data.
gfilter.ct: Method to filter the graph based on parameters passed to compute_crosstalk.
detect_inputtype: Determine which gene identifier format is used for the user-defined seed proteins.
gfilter.igraph_method: Method to filter graph based on any igraph method that scores vertices.