## A Method for 'Connecting The Dots' in Weighted Graphs

CTD is a method for pattern discovery in weighted graphs. It addresses two questions: 1) Given a weighted graph and a subset of its nodes, are the nodes significantly connected? 2) Given a weighted graph and two subsets of its nodes, are the subsets close neighbors or distant?

# CTD: an information-theoretic method to interpret multivariate perturbations in the context of graphical models with applications in metabolomics and transcriptomics

Our novel network-based approach, CTD, “connects the dots” between metabolite perturbations observed in individual metabolomics profiles and a given disease state by calculating how connected those metabolites are in the context of a disease-specific network.

## Using CTD in R

### Installation

In R, install the `devtools` package, then install CTD with `devtools::install_github("BRL-BCM/CTD")`.

### Read the package Rmd vignette

The vignette is located at `/vignette/CTD_Lab-Exercise.Rmd`. It walks through every stage of the analysis pipeline:

1. Background knowledge graph generation.
2. The encoding algorithm: generating node permutations with a network walker, converting the node permutations into bitstrings, and calculating the minimum encoding length between k codewords.
3. Calculating the probability of a node subset based on its encoding length.
4. Calculating the similarity between two node subsets, using a metric based on mutual information.
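
To make steps 2 and 3 more concrete, here is a minimal sketch in Python of the bitstring idea. It is an illustration only and not the package's R implementation: the function names are hypothetical, the node ranking is supplied by hand rather than produced by a network walker, and the stopping rule (end the bitstring once every subset node has been found) is an assumption. The last function states the general algorithmic significance bound, under which compressing a string by at least d bits has probability at most 2^-d under the null model; how CTD computes d is covered in the vignette.

```python
import math


def ranking_to_bitstring(node_ranking, subset):
    """Write 1 for each ranked node in `subset` and 0 otherwise.

    Assumption: the string ends as soon as every subset node has been seen.
    """
    remaining = set(subset)
    bits = []
    for node in node_ranking:
        if not remaining:
            break
        if node in remaining:
            bits.append("1")
            remaining.discard(node)
        else:
            bits.append("0")
    return "".join(bits)


def bitstring_entropy(bits):
    """Shannon entropy, in bits per symbol, of a string of 0s and 1s."""
    if not bits:
        return 0.0
    p1 = bits.count("1") / len(bits)
    return -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)


def p_value_upper_bound(bits_saved):
    """Algorithmic significance bound: P(saving >= d bits) <= 2^-d."""
    return min(1.0, 2.0 ** -bits_saved)


# Hand-written ranking; a real ranking would come from a network walk.
ranking = ["A", "B", "C", "D", "E"]
bitstring = ranking_to_bitstring(ranking, {"A", "B", "D"})
entropy = bitstring_entropy(bitstring)
```

A ranking that reaches the subset nodes early gives a short bitstring dominated by 1s, which is cheaper to encode than one where the subset nodes are scattered among many misses.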

## References

Thistlethwaite L.R., Petrosyan V., Li X., Miller M.J., Elsea S.H., Milosavljevic A. (2020). CTD: an information-theoretic method to interpret multivariate perturbations in the context of graphical models with applications in metabolomics and transcriptomics. Manuscript in review.

## Functions in CTD

| Name | Description |
| --- | --- |
| Thistlethwaite2020 | Thistlethwaite et al. (2020) |
| data.zscoreData | Z-transform available data |
| Miller2015 | Miller et al. (2015) |
| cohorts_coded | Disease cohorts with coded identifiers |
| graph.connectToExt | Connect a node to its unvisited "extended" neighbors |
| data.surrogateProfiles | Generate surrogate profiles |
| data.combineData | Combine datasets |
| data.imputeData | Impute missing values |
| graph.diffuseP1 | Diffuse Probability P1 from a starting node |
| graph.diffusionSnapShot | Capture the current state of probability diffusion |
| Wangler2017 | Wangler et al. (2017) |
| mle.getPtDist | CTDncd: A network-based distance metric. |
| stat.fishersMethod | Fisher's Combined P-value |
| mle.getPtBSbyK | Generate patient-specific bitstrings |
| mle.getMinPtDistance | Get minimum patient distances |
| singleNode.getNodeRanksN | Generate single-node node rankings ("fixed" walk) |
| stat.entropyFunction | Entropy of a bit-string |
| mle.getEncodingLength | Minimum encoding length |
| graph.netWalkSnapShot | Capture the current location of a network walker |
| multiNode.getNodeRanks | Generate multi-node node rankings ("adaptive" walk) |
| stat.getDirSim | DirSim: The Jaccard distance with directionality incorporated. |
| graph.naivePruning | Network pruning for disease-specific network determination |
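
The function list above includes `stat.fishersMethod`, described as Fisher's combined p-value. As a reminder of what that statistic is, the following Python sketch implements the standard textbook method: the statistic X = -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form. The function name `fishers_method` is hypothetical, and this is not the package's R code; check the package documentation for the actual signature and behavior.

```python
import math


def fishers_method(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees
    of freedom: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # Accumulate the series sum_{i=0}^{k-1} half_stat^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


combined = fishers_method([0.05, 0.05])
```

With a single p-value the method returns that p-value unchanged, which is a quick sanity check on any implementation.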