
MetamapsDB

R package for querying an integrated -omics database.

Introduction

MetamapsDB is an R package for interfacing with such a database for:

  1. Gene-centric queries
  2. Analyses of integrated^ microbiome datasets

^ Genomic and Transcriptomic

It is the final step of our gene-centric pipeline (unpublished) and builds on the five preprocessing steps below:

  1. Annotation (DIAMOND): labelling of short reads by BLASTX-like alignment against the NR protein database (mainly for functional annotation)
  2. Binning (MEGAN6 CE): functional (KEGG) binning of NGS short reads based on their labels
  3. Bin-based assembly (NEWBLER): gene-centric OLC assembly of functional bins (KEGG Orthologs)
    • Annotation (DIAMOND): second round of labelling, mainly for taxonomic annotation
    • Binning (MEGAN6 CE): taxonomic binning of contigs based on their labels
  4. Gene-centric analyses (pAss):
    • Identify the Maximum Diversity Region (MDR)
    • Remove known KOs that fail the process
      • Diversity analysis (gene count)
        • 31 single-copy genes
      • Identify genera that are indistinguishable due to sequence conservation
  5. mapBlat: maps gDNA and rRNA short reads (using BLAT) onto
    • the contigs
    • just the MDR

Mentioned Packages

OMICS

Docker wrapper for generating a KEGG + Taxonomy + Contig Neo4j graph database

mapBlat

R package for mapping reads onto contigs/MDR using BLAT

Usage

Functions

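Function                   Description
connect                    Connects to the Neo4j database
dbquery                    Sends a query to the Neo4j database
koname                     Takes a KO ID as input and returns the KO details
taxnam.sql                 Takes an NCBI taxonomy ID as input and returns the taxon details
contractMetab              Simplifies the KEGG metabolic graph
igraph2gexf                Encodes an igraph object in GEXF format
sigmaGraph                 Generates an interactive HTML representation of a subnetwork using the htmlwidgets package
grepgraph                  Given a set of KOs, returns the metabolic subgraph
grepgraph.cpd              Given a set of CPDs, returns the metabolic subgraph
annotateContigs.taxonomy   Adds taxonomic annotations to the contigs
buildE                     Sorts the tree into a data.frame
buildTree                  Builds a taxonomic tree of the taxa
extractFromPath
findK                      Finds the optimum number of Ks
findSeeds                  Finds all seed compounds in the metabolic graph
findTrios                  Searches for valid three-KO reactions around a center KO
findtype                   Finds KO/compound IDs in the metabolic graph
getContigs
gi2rank
ig2ggvis
ksCal                      Generates KS statistics for a KO given a base distribution
lca                        Finds the lowest common ancestor
make.data.frame            Utility function: dbquery may return a data.frame whose columns are nested lists; converts them into plain vectors
trio
trio.local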


Install

install.packages('MetamapsDB')


Version

0.0.2

License

GPL (>= 2)

Maintainer

Wesley GOI

Last Published

December 6th, 2017

Functions in MetamapsDB (0.0.2)

annotateContigs.taxonomy

Adds taxonomic annotations of the contigs
categorize

categorize
blatting

blatting
dynPlots

Diagnostic plots
adjacentPairs

Given an igraph object, adjacentPairs finds adjacent pairs of KOs
allTrios

Find all trios surrounding the KO of interest
dynamicThreshold

dynamicThreshold tries to identify the lower-bound coverage value for removing low-quality contigs. As contigs with fewer than a given number of reads are removed, the number of remaining contigs tends to stabilize; dynamicThreshold looks for the minimum read count at which this region of stability begins.
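The stability idea can be sketched in base R. This is an illustration under stated assumptions, not MetamapsDB's implementation: `find_stable_threshold()` is a hypothetical helper, and "stable" is taken to mean that the relative change in surviving contigs stays at or below `tol` for `window` consecutive thresholds.

```r
# Hypothetical helper, NOT part of MetamapsDB: illustrates the idea behind
# dynamicThreshold on a plain vector of per-contig read counts.
find_stable_threshold <- function(read_counts, thresholds, tol = 0.05, window = 3) {
  # Number of contigs that survive each candidate read-count threshold
  surviving <- vapply(thresholds, function(t) sum(read_counts >= t), integer(1))
  # Relative change in surviving contigs between consecutive thresholds
  rel_change <- abs(diff(surviving)) / pmax(surviving[-length(surviving)], 1)
  stable <- rel_change <= tol
  # Return the first threshold that starts a run of `window` stable steps
  n_starts <- length(stable) - window + 1
  for (i in seq_len(max(n_starts, 0))) {
    if (all(stable[i:(i + window - 1)])) {
      return(list(threshold = thresholds[i], surviving = surviving))
    }
  }
  list(threshold = NA, surviving = surviving)
}

# Toy data: many low-coverage contigs plus 100 well-covered ones
reads <- c(rep(1, 50), rep(2, 30), rep(3, 10), rep(50, 100))
res <- find_stable_threshold(reads, thresholds = 1:10)
res$threshold  # 4
```

The tolerance and window size are arbitrary choices for the sketch; a real threshold would need tuning against the data.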
connect

Connects to the Neo4j graph database.
contigInfo

contigInfo
addContigProperty

addContigProperty
addKOProperty

addKOProperty
contigsSurvive.repeats.readNum

contigsSurvive.repeats.readNum
buildE

buildE sorts the tree into a data.frame
%$%

Exposition ("with") pipe operator
buildTree

Builds a taxonomic tree of the taxa
%<>%

Assignment ("double") pipe operator
extractFromPath

extractFromPath
findHomology

Searches for a KO's homology assignments
contigsSurvive.repeats.rpk

contigsSurvive.repeats.rpk
grepgraph

Returns the metabolic graph given a vector of KOs
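grepgraph itself returns an igraph object; the selection step behind it can be pictured in base R on a toy edge list. Both the edge list and `subgraph_by_ko()` are hypothetical illustrations, not the package's data model. With igraph, a comparable selection could be made with `igraph::subgraph.edges()`.

```r
# Toy metabolic network as an edge list: each row is a KO converting a
# substrate compound into a product compound. IDs are illustrative only.
edges <- data.frame(
  from = c("C1", "C2", "C3", "C4"),
  to   = c("C2", "C3", "C4", "C5"),
  ko   = c("K1", "K2", "K3", "K4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not MetamapsDB's grepgraph): keep only the reactions
# catalysed by the requested KOs, i.e. the sub-network they span.
subgraph_by_ko <- function(edges, kos) {
  sub <- edges[edges$ko %in% kos, , drop = FALSE]
  rownames(sub) <- NULL
  sub
}

sub <- subgraph_by_ko(edges, c("K2", "K3"))
sub
```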
contractMetab

contractMetab shrinks a metabolic network's KOs into non-redundant units
grepgraph.cpd

Returns the metabolic graph given a vector of compounds (CPDs)
cpdname

Finds the details of the CPD when given its ID
findK

findK finds the optimum number of Ks
findNextKO

FindNextKO
dbquery

Function for querying the MetamapsDB database
ko2path

ko2path finds all pathways associated with a KO and returns a data.frame with KO and pathway details
findV

findV finds the vertex ID in the graph given its name
mapContig

finds the location of the MDR on the contig
findtype

findtype finds the IDs of KOs/compounds in the metabolic graph
mapReads2MDR

overlaps
igraph2gexf

Converts an igraph object to GEXF format
prettifyGraph

Returns the metabolic graph with all the ornaments set, adding details to the igraph object
index

Indexes the database for faster retrieval
grep.cDNA

grep.cDNA
mappingInfo

mappingInfo sequence analysis of contigs (for SIMULATION only)
grepReads

grepReads greps for cDNA or gDNA reads, because both are grouped together in the assembly input
mdrRanges

Returns contig ranges for those captured within the MDR
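One way to read "captured within the MDR" is: keep the contigs whose alignment overlaps the MDR and clip their coordinates to it. The sketch below assumes that interpretation and 1-based inclusive coordinates; `clip_to_mdr()` and the toy coordinates are hypothetical. For real data, Bioconductor's IRanges package offers `findOverlaps()` for the overlap step.

```r
# Toy contig alignments on a reference (1-based, inclusive coordinates)
contigs <- data.frame(
  contig = c("c1", "c2", "c3"),
  start  = c(10, 180, 400),
  end    = c(150, 320, 600),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT MetamapsDB's mdrRanges: keep contigs that overlap
# the MDR and clip their ranges to the MDR boundaries.
clip_to_mdr <- function(ranges, mdr_start, mdr_end) {
  hit <- ranges$start <= mdr_end & ranges$end >= mdr_start
  out <- ranges[hit, , drop = FALSE]
  out$start <- pmax(out$start, mdr_start)
  out$end   <- pmin(out$end, mdr_end)
  rownames(out) <- NULL
  out
}

inside <- clip_to_mdr(contigs, mdr_start = 100, mdr_end = 300)
inside
```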
make.data.frame

Convert columns with nested list structures into plain vectors
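A minimal base-R sketch of the same conversion, assuming every nested list element holds exactly one value; `flatten_list_columns()` is a hypothetical helper, not the package's make.data.frame, and columns with longer elements are left untouched.

```r
# Hypothetical helper: unlist every list-column whose elements all have
# length 1, so the data.frame ends up with plain atomic vectors.
flatten_list_columns <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.list(x) && all(lengths(x) == 1L)) {
      df[[col]] <- unlist(x, use.names = FALSE)
    }
  }
  df
}

# A data.frame whose columns are lists, similar in shape to a query result
df <- data.frame(id = I(list("K00001", "K00002")), count = I(list(3L, 5L)))
flat <- flatten_list_columns(df)
str(flat)
```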
path2ko

Finds all KOs belonging to a given pathway
map

map takes cDNA reads (FASTQ format) and maps them onto contigs
findPerl

findPerl finds the path to the Perl executable
pathways

Lists all metabolic pathways
findPython

findPython finds the path to the Python executable
ig2ggplot

Convert igraph to ggplot2 object
grepgraph_cpd

Generates the metabolic network as an igraph object from an input vector of KOs
listquery

Function for querying the MetamapsDB database
nitrogenMetab

nitrogenMetab
path2kingdom

List all intermediaries between taxa and the superkingdom it belongs to
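The lookup amounts to walking parent pointers up the taxonomy. In this base-R sketch the taxonomy is a small hypothetical child-to-parent table, whereas MetamapsDB stores its taxonomy in the Neo4j graph; `path_to_kingdom()` and the `kingdoms` default are assumptions for illustration.

```r
# Toy child -> parent table, a stand-in for the NCBI taxonomy in the database
parent <- c(
  "Escherichia coli"    = "Escherichia",
  "Escherichia"         = "Enterobacteriaceae",
  "Enterobacteriaceae"  = "Enterobacterales",
  "Enterobacterales"    = "Gammaproteobacteria",
  "Gammaproteobacteria" = "Proteobacteria",
  "Proteobacteria"      = "Bacteria"
)

# Hypothetical helper: follow parent pointers until a superkingdom is reached
# (or the taxon has no recorded parent), collecting every intermediary.
path_to_kingdom <- function(taxon, parent,
                            kingdoms = c("Bacteria", "Archaea", "Eukaryota")) {
  path <- taxon
  while (!(taxon %in% kingdoms) && !is.na(parent[taxon])) {
    taxon <- unname(parent[taxon])
    path <- c(path, taxon)
  }
  path
}

path_to_kingdom("Escherichia coli", parent)
```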
lookupTable

Generates `lookupTable` for filtering raw data after gene-centric assembly
%>%

Pipe operator
readStatusReader

readStatusReader stores read assignment details from the simulation and assigns the contig a genome of origin
plotClassification

Plot clustering
taxname

Lists the details of a taxonomic ID
simpleThres

simpleThres
top500kos

Top500kos
simulated

finds all simulated genera for that particular KO
scg

singleCopyGenes
koname

Gives KO details when supplied with KO id
findSeeds

To find all seed compounds in the metabolic graph
findTrios

findTrios searches for valid three-KO reactions, restricted by the center KO, and reports reaction clusters
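A trio can be pictured as an upstream KO, the center KO and a downstream KO linked through shared compounds. The base-R sketch below assumes a toy directed KO-to-KO edge list (A -> B meaning a product of A is a substrate of B) and a hypothetical `trios_around()` helper; it enumerates candidate trios only and does not reproduce the package's validity checks or reaction clustering.

```r
# Toy KO-to-KO edges; IDs are illustrative only
ko_edges <- data.frame(
  from = c("K1", "K2", "K3", "K3"),
  to   = c("K3", "K3", "K4", "K5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every upstream -> center -> downstream chain through
# `center`, excluding chains that loop back to the same KO.
trios_around <- function(edges, center) {
  up   <- unique(edges$from[edges$to == center])
  down <- unique(edges$to[edges$from == center])
  trios <- expand.grid(upstream = up, downstream = down,
                       stringsAsFactors = FALSE)
  trios <- trios[trios$upstream != trios$downstream, , drop = FALSE]
  data.frame(upstream = trios$upstream,
             center = rep(center, nrow(trios)),
             downstream = trios$downstream,
             stringsAsFactors = FALSE)
}

tr <- trios_around(ko_edges, "K3")
tr
```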
ksCal

ksCal generates KS statistics for a KO given the base distribution
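As a rough illustration, base R's two-sample Kolmogorov-Smirnov test can compare one KO's values against a base distribution. `ks_for_ko()` is a hypothetical wrapper, not ksCal's actual signature, and the simulated inputs are placeholders for whatever per-KO measurement is being tested.

```r
# Sketch only: stats::ks.test() standing in for ksCal
ks_for_ko <- function(ko_values, base_values) {
  # ks.test() may warn about ties in the two-sample case; D is still computed
  res <- suppressWarnings(stats::ks.test(ko_values, base_values))
  c(D = unname(res$statistic), p.value = res$p.value)
}

set.seed(1)
base_dist <- rnorm(500)                     # simulated base distribution
ks_for_ko(rnorm(200, mean = 1), base_dist)  # shifted values: larger D
ks_for_ko(rnorm(200), base_dist)            # same distribution: D near 0
```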
lca

Finds the lowest common ancestor
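If each taxon's lineage is available as a root-to-leaf vector (for example from a path-to-kingdom lookup), the lowest common ancestor is the deepest rank shared by all lineages. This is a list-based base-R sketch with a hypothetical `lca_of_lineages()` helper; the package's lca may work on its graph database instead.

```r
# Hypothetical helper: deepest taxon shared by all root-to-leaf lineages
lca_of_lineages <- function(lineages) {
  depth <- min(lengths(lineages))
  common <- character(0)
  for (i in seq_len(depth)) {
    ranks <- vapply(lineages, `[`, character(1), i)  # i-th rank of each lineage
    if (length(unique(ranks)) != 1L) break           # lineages diverge here
    common <- c(common, ranks[1])
  }
  if (length(common) == 0L) NA_character_ else common[length(common)]
}

ecoli      <- c("Bacteria", "Proteobacteria", "Gammaproteobacteria",
                "Enterobacterales", "Enterobacteriaceae", "Escherichia")
salmonella <- c("Bacteria", "Proteobacteria", "Gammaproteobacteria",
                "Enterobacterales", "Enterobacteriaceae", "Salmonella")
bacillus   <- c("Bacteria", "Firmicutes", "Bacilli",
                "Bacillales", "Bacillaceae", "Bacillus")

lca_of_lineages(list(ecoli, salmonella))            # "Enterobacteriaceae"
lca_of_lineages(list(ecoli, salmonella, bacillus))  # "Bacteria"
```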
rolling

rolling applies a dynamic threshold for removing redundant contigs
surrNODES

surrNODES finds nodes which are surrounding the given node
taxnam.sql

taxname.sql