ProActive

ProActive automatically detects regions of gapped and elevated read coverage using a 2D pattern-matching algorithm. ProActive detects, characterizes and visualizes read coverage patterns in both genomes and metagenomes. Optionally, users may provide gene annotations associated with their genome or metagenome in the form of a .gff file. In this case, ProActive will generate an additional output table containing the gene annotations found within the detected regions of gapped and elevated read coverage. Additionally, users can search for gene annotations of interest in the output read coverage plots.

Visualizing read coverage data is important because gaps and elevations in coverage can indicate a variety of biological and non-biological scenarios, for example:

  • Elevations and gaps in read coverage may be caused by some types of structural variants. Deletions can cause gaps while duplications can cause elevations in read coverage [1].
  • Highly active and/or abundant mobile genetic elements, such as transposable elements [2] and prophages [3], can create elevations in read coverage at their respective integration sites.
  • Genetic regions with high mutation rates and/or high variability within the population can generate gaps in read coverage [4].
  • Poor quality sequencing reads and chimeric reference sequences may cause gaps and elevations in read coverage.

Since the cause of gaps and elevations in read coverage can be ambiguous, ProActive is best used as a screening method to identify genetic regions for further investigation with other tools!
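
The scenarios listed above all appear as windows whose coverage departs from the background coverage of the surrounding sequence. The following is a minimal sketch of that idea only. It is not ProActive's pattern-matching algorithm: the function name `flagWindows` and the thresholds `lowFrac` and `highFrac` are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper (not part of ProActive): label each coverage window
# by comparing it to the median coverage of the whole sequence.
flagWindows <- function(cov, lowFrac = 0.25, highFrac = 2) {
  background <- median(cov)
  ifelse(cov < lowFrac * background, "Gap",
         ifelse(cov > highFrac * background, "Elevation", "Background"))
}

# Toy coverage values for ten windows: a dip (windows 4-5) and a peak (8-9)
cov <- c(20, 21, 19, 2, 1, 20, 22, 60, 65, 21)
calls <- flagWindows(cov)
print(data.frame(window = seq_along(cov), coverage = cov, call = calls))
```

A simple threshold like this cannot tell a deletion from a poorly sequenced region, which is why flagged regions are best treated as candidates for follow-up.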

References:

  1. Tattini L., D’Aurizio R., & Magi A. (2015). Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Frontiers in Bioengineering and Biotechnology, 3, 92. https://doi.org/10.3389/fbioe.2015.00092
  2. Kleiner M., Bushnell B., Sanderson K.E. et al. (2020). Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome, 8, 158. https://doi.org/10.1186/s40168-020-00935-5
  3. Kieft K., Anantharaman K. (2022). Deciphering Active Prophages from Metagenomes. mSystems 7:e00084-22. https://doi.org/10.1128/msystems.00084-22
  4. Fogarty E., Moore R. (2019). Visualizing contig coverages to better understand microbial population structure. https://merenlab.org/2019/11/25/visualizing-coverages/

Input files

Pileup file:

ProActive detects read coverage patterns using a pattern-matching algorithm that operates on pileup files. In a pileup file, each row summarizes the ‘pileup’ of reads at a specific genomic location. Pileup files can be used to generate a rolling mean of read coverages and associated base pair positions, which reduces data size while preserving read coverage patterns. ProActive requires that input pileup files be generated using a 100 bp window/bin size.
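
To illustrate how windowing reduces data size while keeping the coverage pattern, here is a minimal sketch that averages per-base read depth over non-overlapping 100 bp windows. The function `binCoverage` and its output column names are hypothetical, and the layout of a real pileup file may differ; in practice the binned file should come from the read mapper.

```r
# Hypothetical helper (not part of ProActive): mean read depth per
# non-overlapping window of `binSize` bases.
binCoverage <- function(depth, binSize = 100) {
  stopifnot(is.numeric(depth), binSize >= 1)
  # Zero-based window index for every base position
  binIndex <- (seq_along(depth) - 1) %/% binSize
  meanCov <- as.numeric(tapply(depth, binIndex, mean))
  data.frame(
    windowStart = (seq_along(meanCov) - 1) * binSize + 1,  # 1-based start
    meanCoverage = meanCov
  )
}

# Toy per-base depth: 800 bases with a low-coverage stretch at 301-500
depth <- c(rep(20, 300), rep(2, 200), rep(20, 300))
binned <- binCoverage(depth)
print(binned)
```

The 800 per-base values collapse to 8 rows, and the low-coverage stretch is still visible in the windows starting at positions 301 and 401.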

Pileup files can be generated by mapping sequencing reads to a metagenome or genome fasta. Read mapping should be performed using a high minimum identity (0.97 or higher) and random mapping of ambiguous reads. The pileup files needed for ProActive are generated from the .bam files produced during read mapping. Some read mappers, like BBMap, can generate pileup files directly in the bbmap.sh command by using the bincov output with the covbinsize=100 parameter. Otherwise, BBMap’s pileup.sh can convert .bam files produced by any read mapper to pileup files compatible with ProActive using the bincov output with binsize=100.
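
As a sketch of the two routes described above, the R snippet below assembles the corresponding BBMap command lines and prints them without running anything. All file names are hypothetical placeholders. The `minid` and `ambiguous` settings are my reading of the mapping recommendations above, so please check the option names against the BBMap documentation for your installed version.

```r
# Hypothetical file names; replace with your own.
# Route 1: map reads and write the binned coverage in one bbmap.sh call.
bbmapArgs <- c(
  "ref=genome.fasta",
  "in=reads.fastq",
  "out=mapped.bam",
  "minid=0.97",               # high minimum identity
  "ambiguous=random",         # assign ambiguous reads at random
  "bincov=pileup.bincov100",  # binned coverage output
  "covbinsize=100"            # 100 bp windows
)

# Route 2: convert an existing .bam from any read mapper with pileup.sh.
pileupArgs <- c("in=mapped.bam", "bincov=pileup.bincov100", "binsize=100")

cat("bbmap.sh", bbmapArgs, "\n")
cat("pileup.sh", pileupArgs, "\n")

# To execute from R, assuming BBMap is installed and on the PATH:
# system2("bbmap.sh", args = bbmapArgs)
# system2("pileup.sh", args = pileupArgs)
```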

NOTE: For detailed information on input file format, please see the vignette. Users may also use the ‘sampleMetagenomePileup’ and ‘sampleGenomePileup’ files that come pre-loaded with ProActive as a reference.

gffTSV:

ProActive optionally accepts a .gff file as input. The .gff file must be associated with the same metagenome or genome used to create your pileup file. The .gff file should be a TSV and should follow the same general format described here.
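
The bundled sample objects (for example `sampleGenomegffTSV`) are passed to `gffTSV` directly, which suggests the annotations are supplied as a table already loaded into R. Under that assumption, a minimal sketch of reading a tab-separated .gff file might look like the following. The two annotation rows are made up for illustration, and the vignette remains the authority on the exact format ProActive expects.

```r
# Made-up example annotations written to a temporary file
gffLines <- c(
  "##gff-version 3",
  "contig_1\tProkka\tCDS\t100\t850\t.\t+\t0\tID=gene_1;product=ABC transporter",
  "contig_1\tProkka\tCDS\t900\t1400\t.\t-\t0\tID=gene_2;product=hypothetical protein"
)
gffPath <- tempfile(fileext = ".gff")
writeLines(gffLines, gffPath)

# Read the nine tab-separated columns, skipping '#' header lines.
# quote = "" stops quote characters in product names from being
# treated as string delimiters.
gff <- read.delim(gffPath, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)
str(gff)
```

Note that `comment.char = "#"` also truncates any data line containing a `#`, so files with that character inside the attributes column would need the header lines removed another way.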

Installation

Install ProActive from CRAN with:

install.packages("ProActive")
library(ProActive)

Install the development version of ProActive from GitHub with:

if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_github("jlmaier12/ProActive")
library(ProActive)

Quick start

library(ProActive)


## Metagenome mode

MetagenomeProActive <- ProActiveDetect(
  pileup = sampleMetagenomePileup,
  mode = "metagenome",
  gffTSV = sampleMetagenomegffTSV
)
#> Preparing input file for pattern-matching...
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Summarizing pattern-matching results
#> Finding gene predictions in elevated or gapped regions of read coverage...
#> Finalizing output
#> Execution time: 2.09secs
#> 0 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length (< minContigLength)
#> 
#> Elevation       Gap NoPattern 
#>         3         3         1

# Plot the pattern-matching results
MetagenomePlots <- plotProActiveResults(pileup = sampleMetagenomePileup,
                                        ProActiveResults = MetagenomeProActive)

# Search the gene product annotations for the given keywords
MetagenomeGeneMatches <- geneAnnotationSearch(ProActiveResults = MetagenomeProActive,
                                              pileup = sampleMetagenomePileup,
                                              gffTSV = sampleMetagenomegffTSV,
                                              geneOrProduct = "product",
                                              keyWords = c("transport", "chemotaxis"))
#> Cleaning gff file...
#> Cleaning pileup file...
#> Searching for matching annotations...
#> 3 contigs/chunks have gene annotations that match one or more of the provided keyWords


## Genome mode

GenomeProActive <- ProActiveDetect(
  pileup = sampleGenomePileup,
  mode = "genome",
  gffTSV = sampleGenomegffTSV
)
#> Preparing input file for pattern-matching...
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Summarizing pattern-matching results
#> Finding gene predictions in elevated or gapped regions of read coverage...
#> Finalizing output
#> Execution time: 29.7secs
#> 0 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length (< minContigLength)
#> 
#> Elevation       Gap NoPattern 
#>        25         3        21

# Plot the pattern-matching results
GenomePlots <- plotProActiveResults(pileup = sampleGenomePileup,
                                    ProActiveResults = GenomeProActive)

# Search the gene product annotations for the given keyword
GenomeGeneMatches <- geneAnnotationSearch(ProActiveResults = GenomeProActive,
                                          pileup = sampleGenomePileup,
                                          gffTSV = sampleGenomegffTSV,
                                          geneOrProduct = "product",
                                          keyWords = c("ribosomal"),
                                          inGapOrElev = TRUE,
                                          bpRange = 5000)
#> Cleaning gff file...
#> Cleaning pileup file...
#> Searching for matching annotations...
#> 8 contigs/chunks have gene annotations that match one or more of the provided keyWords

Version: 0.1.0
License: GPL-2
Maintainer: Jessie Maier
Last Published: January 21st, 2025

Functions in ProActive (0.1.0)

  • sampleGenomegffTSV
  • geneAnnotationSearch: Search for gene annotations on classified contigs/chunks
  • plotProActiveResults: Plot results of `ProActive()` pattern-matching
  • patternMatcher: Controller function for pattern-matching
  • sampleMetagenomePileup
  • pileupFormatter: Reformat input pileup file
  • geneAnnotationPlot: Gene annotation plot
  • partialElevGapShrink: Shrink the width of partial elevation and gap patterns
  • sampleGenomePileup
  • patternBuilder: Builds pattern-match vectors
  • removeNoPatterns: Removes 'NoPattern' classifications from best match list
  • patternTranslator: Full elevation/gap pattern translator
  • sampleMetagenomeResults
  • sampleMetagenomegffTSV
  • classifSumm: Summarizes pattern-matching results
  • ProActiveDetect: Detect elevations and gaps in mapped read coverage patterns.
  • ProActive-package: ProActive
  • noPattern: No read coverage pattern
  • fullElevGapShrink: Shrink the width of full elevation and gap patterns
  • contigChunks: 'chunk' long contigs
  • changewindowSize: Change the pileup window size
  • elevOrGapClassif: Classifies partial elevation/gap pattern-matches
  • collectBestMatchInfo: Collect information regarding the pattern-match
  • GPsInElevGaps: Detect gene predictions in elevations and gaps
  • fullElevGap: Controller function for full elevation/gap pattern-matching
  • genomeChunks: 'chunk' genomes
  • partialElevGap: Controller function for partial elevation/gap pattern-matching
  • linkChunks: Link pattern-matches on contig/genome chunks