AmpliconDuo-package: Statistical Analysis Of Amplicon Data Of The Same Sample To Identify Spurious Amplicons

Description

Increasingly powerful techniques for high-throughput sequencing make it possible to comprehensively characterize microbial communities, including rare species. However, the substantial error rates of the experimental process that generates these sequences remain an unresolved issue. To overcome this limitation we propose an approach in which each sample is split and the same amplification and sequencing protocol is applied to both halves. Comparing the results of both halves should make it possible to distinguish likely PCR and sequencing artifacts from true rare species.

The AmpliconDuo package is intended to help interpret the amplicon frequency distribution obtained across split samples and to filter out false positive amplicons. From here on, ampliconduo refers to the two amplicon data sets of a split sample.

Details

Package:	AmpliconDuo
Type:	Package
Version:	1.1.1
Date:	2020-05-22
License:	GPL-2

The core of this package is the ampliconduo function, which generates an ampliconduo data frame for each pair of split samples and statistically analyses the data with Fisher's exact test. Ampliconduo data frames, or lists of them, are the input required by all other functions of this package.
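
To illustrate the kind of per-amplicon test described above, the following is a minimal base R sketch, not the package's own code. The read counts, the variable names, and the layout of the 2x2 contingency table are assumptions made for illustration only; the multiple-testing correction shown is one common choice.

```r
# Hypothetical read counts for three amplicons in the two halves (A, B)
# of one split sample; these numbers are invented for illustration.
freqA <- c(120, 15, 0)
freqB <- c(110, 2, 9)

totalA <- sum(freqA)
totalB <- sum(freqB)

# For each amplicon, build a 2x2 table (reads of this amplicon vs. all
# other reads, in A and in B) and apply Fisher's exact test.
# Assumption: this table layout is illustrative, not taken from the package.
p.values <- mapply(function(a, b) {
  tab <- matrix(c(a, totalA - a, b, totalB - b), nrow = 2)
  fisher.test(tab)$p.value
}, freqA, freqB)

# Adjust for multiple testing (Benjamini-Hochberg false discovery rate).
q.values <- p.adjust(p.values, method = "fdr")
```

A small adjusted p-value indicates that the amplicon's share of reads differs between the two halves more than chance alone would suggest.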

plotAmpliconduo plots, for an ampliconduo, the amplicon frequencies (number of reads per amplicon) of sample A vs. the amplicon frequencies of sample B, highlighting amplicons that deviate significantly between the two samples.
plotAmpliconduo.set does the same as plotAmpliconduo but accepts a list of ampliconduo data frames and arranges the plots in a 2-dimensional array.
plotORdensity generates a histogram plot of the amplicon frequency odds ratio density for an ampliconduo data frame. For multiple data frames, it arranges the plots in a 2-dimensional array.
discordance.delta calculates delta (\(\Delta\)) and delta prime (\(\Delta'\)), the fraction of amplicon frequencies and amplicons, respectively, with a false discovery rate below a certain threshold \(\theta\), as a measure of discordance between two amplicon data sets A and B.
filter.ampliconduo applies filter criteria to an ampliconduo data frame to decide which amplicons are rejected.
filter.ampliconduo.set does the same as filter.ampliconduo for a list of ampliconduo data frames.
accepted.amplicons returns the indices of those amplicons that have passed the filter criteria.
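
As a rough illustration of the discordance measures and the filtering step listed above, here is a base R sketch on invented data. It follows the verbal definition of \(\Delta\) and \(\Delta'\) given here; the vectors, the threshold name theta, and in particular the filter rule are assumptions for illustration and may differ from what discordance.delta and filter.ampliconduo actually compute.

```r
# Hypothetical read counts and adjusted p-values (q-values) for four amplicons.
freqA    <- c(120, 15, 0, 40)
freqB    <- c(110,  2, 9, 38)
q.values <- c(0.90, 0.01, 0.03, 0.85)

theta <- 0.05                      # discordance threshold (assumed value)
discordant <- q.values < theta     # amplicons flagged as discordant

# Delta prime: fraction of amplicons flagged as discordant.
delta.prime <- sum(discordant) / length(q.values)

# Delta: fraction of reads that belong to discordant amplicons.
delta <- sum(freqA[discordant] + freqB[discordant]) / sum(freqA + freqB)

# Illustrative filter (an assumption, not the package's rule): keep amplicons
# with at least min.freq reads in both halves that are not discordant.
min.freq <- 1
keep <- freqA >= min.freq & freqB >= min.freq & !discordant

# Indices of accepted amplicons, analogous in spirit to accepted.amplicons.
accepted <- which(keep)
```

Weighting by reads (\(\Delta\)) and counting amplicons (\(\Delta'\)) can give quite different pictures: a few discordant low-abundance amplicons raise \(\Delta'\) noticeably while barely changing \(\Delta\).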

References

Lange A, Jost S, Heider D, Bock C, Budeus B, et al. (2015) AmpliconDuo: A Split-Sample Filtering Protocol for High-Throughput Amplicon Sequencing of Microbial Communities. PLOS ONE 10(11): e0141590

Examples

# NOT RUN {
## load test amplicon frequency data ampliconfreqs and vector with sample names site.f
data(ampliconfreqs)
data(site.f)

## generate ampliconduo data frames;
## depending on the size of the data sets, this may take some time
ampliconduoset <- ampliconduo(ampliconfreqs[,1:4], sample.names = site.f[1:2])

## plot amplicon read numbers of sample A vs. amplicon read numbers of sample B,
## indicating amplicons with significant deviations in their occurrence across samples
plotAmpliconduo.set(ampliconduoset, nrow = 3)

## calculate discordance between the two data sets of an ampliconduo
discordance <- discordance.delta(ampliconduoset)

## plot the odds ratio density of ampliconduo data
plotORdensity(ampliconduoset)

## apply filter criteria to remove/mark spurious amplicons
ampliconduoset.f <- filter.ampliconduo.set(ampliconduoset, min.freq = 1, q = 0.05)

## return indices of accepted amplicons; the indices correspond to the rows of the
## ampliconfreqs data that were used as input for the ampliconduo function
accep.reads <- accepted.amplicons(ampliconduoset.f)
# }
