phyloseq_to_deseq2 for a recommended alternative to rarefying directly supported in the phyloseq package, as well as the supplemental materials for the PLoS-CB article and the phyloseq extensions repository on GitHub. Nevertheless, for comparison and demonstration, the rarefying procedure is implemented here in good faith and with options we hope are useful. This function uses the standard R sample function to resample from the abundance values in the otu_table component of the first argument, physeq. Often one of the major goals of this procedure is to achieve parity in total number of counts between samples, as an alternative to other formal normalization procedures, which is why a single value for the sample.size is expected. This kind of resampling can be performed with and without replacement, with replacement being the more computationally-efficient, default setting. See the replace parameter documentation for more details.
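The two resampling modes described above can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name subsample_counts and the toy counts vector are hypothetical, and only the standard sample, sample.int, rep, and tabulate functions are used.

```r
# Hypothetical helper: resample one sample's OTU counts down to `sample.size` reads.
subsample_counts <- function(counts, sample.size, replace = TRUE) {
  otu_index <- seq_along(counts)
  if (replace) {
    # With replacement: draw OTU indices with probability proportional to abundance.
    draws <- sample(otu_index, size = sample.size, replace = TRUE, prob = counts)
  } else {
    # Without replacement: expand counts into one entry per read, then draw reads.
    reads <- rep(otu_index, times = counts)
    draws <- reads[sample.int(length(reads), size = sample.size)]
  }
  # Count how many draws landed on each OTU, keeping zero-count OTUs in place.
  tabulate(draws, nbins = length(counts))
}

set.seed(1)
counts <- c(50, 30, 15, 5, 0)  # toy abundance vector, 100 reads in total
with_repl <- subsample_counts(counts, 40, replace = TRUE)
without_repl <- subsample_counts(counts, 40, replace = FALSE)

sum(with_repl)     # total reads after resampling with replacement
sum(without_repl)  # total reads after resampling without replacement
```

In this sketch the with-replacement branch never materializes one entry per read, which is one plausible reason that mode is cheaper in time and memory.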
We recommend that you explicitly select a random number generator seed before invoking this function, or, alternatively, that you explicitly provide a single positive integer argument as rngseed.
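The two ways of fixing the seed might look like the sketch below. It assumes the phyloseq package is installed and uses its bundled esophagus dataset; the seed value 711 and the object names eso_a and eso_b are arbitrary choices for illustration.

```r
library("phyloseq")
data("esophagus")

# Option 1: set the seed yourself and tell the function not to touch the RNG.
set.seed(711)
eso_a <- rarefy_even_depth(esophagus, rngseed = FALSE, verbose = FALSE)

# Option 2: hand the seed to the function through the rngseed argument.
eso_b <- rarefy_even_depth(esophagus, rngseed = 711, verbose = FALSE)

# Compare the resampled count tables produced by the two options.
same_counts <- identical(as(otu_table(eso_a), "matrix"),
                         as(otu_table(eso_b), "matrix"))
same_counts
```

Either way, record the seed alongside your results so the subsampling can be repeated later.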
rarefy_even_depth(physeq, sample.size = min(sample_sums(physeq)), rngseed = FALSE, replace = TRUE, trimOTUs = TRUE, verbose = TRUE)
physeq: A phyloseq-class object that you want to trim/filter.
sample.size: A single value for the number of counts each sample is resampled to, equal to each value returned by sample_sums on the output.
rngseed: A single positive integer passed to set.seed, which is used to fix a seed for reproducibly random number generation (in this case, reproducibly random subsampling). The usage above shows FALSE as the default. If set to FALSE, then no fiddling with the RNG seed is performed, and it is up to the user to appropriately call set.seed beforehand to achieve reproducible results.
replace: Whether to sample with replacement (TRUE) or without replacement (FALSE). The default is with replacement (replace=TRUE). Two implications to consider are that (1) sampling with replacement is faster and more memory efficient as currently implemented; and (2) sampling with replacement means that there is a chance that the number of reads for a given OTU in a given sample could be larger than the original count value, as opposed to sampling without replacement, where the original count value is the maximum possible. Prior to phyloseq package version number 1.5.20, this parameter did not exist and sampling with replacement was the only random subsampling implemented in the rarefy_even_depth function. Note that this default behavior was selected for computational efficiency, but differs from analogous functions in related packages (e.g. subsampling in QIIME).
verbose: Logical. Default is TRUE. If TRUE, extra non-warning, non-error messages are printed to standard out, describing steps in the rarefying process, the OTUs and samples removed, etc. This can be useful the first few times the function is executed, but can be set to FALSE as needed once behavior has been verified as expected.
Returns an object of class phyloseq. Only the otu_table component is modified.
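The sample.size and replace descriptions above can be checked on a small dataset, as in the sketch below. It assumes phyloseq is installed and that the bundled esophagus dataset stores taxa as rows; the object names depth, eso_rare, rare_mat, and orig_mat are illustrative only.

```r
library("phyloseq")
data("esophagus")

# Rarefy every sample to the smallest library size, without replacement.
depth <- min(sample_sums(esophagus))
eso_rare <- rarefy_even_depth(esophagus, sample.size = depth, rngseed = 711,
                              replace = FALSE, verbose = FALSE)

# Every sample in the output should have the same total count.
sample_sums(eso_rare)
even_depth <- all(sample_sums(eso_rare) == depth)

# Without replacement, no resampled count should exceed the original count.
# Restrict the original table to the OTUs and samples still present in the output.
rare_mat <- as(otu_table(eso_rare), "matrix")
orig_mat <- as(otu_table(esophagus), "matrix")[rownames(rare_mat), colnames(rare_mat)]
within_original <- all(rare_mat <= orig_mat)
```

Rerunning the same check with replace = TRUE is a quick way to see whether any count exceeds its original value in your own data.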
rarefy, that has also been used recently to describe this process and, to our knowledge, not previously used in ecology. Make sure to use set.seed for exactly-reproducible results of the random subsampling.
sample
# Test with esophagus dataset
library("phyloseq")
data("esophagus")
esorepT = rarefy_even_depth(esophagus, replace=TRUE)
esorepF = rarefy_even_depth(esophagus, replace=FALSE)
# Total counts per sample, before and after rarefying
sample_sums(esophagus)
sample_sums(esorepT)
sample_sums(esorepF)
## Run manually: too slow!
# data("GlobalPatterns")
# GPrepT = rarefy_even_depth(GlobalPatterns, 1E5, replace=TRUE)
## Actually just this one is slow
# system.time(GPrepF <- rarefy_even_depth(GlobalPatterns, 1E5, replace=FALSE))