suggest_inflate_depths: Suggest Inflation Depths

Description

Estimates a sequencing depth for each sample in a relative abundance matrix, based on the shape of the abundance distribution pooled across all samples.

Usage

suggest_inflate_depths(biom, adjust = 1.5)

Value

A named integer vector of recommended depths for each sample.

Arguments

biom: An rbiom object, or any value accepted by as_rbiom().
adjust: Numeric. Bandwidth adjustment for the kernel density estimation. Default: 1.5.
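
As a rough illustration of what a bandwidth adjustment does, the sketch below uses base R's stats::density(), which has an argument of the same name. This page does not say which KDE routine suggest_inflate_depths() uses internally, so treating adjust as equivalent to the stats::density() argument is an assumption. The simulated vector x is purely illustrative.

```r
set.seed(1)

# Illustrative bimodal data: a large cluster near 0 and a small one near 3.
x <- c(rnorm(200, mean = 0), rnorm(50, mean = 3))

d_default <- stats::density(x, adjust = 1)
d_smooth  <- stats::density(x, adjust = 1.5)

# stats::density() multiplies the selected bandwidth by `adjust`,
# so this ratio equals the adjust factor.
d_smooth$bw / d_default$bw

# Location of the highest peak (the estimated mode) under each setting.
d_default$x[which.max(d_default$y)]
d_smooth$x[which.max(d_smooth$y)]
```

Larger values give a smoother curve, which makes the detected peak less sensitive to noise at the cost of resolution.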

The Singleton Peak Heuristic

When depth = NULL, biom_inflate() calls this function to estimate the original sequencing depth for each sample. The underlying assumption is that in typical microbiome datasets, the most frequent count value (the mode of the abundance distribution) is 1 (a singleton).

The algorithm works as follows:

1. Log-Transformation: Non-zero relative abundances are log10-transformed.
2. Global Consensus: To overcome sparsity in individual samples, distributions are centered by their medians and aggregated across all samples.
3. Peak Detection: Kernel Density Estimation (KDE) is used to identify the peak (mode) of this aggregated distribution.
4. Scaling: A scaling factor is calculated for each sample that shifts this peak to correspond to an integer count of 1.
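
The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not rbiom's actual implementation: sketch_inflate_depths() is a hypothetical helper, the input is assumed to be a taxa-by-samples matrix of relative abundances, and stats::density() stands in for whatever KDE the package uses.

```r
# Hypothetical helper (not part of rbiom): a base-R sketch of the four steps.
sketch_inflate_depths <- function(rel, adjust = 1.5) {
  # rel: taxa x samples matrix of relative abundances with column names
  stopifnot(is.matrix(rel), !is.null(colnames(rel)))

  # Log-transform the non-zero relative abundances of each sample.
  logs <- lapply(seq_len(ncol(rel)), function(j) {
    x <- rel[, j]
    log10(x[x > 0])
  })

  # Center each sample on its median, then pool across all samples.
  medians <- vapply(logs, median, numeric(1))
  pooled  <- unlist(Map(`-`, logs, medians))

  # Locate the mode of the pooled distribution with a KDE.
  kde  <- stats::density(pooled, adjust = adjust)
  peak <- kde$x[which.max(kde$y)]

  # In sample j the peak sits at a log10 abundance of medians[j] + peak.
  # Multiplying by 1 / 10^(medians[j] + peak) maps that abundance to 1.
  depths <- 1 / 10^(medians + peak)
  setNames(as.integer(round(depths)), colnames(rel))
}

# Usage on simulated counts (illustrative data only).
set.seed(42)
counts <- matrix(rnbinom(300 * 3, size = 0.5, mu = 5), nrow = 300,
                 dimnames = list(NULL, c("S1", "S2", "S3")))
rel <- sweep(counts, 2, colSums(counts), "/")

sketch_inflate_depths(rel)
colSums(counts)  # true depths, for comparison
```

Because the peak is estimated from a smoothed, pooled distribution, the suggested depths are approximations and need not match the true column sums exactly.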

This approach effectively "shoehorns" relative abundance data into the integer counts required by count-based methods (such as rarefaction or the Chao1 richness estimator) by maximizing the number of singletons in the resulting matrix.
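
To make the singleton-maximizing idea concrete, the toy example below counts singletons after scaling one sample's relative abundances to several candidate depths. The counts, the candidate depths, and the n_singletons() helper are all hypothetical and chosen only for illustration. The true depth of this sample is 64, which is the only candidate that reproduces the original counts exactly.

```r
# One sample's relative abundances, derived from hypothetical counts.
true_counts <- c(1, 1, 1, 1, 2, 2, 3, 5, 8, 40)
rel <- true_counts / sum(true_counts)

# Number of singletons after scaling to a candidate depth and rounding.
n_singletons <- function(depth) sum(round(rel * depth) == 1)

# Compare several candidate depths, labelled by depth.
candidates <- c(16, 32, 64, 128, 256)
setNames(vapply(candidates, n_singletons, integer(1)), candidates)
```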

Examples

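    library(rbiom)
    
    # hmp50 is an example dataset shipped with rbiom.
    depths <- suggest_inflate_depths(hmp50)
    
    # Inspect the suggested depths for the first few samples.
    head(depths)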