Saturation: Saturation

Description

Generate a saturation curve plot showing gene detection versus sequencing depth.

Usage

Saturation(matrix, method, max_reads, palette)

Value

A "ggplot" object showing saturation (genes detected) versus sequencing depth for each sample.

Arguments

matrix: Numeric matrix or object coercible to matrix (genes × samples), e.g., log-counts or raw counts. Genes are rows; samples are columns.
method: Character. Estimation method: "division" or "sampling".
max_reads: Numeric. Maximum number of reads to include in the rarefaction (default: Inf).
palette: Character. Name of a discrete color palette from the "paletteer" package for curve colors.
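As a usage sketch, the call below passes a simulated genes × samples count matrix to the function. The simulated data, the choice of "division", and the palette string are assumptions made for illustration: "ggsci::nrc_npg" follows the "package::palette" naming that paletteer uses for discrete palettes, but whether Saturation() expects that exact form is not confirmed here. The call is guarded so the snippet still runs when the function is not loaded.

```r
# Simulated raw counts: 500 genes (rows) x 4 samples (columns).
# These values are made up for illustration only.
set.seed(42)
counts <- matrix(
  rnbinom(500 * 4, mu = 20, size = 0.5),
  nrow = 500,
  dimnames = list(paste0("gene", 1:500), paste0("sample", 1:4))
)

# Only call Saturation() if the package that provides it is attached.
if (exists("Saturation", mode = "function")) {
  # Palette name is an assumption (paletteer "package::palette" form).
  p <- Saturation(counts, method = "division", max_reads = Inf,
                  palette = "ggsci::nrc_npg")
  print(p)
}
```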

Details

This function estimates how many genes are detected at increasing read depths using a rarefaction-based approach ("estimate_saturation()" from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) and plots the saturation curves for each sample. It supports two estimation methods: “division”, a fast analytic approximation, and “sampling”, a more realistic simulation based on random draws of reads.

Internally, "extract_counts()" (from countSubsetNorm) extracts a counts matrix from various input classes (matrix, DGEList, EList, ExpressionSet).
"estimate_saturation()" (from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) rarefies each library at multiple depths:

“division” divides counts by scale factors;
“sampling” performs repeated random sampling to simulate read downsampling.
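The two methods above can be illustrated with a small base-R sketch for a single library. This is a simplified stand-in, not the RNAseQC source: the helper name rarefy_sketch(), its arguments (nreps, min_counts), and the choice to draw reads without replacement are all assumptions made for illustration.

```r
# Hypothetical helper: rarefy one library (a vector of per-gene counts)
# at the requested depths and count the genes still detected.
rarefy_sketch <- function(counts, depths,
                          method = c("division", "sampling"),
                          nreps = 5, min_counts = 1) {
  method <- match.arg(method)
  counts <- as.integer(counts)
  total <- sum(counts)
  out <- data.frame(depth = depths, sat = NA_real_, sat.var = NA_real_)

  # Pool of observed reads, one entry per read, labelled by gene index.
  pool <- rep.int(seq_along(counts), counts)

  for (i in seq_along(depths)) {
    d <- depths[i]
    if (d > total) next  # cannot rarefy beyond the library size

    if (method == "division") {
      # Scale counts down to depth d; a gene counts as detected if its
      # scaled count still reaches the threshold.
      scaled <- counts * d / total
      out$sat[i] <- sum(scaled >= min_counts)
    } else {
      # Draw d reads at random (without replacement) nreps times and
      # record how many genes reach the threshold in each draw.
      est <- replicate(nreps, {
        reads <- pool[sample.int(length(pool), d)]
        sum(tabulate(reads, nbins = length(counts)) >= min_counts)
      })
      out$sat[i] <- mean(est)
      out$sat.var[i] <- var(est)
    }
  }
  out
}

# Toy library with five genes, one of them unobserved.
set.seed(1)
toy <- c(100, 50, 10, 1, 0)
div  <- rarefy_sketch(toy, depths = c(0, 40, 80, 161), method = "division")
samp <- rarefy_sketch(toy, depths = c(0, 40, 80, 161), method = "sampling")
```

In this sketch "division" is deterministic and leaves sat.var as NA, while "sampling" reports the mean and variance of the detected-gene count across replicates.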

The resulting data frame contains one row per sample per depth, with the number of detected genes ("sat") and, for sampling, its variance ("sat.var").
The function then plots gene saturation curves ("sat" vs. "depth") colored by sample.
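A minimal sketch of the plotting step, assuming a data frame shaped as described (columns "sample", "depth", and "sat") and the ggplot2 package. The values in sat_df are made up for illustration, and the layers chosen here are an assumption rather than the function's actual plotting code.

```r
library(ggplot2)

# Hypothetical saturation table: one row per sample per depth.
sat_df <- data.frame(
  sample = rep(c("S1", "S2"), each = 4),
  depth  = rep(c(0, 1e5, 2e5, 3e5), times = 2),
  sat    = c(0, 9000, 11000, 11800, 0, 8000, 10500, 11500)
)

# One curve per sample: genes detected ("sat") against read depth.
p <- ggplot(sat_df, aes(x = depth, y = sat, colour = sample)) +
  geom_line() +
  geom_point() +
  labs(x = "Reads", y = "Genes detected", colour = "Sample")
```

A curve that flattens as depth increases suggests that additional sequencing would detect few new genes for that sample.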

"extract_counts()": Extract counts matrix from different types of expression objects

"estimate_saturation()": Estimate saturation of genes based on rarefaction of reads