Generate a saturation curve plot showing gene detection versus sequencing depth.
Saturation(matrix, method, max_reads, palette)
A "ggplot" object showing saturation (genes detected) versus sequencing depth for each sample.
Numeric matrix or object coercible to matrix (genes × samples), e.g., log-counts or raw counts. Genes are rows; samples are columns.
Character. Estimation method: "division" or "sampling".
Numeric. Maximum number of reads to include in the rarefaction (default: Inf).
Character. Name of a discrete color palette from the "paletteer" package, used for the curve colors.
This function estimates how many genes are detected at increasing read depths using a rarefaction-based approach ("estimate_saturation()" from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) and plots the resulting saturation curve for each sample. It supports two estimation methods: "division", a fast analytic approximation, and "sampling", a more realistic approach.
Internally, "extract_counts()" (from countSubsetNorm) extracts a counts matrix from various input classes (matrix, DGEList, EList, ExpressionSet).
"estimate_saturation()" (from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) then rarefies each library at multiple depths:
"division" divides counts by scale factors;
"sampling" performs repeated random sampling to simulate read downsampling.
The resulting data frame contains one row per sample per depth, with the number of detected genes ("sat") and, for the sampling method, its variance ("sat.var").
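The two rarefaction strategies can be illustrated with a minimal base-R sketch. This is not the RNAseQC implementation: the helper names "rarefy_division()" and "rarefy_sampling()", the "min_count" and "nreps" arguments, and the toy counts are all hypothetical and chosen only to show the idea and the shape of the resulting data frame.

```r
# Illustrative sketch only; not the code used by estimate_saturation().

# "division": scale the observed counts down to the target depth and count
# the genes whose scaled count still reaches the detection threshold.
rarefy_division <- function(counts, depth, min_count = 1) {
  scale <- depth / sum(counts)
  sum(counts * scale >= min_count)
}

# "sampling": draw `depth` reads at random, in proportion to the observed
# counts, several times; report the mean and variance of genes detected.
rarefy_sampling <- function(counts, depth, nreps = 5, min_count = 1) {
  probs <- counts / sum(counts)
  detected <- vapply(seq_len(nreps), function(i) {
    draws <- sample.int(length(counts), size = depth, replace = TRUE, prob = probs)
    sum(tabulate(draws, nbins = length(counts)) >= min_count)
  }, integer(1))
  c(sat = mean(detected), sat.var = var(detected))
}

# Made-up counts: 6 genes x 2 samples (library sizes 100 and 1000).
counts <- matrix(c(50, 30, 15, 4, 1, 0,
                   900, 60, 30, 8, 2, 0),
                 nrow = 6,
                 dimnames = list(paste0("gene", 1:6), c("s1", "s2")))
depths <- c(10, 50, 100)

# One row per sample per depth, mirroring the columns described above.
set.seed(1)
saturation <- do.call(rbind, lapply(colnames(counts), function(s) {
  do.call(rbind, lapply(depths, function(d) {
    est <- rarefy_sampling(counts[, s], d)
    data.frame(sample = s, depth = d,
               sat = est[["sat"]], sat.var = est[["sat.var"]])
  }))
}))
```

In this sketch the division estimate is deterministic, while the sampling estimate varies between repetitions, which is why only the sampling method has a variance to report.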
The function then plots gene saturation curves ("sat" vs. "depth"), colored by sample.
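As a rough sketch of this plotting step, assuming the ggplot2 package is installed, a data frame with "sample", "depth", and "sat" columns can be drawn as one curve per sample. The values below are made up for illustration, and the colors are left at the ggplot2 defaults rather than taken from a paletteer palette.

```r
library(ggplot2)

# Made-up saturation estimates for two samples (illustration only).
saturation <- data.frame(
  sample = rep(c("s1", "s2"), each = 4),
  depth  = rep(c(0, 1e5, 2e5, 3e5), times = 2),
  sat    = c(0, 8000, 9500, 10000,
             0, 7000, 9000, 9600)
)

# One line per sample: genes detected against sequencing depth.
p <- ggplot(saturation, aes(x = depth, y = sat, color = sample)) +
  geom_line() +
  geom_point() +
  labs(x = "Sequencing depth (reads)", y = "Genes detected", color = "Sample")
```

A curve that flattens as depth increases suggests that additional reads would detect few new genes for that sample.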
Extract counts matrix from different types of expression objects
Estimate saturation of genes based on rarefaction of reads