inDAGO (version 1.0.0)

Saturation: Saturation

Description

Generate a saturation curve plot showing gene detection versus sequencing depth.

Usage

Saturation(matrix, method, max_reads, palette)

Value

A "ggplot" object showing saturation (genes detected) versus sequencing depth for each sample.

Arguments

matrix

Numeric matrix or object coercible to matrix (genes × samples), e.g., log-counts or raw counts. Genes are rows; samples are columns.

method

Character. Estimation method: "division" or "sampling".

max_reads

Numeric. Maximum number of reads to include in the rarefaction (default: Inf).

palette

Character. Name of a discrete color palette from the "paletteer" package, used for the curve colors.
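As an illustration of how the arguments fit together, the following sketch builds a small simulated count matrix and passes it to the function. The data are made up, the palette string assumes the "package::palette" naming that paletteer uses for discrete palettes (whether inDAGO expects exactly this form is an assumption), and the call is guarded so the code still runs when inDAGO is not installed.

```r
# Toy raw-count matrix: 200 genes (rows) x 3 samples (columns).
# The values are simulated for illustration only.
set.seed(42)
counts <- matrix(
  rnbinom(200 * 3, mu = 50, size = 0.5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:3))
)

# Guarded call: skipped when the inDAGO package is not available.
if (requireNamespace("inDAGO", quietly = TRUE)) {
  p <- inDAGO::Saturation(
    matrix    = counts,
    method    = "division",       # or "sampling"
    max_reads = Inf,
    palette   = "ggsci::nrc_npg"  # assumed "package::palette" naming
  )
  print(p)
}
```

Switching method to "sampling" should give curves based on repeated random draws, at the cost of a longer run time.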

Details

This function estimates how many genes are detected at increasing read depths using a rarefaction-based approach ("estimate_saturation()" from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) and plots the saturation curves for each sample. It supports two estimation methods: “division”, a fast analytic approximation, and “sampling”, a more realistic approach based on repeated random sampling.

  1. Internally, "extract_counts()" (from countSubsetNorm) extracts a counts matrix from various input classes (matrix, DGEList, EList, ExpressionSet).

  2. "estimate_saturation()" (from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) rarefies each library at multiple depths:

  • “division” divides counts by scale factors;

  • “sampling” performs repeated random sampling to simulate downsampling of reads.

  3. The resulting data frame contains one row per sample per depth, with the number of detected genes ("sat") and, for sampling, its variance ("sat.var").

  4. The function then plots gene saturation curves ("sat" vs. "depth") colored by sample.
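The rarefaction step described above can be sketched in base R for a single library. This is a minimal illustration, not the RNAseQC implementation: the helper name rarefy_library(), its nreps and min_count arguments, and the toy counts are all hypothetical. It assumes that “division” rescales each gene's share of the library to the target depth, and that “sampling” draws reads multinomially in proportion to the observed counts.

```r
# Hypothetical helper for illustration; not part of inDAGO or RNAseQC.
rarefy_library <- function(counts, depths,
                           method = c("division", "sampling"),
                           nreps = 5, min_count = 1) {
  method <- match.arg(method)
  probs <- counts / sum(counts)  # each gene's share of the library

  rows <- lapply(depths, function(depth) {
    if (method == "division") {
      # Expected count at this depth; a gene is "detected" at >= min_count.
      data.frame(depth = depth,
                 sat = sum(probs * depth >= min_count),
                 sat.var = NA_real_)
    } else {
      # Draw `depth` reads at random, nreps times, and count detected genes.
      est <- replicate(nreps, {
        drawn <- rmultinom(1, size = depth, prob = probs)
        sum(drawn >= min_count)
      })
      data.frame(depth = depth, sat = mean(est), sat.var = var(est))
    }
  })
  do.call(rbind, rows)  # one row per depth
}

# Toy library of six genes with 1000 reads in total (made-up values).
toy <- c(geneA = 500, geneB = 300, geneC = 150, geneD = 45, geneE = 5, geneF = 0)
depths <- c(10, 100, 1000)

div <- rarefy_library(toy, depths, "division")
print(div)  # sat is 3, 4, 5 at depths 10, 100, 1000

set.seed(1)
smp <- rarefy_library(toy, depths, "sampling")
print(smp)  # sat is a mean over replicates; sat.var is their variance
```

Applying such a helper to each column of a counts matrix and stacking the results would give the one-row-per-sample-per-depth layout that the plotting step uses.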
