Summarization: Summarization

Description

Summarizes read counts from multiple BAM/SAM files in parallel using feature annotations.

Usage

Summarization(
  NodesSum,
  Xsum,
  UploadPathSum,
  DownloadPathSum,
  annot.ext,
  isGTFAnnotationFile,
  GTF.featureType,
  GTF.attrType,
  useMetaFeatures,
  allowMultiOverlap,
  minOverlap,
  fracOverlap,
  fracOverlapFeature,
  largestOverlap,
  countMultiMappingReads,
  fraction,
  minMQS,
  primaryOnly,
  ignoreDup,
  strandSpecific,
  requireBothEndsMapped,
  checkFragLength,
  minFragLength,
  maxFragLength,
  countChimericFragments,
  autosort,
  nthreads,
  tmpDir,
  verbose
)

Value

Writes the per-sample output files described in Details to DownloadPathSum.

Arguments

NodesSum: Integer. Number of R worker processes in the socket cluster (typically no more than the number of available CPU cores).
Xsum: Character vector. Filenames of BAM or SAM files to process.
UploadPathSum: Character. Directory containing the input files listed in Xsum.
DownloadPathSum: Character. Directory into which all output files will be written.
annot.ext: Character. Path to an external annotation file (e.g., GTF/GFF).
isGTFAnnotationFile: Logical. Is annot.ext in GTF/GFF format? If FALSE, it is read as SAF.
GTF.featureType: Character. GTF feature type to count (e.g., "exon").
GTF.attrType: Character. GTF attribute used to group features into meta-features (e.g., "gene_id").
useMetaFeatures: Logical. Collapse sub-features into meta-features before counting.
allowMultiOverlap: Logical. Allow reads overlapping multiple features to be counted.
minOverlap: Integer. Minimum number of overlapping bases to assign a read.
fracOverlap: Numeric. Minimum fraction of read that must overlap a feature.
fracOverlapFeature: Numeric. Minimum fraction of feature that must be covered by a read.
largestOverlap: Logical. When a read overlaps multiple features, assign it to the feature with the largest number of overlapping bases.
countMultiMappingReads: Logical. Count reads that map to multiple locations.
fraction: Logical. Assign fractional counts to multi-mapping and/or multi-overlapping reads instead of a full count to each location or feature.
minMQS: Integer. Minimum mapping quality score for reads to be counted.
primaryOnly: Logical. Count only the primary alignments of multi-mapping reads.
ignoreDup: Logical. Exclude PCR duplicates from counting.
strandSpecific: Integer. Strand-specific counting mode (0 = unstranded, 1 = stranded, 2 = reversely stranded).
requireBothEndsMapped: Logical. In paired-end mode, require both mates to map.
checkFragLength: Logical. Enforce fragment length checks on paired-end reads.
minFragLength: Numeric. Minimum fragment length to keep.
maxFragLength: Numeric. Maximum fragment length to keep.
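countChimericFragments: Logical. Count read pairs whose mates map to different chromosomes, or to the same chromosome but different strands.
autosort: Logical. For paired-end input, automatically sort reads by name when mates from the same pair are not adjacent.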
nthreads: Integer. Number of threads per featureCounts call.
tmpDir: Character. Directory for temporary files (e.g., large intermediate files).
verbose: Logical. Print verbose messages during execution.
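Because each of the NodesSum workers runs its own featureCounts() call with nthreads threads, up to NodesSum × nthreads threads may run at once. The sketch below shows one way to split a core budget between the two; split_threads() is a hypothetical helper, not part of this package.

```r
# Hypothetical helper (not part of the package): choose nthreads so that
# NodesSum workers x nthreads threads per worker stays within the core budget.
split_threads <- function(n_nodes, total_cores = parallel::detectCores()) {
  if (is.na(total_cores)) total_cores <- 1L  # detectCores() may return NA
  # Integer division, but never fewer than one thread per worker
  max(1L, as.integer(total_cores %/% n_nodes))
}
```

For example, with 8 cores and NodesSum = 4, split_threads(4L, 8L) gives nthreads = 2.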

Details

This function runs Rsubread::featureCounts() on each input file, capturing count statistics, annotation data, and per-sample summary logs. Results are written to the specified output directory.
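A minimal sketch of how one sample's outputs could be written from a featureCounts() result, using the counts, annotation and stat elements of its returned list. The helper name write_sample_outputs(), the derivation of the sample name, and placing Counts/ inside the output directory are assumptions for illustration, not the package's actual code.

```r
# Hypothetical helper (not the package's code): write the four per-sample
# outputs listed in Details from a featureCounts()-style result `fc`.
# Assumption: a real run might obtain `fc` and its console log like this,
# provided featureCounts() prints through R's output connection:
#   log_lines <- utils::capture.output(
#     fc <- Rsubread::featureCounts(files = bam, annot.ext = annot, ...))
write_sample_outputs <- function(fc, log_lines, sample, out_dir) {
  dir.create(file.path(out_dir, "Counts"), recursive = TRUE, showWarnings = FALSE)
  # Console output captured during counting
  writeLines(log_lines, file.path(out_dir, paste0(sample, "_summary.txt")))
  # Assignment statistics (Assigned, Unassigned_* rows)
  write.csv(fc$stat, file.path(out_dir, paste0(sample, "_stat.csv")),
            row.names = FALSE)
  # Feature annotation (GeneID, Chr, Start, End, Strand, Length)
  write.csv(fc$annotation, file.path(out_dir, paste0(sample, "_annotation.csv")),
            row.names = FALSE)
  # Count matrix, tab-delimited, with feature IDs as the first column
  write.table(fc$counts, file.path(out_dir, "Counts", paste0(sample, ".tab")),
              sep = "\t", quote = FALSE, col.names = NA)
  invisible(NULL)
}
```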

A socket cluster of NodesSum workers is created.
Each worker invokes featureCounts() on one sample, using the annotation and counting parameters.
Outputs per sample:
- A text summary (*_summary.txt) capturing the console output.
- A CSV of count statistics (*_stat.csv).
- A CSV of feature annotations (*_annotation.csv).
- A tab-delimited count matrix saved under Counts/<sample>.tab.
The cluster is terminated once all samples complete.
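The cluster lifecycle above can be sketched with the base parallel package. Here run_per_sample is a hypothetical stand-in for the per-sample featureCounts() call and file writing; the sketch assumes one task per input file and uses on.exit() so the workers are stopped even if a sample fails. The package's own implementation may differ.

```r
library(parallel)

# Sketch (assumption, not the package's code): start a socket cluster of
# n_nodes workers, process one file per task, then stop the cluster.
summarize_in_parallel <- function(files, n_nodes, run_per_sample, ...) {
  cl <- makeCluster(n_nodes)             # PSOCK (socket) cluster
  on.exit(stopCluster(cl), add = TRUE)   # terminate workers on exit or error
  # If run_per_sample calls Rsubread, each worker must load it first, e.g.
  #   clusterEvalQ(cl, library(Rsubread))
  parLapply(cl, files, run_per_sample, ...)  # results returned in input order
}
```

Calling summarize_in_parallel(c("a.bam", "b.bam"), 2L, toupper) returns list("A.BAM", "B.BAM").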