inDAGO (version 1.0.0)

Summarization: Summarization

Description

Summarizes read counts from multiple BAM/SAM files in parallel using feature annotations.

Usage

Summarization(
  NodesSum,
  Xsum,
  UploadPathSum,
  DownloadPathSum,
  annot.ext,
  isGTFAnnotationFile,
  GTF.featureType,
  GTF.attrType,
  useMetaFeatures,
  allowMultiOverlap,
  minOverlap,
  fracOverlap,
  fracOverlapFeature,
  largestOverlap,
  countMultiMappingReads,
  fraction,
  minMQS,
  primaryOnly,
  ignoreDup,
  strandSpecific,
  requireBothEndsMapped,
  checkFragLength,
  minFragLength,
  maxFragLength,
  countChimericFragments,
  autosort,
  nthreads,
  tmpDir,
  verbose
)

Value

Writes files to DownloadPathSum.

Arguments

NodesSum

Integer. Number of parallel R nodes (e.g., CPU cores) to spawn.

Xsum

Character vector. Filenames of BAM or SAM files to process.

UploadPathSum

Character. Directory containing the raw input files.

DownloadPathSum

Character. Directory into which all output files will be written.

annot.ext

Character. Path to an external annotation file (e.g., GTF/GFF).

isGTFAnnotationFile

Logical. Should annot.ext be treated as a GTF file?

GTF.featureType

Character. Feature type in the GTF annotation to count reads against (e.g., "exon").

GTF.attrType

Character. GTF attribute used to group features into meta-features (e.g., "gene_id").

useMetaFeatures

Logical. Collapse sub-features into meta-features before counting.

allowMultiOverlap

Logical. Allow reads overlapping multiple features to be counted.

minOverlap

Integer. Minimum number of overlapping bases to assign a read.

fracOverlap

Numeric. Minimum fraction of read that must overlap a feature.

fracOverlapFeature

Numeric. Minimum fraction of feature that must be covered by a read.

largestOverlap

Logical. When overlapping multiple features, assign based on largest overlap.

countMultiMappingReads

Logical. Count reads that map to multiple locations.

fraction

Logical. Distribute counts fractionally for multi-mapping reads.

minMQS

Integer. Minimum mapping quality score for reads to be counted.

primaryOnly

Logical. Count only the primary alignments of multi-mapping reads.

ignoreDup

Logical. Exclude PCR duplicates from counting.

strandSpecific

Integer. Strand-specific counting mode (0 = unstranded, 1 = stranded, 2 = reversely stranded).

requireBothEndsMapped

Logical. In paired-end mode, require both mates to map.

checkFragLength

Logical. Enforce fragment length checks on paired-end reads.

minFragLength

Numeric. Minimum fragment length to keep.

maxFragLength

Numeric. Maximum fragment length to keep.

countChimericFragments

Logical. Count discordant or chimeric read pairs.

autosort

Logical. Automatically sort input files if not already sorted.

nthreads

Integer. Number of threads per featureCounts call.

tmpDir

Character. Directory for temporary files (e.g., large intermediate files).

verbose

Logical. Print verbose messages during execution.
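
A call with every argument spelled out might look like the sketch below. All paths and file names are hypothetical placeholders, and the values are illustrative choices, not recommendations from the package authors. The sketch also assumes that Summarization is exported from inDAGO; the guarded call is skipped when the package or the input directory is missing.

```r
# Illustrative argument list; every path and file name here is a placeholder.
sum_args <- list(
  NodesSum               = 2L,
  Xsum                   = c("sample1.bam", "sample2.bam"),
  UploadPathSum          = "path/to/bam",
  DownloadPathSum        = "path/to/output",
  annot.ext              = "path/to/annotation.gtf",
  isGTFAnnotationFile    = TRUE,
  GTF.featureType        = "exon",
  GTF.attrType           = "gene_id",
  useMetaFeatures        = TRUE,
  allowMultiOverlap      = FALSE,
  minOverlap             = 1L,
  fracOverlap            = 0,
  fracOverlapFeature     = 0,
  largestOverlap         = FALSE,
  countMultiMappingReads = TRUE,
  fraction               = FALSE,
  minMQS                 = 0L,
  primaryOnly            = FALSE,
  ignoreDup              = FALSE,
  strandSpecific         = 0L,     # 0 = unstranded
  requireBothEndsMapped  = FALSE,
  checkFragLength        = FALSE,
  minFragLength          = 50,
  maxFragLength          = 600,
  countChimericFragments = TRUE,
  autosort               = TRUE,
  nthreads               = 1L,
  tmpDir                 = tempdir(),
  verbose                = FALSE
)

# Run only when the package is installed and the (placeholder) input exists.
# Assumption: Summarization is exported from the inDAGO namespace.
if (requireNamespace("inDAGO", quietly = TRUE) &&
    dir.exists(sum_args$UploadPathSum)) {
  do.call(inDAGO::Summarization, sum_args)
}
```

Building the arguments as a named list keeps the long call readable and makes it easy to reuse the same settings across runs.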

Details

This function runs Rsubread::featureCounts() on each input file, capturing count statistics, annotation data, and per-sample summary logs. Results are written to the specified output directory, following these steps:

  1. A socket cluster of NodesSum workers is created.

  2. Each worker invokes featureCounts() on one sample, using the annotation and counting parameters.

  3. Outputs per sample:

    • A text summary (*_summary.txt) capturing the console output.

    • A CSV of count statistics (*_stat.csv).

    • A CSV of feature annotations (*_annotation.csv).

    • A tab-delimited count matrix saved under Counts/<sample>.tab.

  4. The cluster is terminated once all samples complete.
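
The cluster workflow in the steps above can be sketched with the base parallel package. This is a minimal illustration of the pattern, not the package's actual implementation: process_sample() is a hypothetical stand-in that writes a placeholder table where the real worker would call Rsubread::featureCounts(), and the sample names are invented.

```r
library(parallel)

# Hypothetical stand-in for the per-sample work. The real worker would call
# Rsubread::featureCounts() here; this one only writes a placeholder table.
process_sample <- function(sample_file, out_dir) {
  sample_name <- tools::file_path_sans_ext(basename(sample_file))
  counts <- data.frame(GeneID = c("geneA", "geneB"), Count = c(0L, 0L))
  out_file <- file.path(out_dir, "Counts", paste0(sample_name, ".tab"))
  utils::write.table(counts, out_file, sep = "\t",
                     quote = FALSE, row.names = FALSE)
  out_file
}

samples <- c("sample1.bam", "sample2.bam")   # invented file names
out_dir <- tempfile("summarization_demo_")
dir.create(file.path(out_dir, "Counts"), recursive = TRUE)

# Step 1: create a socket cluster (PSOCK is the default type of makeCluster).
cl <- makeCluster(2)

# Steps 2 and 3: each worker handles one sample and writes its output.
# Step 4: the cluster is stopped even if a worker fails.
written <- tryCatch(
  parLapply(cl, samples, process_sample, out_dir = out_dir),
  finally = stopCluster(cl)
)
```

Stopping the cluster in a finally clause avoids leaving orphaned R worker processes behind when one sample raises an error.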