Filtering: Filtering

Description

Filter paired-end FASTQ files in parallel based on quality and adapter trimming criteria.

Usage

Filtering(
  Nodes,
  X,
  UploadPath,
  DownloadPath,
  qualityType,
  minLen,
  trim,
  trimValue,
  n,
  Adapters,
  Lpattern,
  Rpattern,
  max.Lmismatch,
  max.Rmismatch,
  kW,
  left,
  right,
  halfwidthAnalysis,
  halfwidth,
  compress
)

Value

Filtered FASTQ files written to "DownloadPath"; one log file per sample.

Arguments

Nodes: Integer. Number of parallel processing nodes (e.g., CPU cores).
X: List of character vectors. Each element is a character vector of paired file names (e.g., c("sample_1.fq", "sample_2.fq")).
UploadPath: Character. Path to directory containing raw FASTQ files.
DownloadPath: Character. Path to directory where filtered files will be saved.
qualityType: Character. Type of quality score encoding, e.g., "Sanger" or "Illumina".
minLen: Integer. Minimum length of reads to retain after filtering.
trim: Logical. Whether to perform quality-based trimming of reads.
trimValue: Integer. Minimum Phred score threshold for trimming.
n: Integer. Number of reads to stream per chunk (1e6 is a common choice).
Adapters: Logical. Whether to remove adapters from reads.
Lpattern: Character. Adapter sequence to remove from the 5' end (left).
Rpattern: Character. Adapter sequence to remove from the 3' end (right).
max.Lmismatch: Integer. Maximum mismatches allowed for the left adapter.
max.Rmismatch: Integer. Maximum mismatches allowed for the right adapter.
kW: Integer. Minimum number of low-quality scores in a window to trigger trimming (sliding window analysis).
left: Logical. Whether to allow trimming from the left end.
right: Logical. Whether to allow trimming from the right end.
halfwidthAnalysis: Logical. Whether to perform sliding window-based trimming.
halfwidth: Integer. Half-width of the sliding window.
compress: Logical. Whether to compress the output FASTQ files.
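
The arguments above can be collected in a named list and passed with do.call(). This is a minimal sketch: every value shown (paths, thresholds, empty adapter strings) is a hypothetical placeholder and not a documented default. The call is guarded so that it only runs when Filtering() is loaded and the input directory exists.

```r
# Hypothetical argument values for illustration only; adjust to your data.
filter_args <- list(
  Nodes = 2L,
  X = list(c("sample_1.fq", "sample_2.fq")),
  UploadPath = "raw_fastq",            # assumed directory name
  DownloadPath = "filtered_fastq",     # assumed directory name
  qualityType = "Sanger",
  minLen = 50L,
  trim = TRUE,
  trimValue = 20L,
  n = 1e6,
  Adapters = FALSE,
  Lpattern = "",                       # assumed to be ignored when Adapters = FALSE
  Rpattern = "",
  max.Lmismatch = 0L,
  max.Rmismatch = 0L,
  kW = 3L,
  left = TRUE,
  right = TRUE,
  halfwidthAnalysis = TRUE,
  halfwidth = 2L,
  compress = TRUE
)

# Run only when the function is available and the input directory exists.
if (exists("Filtering", mode = "function") && dir.exists(filter_args$UploadPath)) {
  do.call(Filtering, filter_args)
}
```

Keeping the arguments in a list makes it easy to record the exact settings alongside the per-sample log files.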

Details

This function processes raw paired-end FASTQ files to remove low-quality bases, trim adapters, and filter out short reads. It supports quality-based end trimming, sliding window trimming, and adapter removal. The processing is done in parallel across multiple nodes to enhance performance when working with large datasets.
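
To illustrate how kW, halfwidth, and trimValue might interact during sliding window trimming, here is a base R sketch. It is not the package's implementation: the helper names are hypothetical, Sanger (Phred+33) encoding is assumed, scores below the threshold are treated as low quality, windows are truncated at the read ends, and the read is cut just before the first position whose window contains at least kW low-quality scores.

```r
# Decode a Sanger (Phred+33) quality string into integer scores.
phred_scores <- function(qual_string) {
  utf8ToInt(qual_string) - 33L
}

# Return the number of leading bases to keep (illustrative helper).
# For each position, count low-quality scores within +/- halfwidth
# (window truncated at the read ends); stop at the first position
# where that count reaches kW and keep everything before it.
window_keep_length <- function(scores, min_score, kW, halfwidth) {
  len <- length(scores)
  low <- scores < min_score
  for (i in seq_len(len)) {
    window <- max(1L, i - halfwidth):min(len, i + halfwidth)
    if (sum(low[window]) >= kW) {
      return(i - 1L)
    }
  }
  len
}

read_seq  <- "ACGTACGTAC"
read_qual <- "IIIIII####"   # 'I' decodes to Q40, '#' decodes to Q2

keep <- window_keep_length(phred_scores(read_qual),
                           min_score = 20L, kW = 2L, halfwidth = 1L)
trimmed <- substr(read_seq, 1L, keep)
trimmed   # "ACGTAC"
```

In this example the window centred on position 7 is the first to contain two low-quality scores, so the first six bases are kept. A read trimmed this way would then be dropped if it ended up shorter than minLen.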

Paired FASTQ files must be named consistently, distinguished by "_1" and "_2" for forward and reverse reads.
This function uses the "ShortRead" and "Biostrings" packages for FASTQ processing and quality filtering.
Filtered files are written in FASTQ format.
Log files containing read counts before and after filtering are written per sample.
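
Given the "_1"/"_2" naming convention, the list expected by X can be assembled from a vector of file names. The sketch below uses base R only; pair_fastq_files() is a hypothetical helper, and the accepted extensions (.fq, .fastq, optionally .gz) are an assumption. Files without a mate are dropped.

```r
# Group file names into forward/reverse pairs by their shared sample prefix.
pair_fastq_files <- function(files) {
  # Matches a trailing "_1" or "_2" followed by an assumed FASTQ extension.
  pattern <- "_([12])(\\.(fastq|fq)(\\.gz)?)$"
  files <- sort(files[grepl(pattern, files)])
  sample_id <- sub(pattern, "", files)
  pairs <- split(files, sample_id)
  # Keep only samples that have exactly two files.
  complete <- vapply(pairs, length, integer(1)) == 2L
  unname(pairs[complete])
}

# In practice the names would come from list.files(UploadPath).
files <- c("sampleB_2.fq", "sampleA_1.fq", "sampleA_2.fq",
           "sampleB_1.fq", "orphan_1.fq", "notes.txt")
X <- pair_fastq_files(files)
```

Here X holds two elements, c("sampleA_1.fq", "sampleA_2.fq") and c("sampleB_1.fq", "sampleB_2.fq"); "orphan_1.fq" has no mate and "notes.txt" does not match the pattern, so both are excluded.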