
inDAGO (version 1.0.0)

Filtering: Filtering

Description

Filter paired-end FASTQ files in parallel using quality trimming, adapter removal and a minimum read length.

Usage

Filtering(
  Nodes,
  X,
  UploadPath,
  DownloadPath,
  qualityType,
  minLen,
  trim,
  trimValue,
  n,
  Adapters,
  Lpattern,
  Rpattern,
  max.Lmismatch,
  max.Rmismatch,
  kW,
  left,
  right,
  halfwidthAnalysis,
  halfwidth,
  compress
)

Value

Filtered FASTQ files are written to "DownloadPath", together with one log file per sample.

Arguments

Nodes

Integer. Number of parallel processing nodes (e.g., CPU cores).

X

List of character vectors. Each element is a character vector of paired file names (e.g., c("sample_1.fq", "sample_2.fq")).
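
X can be assembled from a directory listing (for example list.files(UploadPath)) by matching each "_1" file with its "_2" mate. The helper pair_fastq below is hypothetical and not part of inDAGO; it assumes file names end in "_1"/"_2" followed by .fq or .fastq (optionally .gz). Naming the list elements is an extra convenience; this page only requires a list of character vectors.

```r
# Hypothetical helper (not part of inDAGO): build X from a vector of file
# names by matching each "_1" file with its "_2" mate.
pair_fastq <- function(files) {
  suffix <- "_1(\\.(fq|fastq)(\\.gz)?)$"
  fwd_files <- sort(grep(suffix, files, value = TRUE))
  # Swap "_1" for "_2" while keeping the extension captured in group 1
  rev_files <- sub(suffix, "_2\\1", fwd_files)
  unpaired <- !(rev_files %in% files)
  if (any(unpaired)) {
    stop("No '_2' mate found for: ", paste(fwd_files[unpaired], collapse = ", "))
  }
  # One character vector per sample, named by the shared prefix
  ids <- sub(suffix, "", fwd_files)
  setNames(Map(c, fwd_files, rev_files), ids)
}

files <- c("sampleB_2.fq", "sampleA_1.fq", "sampleA_2.fq", "sampleB_1.fq")
X <- pair_fastq(files)
str(X)
```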

UploadPath

Character. Path to directory containing raw FASTQ files.

DownloadPath

Character. Path to directory where filtered files will be saved.

qualityType

Character. Type of quality score encoding, e.g., "Sanger" or "Illumina".
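
The encoding decides which ASCII offset turns quality characters into Phred scores. Sanger uses Phred+33; the sketch below assumes that "Illumina" here means the older Phred+64 encoding (Illumina 1.3 to 1.7), since Illumina 1.8+ output already uses the Sanger offset. The function phred_scores is a hypothetical illustration, not an inDAGO function.

```r
# Hypothetical helper: convert an ASCII quality string to Phred scores.
# Assumed offsets: Sanger = Phred+33, (older) Illumina = Phred+64.
phred_scores <- function(qual, qualityType = c("Sanger", "Illumina")) {
  qualityType <- match.arg(qualityType)
  offset <- if (qualityType == "Sanger") 33L else 64L
  utf8ToInt(qual) - offset
}

phred_scores("II?5#", "Sanger")    # same scores, Phred+33 characters
phred_scores("hh^TB", "Illumina")  # same scores, Phred+64 characters
```

Decoding a Phred+64 file with the Sanger offset inflates every score by 31, so a wrong qualityType silently disables quality trimming.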

minLen

Integer. Minimum length of reads to retain after filtering.

trim

Logical. Whether to perform quality-based trimming of reads.

trimValue

Integer. Minimum Phred score threshold for trimming.

n

Integer. Number of reads to stream per chunk; a typical value is 1e6.
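
Streaming keeps memory use bounded by the chunk size instead of the file size. ShortRead offers FastqStreamer for this; the base-R sketch below only illustrates the idea and assumes well-formed four-line FASTQ records. stream_fastq is a hypothetical name.

```r
# Sketch: read a FASTQ file n records at a time (4 lines per record) and
# apply FUN to each chunk, so only one chunk is held in memory at once.
stream_fastq <- function(path, n, FUN) {
  con <- file(path, open = "r")
  on.exit(close(con))
  results <- list()
  repeat {
    lines <- readLines(con, n = 4L * n)
    if (length(lines) == 0L) break
    m <- matrix(lines, nrow = 4L)  # one column per read
    chunk <- data.frame(id = m[1, ], seq = m[2, ], qual = m[4, ],
                        stringsAsFactors = FALSE)
    results[[length(results) + 1L]] <- FUN(chunk)
  }
  results
}

# Five toy reads streamed two at a time give chunks of 2, 2 and 1 reads
tmp <- tempfile(fileext = ".fq")
writeLines(unlist(lapply(1:5, function(i)
  c(paste0("@read", i), "ACGT", "+", "IIII"))), tmp)
sizes <- unlist(stream_fastq(tmp, n = 2, FUN = nrow))
sizes
```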

Adapters

Logical. Whether to remove adapters from reads.

Lpattern

Character. Adapter sequence to remove from the 5' end (left).

Rpattern

Character. Adapter sequence to remove from the 3' end (right).

max.Lmismatch

Integer. Maximum mismatches allowed for the left adapter.

max.Rmismatch

Integer. Maximum mismatches allowed for the right adapter.
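
The names Lpattern, Rpattern, max.Lmismatch and max.Rmismatch mirror the arguments of Biostrings::trimLRPatterns, which suggests (but this page does not confirm) that adapter removal follows that function. The simplified sketch below only strips a full-length adapter found at the very start or end of a read within the allowed number of mismatches; trimLRPatterns can also match partial adapter overlaps. trim_adapters and hamming are hypothetical helpers.

```r
# Count positions where two equal-length strings differ (Hamming distance).
hamming <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Simplified sketch: strip Lpattern from the 5' end and Rpattern from the
# 3' end when the full adapter matches within the allowed mismatches.
trim_adapters <- function(read, Lpattern, Rpattern,
                          max.Lmismatch = 0L, max.Rmismatch = 0L) {
  nL <- nchar(Lpattern)
  if (nL > 0L && nchar(read) >= nL &&
      hamming(substr(read, 1L, nL), Lpattern) <= max.Lmismatch) {
    read <- substr(read, nL + 1L, nchar(read))
  }
  nR <- nchar(Rpattern)
  if (nR > 0L && nchar(read) >= nR &&
      hamming(substr(read, nchar(read) - nR + 1L, nchar(read)),
              Rpattern) <= max.Rmismatch) {
    read <- substr(read, 1L, nchar(read) - nR)
  }
  read
}

# Left adapter present with one mismatch (T instead of A), right one exact
trim_adapters("AGTTCGGACGTACGTCTGTCTC", "AGATCGG", "CTGTCTC",
              max.Lmismatch = 1L, max.Rmismatch = 1L)
```

With max.Lmismatch = 0L the mismatched left adapter is kept and only the right adapter is removed.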

kW

Integer. Minimum number of low-quality bases within a window required to trigger trimming (sliding-window analysis).

left

Logical. Whether to allow trimming from the left end.

right

Logical. Whether to allow trimming from the right end.
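
End trimming removes low-quality bases from the read ends only, stopping at the first base that passes; interior low-quality bases are left in place. ShortRead provides trimEnds with left and right arguments of the same name, but this page does not say whether inDAGO calls it. The sketch assumes a base is low quality when its Phred score is below trimValue; trim_ends is a hypothetical name.

```r
# Sketch of quality-based end trimming. Assumption: a base is low quality
# when its Phred score is below trimValue.
trim_ends <- function(read, phred, trimValue, left = TRUE, right = TRUE) {
  if (!left && !right) return(list(read = read, phred = phred))
  keep <- phred >= trimValue
  if (!any(keep)) return(list(read = "", phred = phred[0]))
  # First/last passing base; the interior of the read is left untouched
  first <- if (left) which(keep)[1] else 1L
  last <- if (right) max(which(keep)) else length(phred)
  list(read = substr(read, first, last), phred = phred[first:last])
}

phred <- c(2L, 10L, 30L, 35L, 38L, 40L, 33L, 12L, 25L, 3L)
both <- trim_ends("ACGTACGTAC", phred, trimValue = 20)
right_only <- trim_ends("ACGTACGTAC", phred, trimValue = 20, left = FALSE)
both$read        # the score-12 base at position 8 survives: it is interior
right_only$read
```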

halfwidthAnalysis

Logical. Whether to perform sliding-window trimming.

halfwidth

Integer. Half-width of the sliding window.
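
Sliding-window trimming catches clusters of poor bases that end trimming misses. The sketch below is modelled on the behaviour ShortRead documents for trimTailw, where a window of 2 * halfwidth + 1 bases is centred on each position; it is an assumption that inDAGO combines kW, halfwidth and trimValue the same way. trim_window is a hypothetical name, and "low quality" is taken to mean a Phred score below trimValue.

```r
# Sketch of sliding-window tail trimming: for each position i, count the
# low-quality bases (phred < trimValue) in [i - halfwidth, i + halfwidth],
# clipped to the read, and cut the read just before the first position
# whose window holds at least kW such bases.
trim_window <- function(read, phred, trimValue, kW, halfwidth) {
  low <- phred < trimValue
  n <- length(phred)
  for (i in seq_len(n)) {
    window <- max(1L, i - halfwidth):min(n, i + halfwidth)
    if (sum(low[window]) >= kW) {
      return(list(read = substr(read, 1L, i - 1L),
                  phred = phred[seq_len(i - 1L)]))
    }
  }
  list(read = read, phred = phred)  # no window reached kW: keep whole read
}

phred <- c(38L, 38L, 38L, 38L, 38L, 38L, 15L, 38L, 12L, 10L, 38L, 8L)
res <- trim_window("ACGTACGTACGT", phred, trimValue = 20, kW = 2, halfwidth = 1)
res$read
```

Here the window centred on position 8 (positions 7 to 9) is the first to contain two low-quality bases, so the read is cut after position 7.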

compress

Logical. Whether to compress the output FASTQ files.
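
Compressed FASTQ is normally gzip. The hypothetical helper below writes four-line records through base R's gzfile connection when compress = TRUE; appending ".gz" to the file name is an assumption of this sketch, not documented inDAGO behaviour.

```r
# Hypothetical helper: write FASTQ records, gzip-compressed when
# compress = TRUE. id, seq and qual are parallel character vectors.
write_fastq <- function(path, id, seq, qual, compress = FALSE) {
  if (compress) path <- paste0(path, ".gz")
  con <- if (compress) gzfile(path, open = "w") else file(path, open = "w")
  on.exit(close(con))
  # rbind gives one column per read; writeLines emits it column by column
  writeLines(rbind(paste0("@", id), seq, "+", qual), con)
  path
}

out <- write_fastq(tempfile(fileext = ".fq"), c("r1", "r2"),
                   c("ACGT", "GGCC"), c("IIII", "IIII"), compress = TRUE)
lines <- readLines(gzfile(out))
lines
```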

Details

This function processes raw paired-end FASTQ files to remove low-quality bases, trim adapters and discard reads that are too short. It supports quality-based end trimming, sliding-window trimming and adapter removal. Processing runs in parallel on the number of nodes given by Nodes, which shortens run time on large datasets.
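
One way to spread samples over Nodes workers is to hand each element of X to its own worker. The sketch below uses parallel::mclapply; this page does not say which parallel backend inDAGO uses, and process_pair is a hypothetical stand-in for the per-sample filtering work.

```r
library(parallel)

# Hypothetical stand-in for the per-sample filtering step
process_pair <- function(pair) {
  paste(pair, collapse = " + ")
}

run_parallel <- function(X, Nodes, FUN) {
  # mclapply relies on forking, which Windows lacks, so use 1 core there
  cores <- if (.Platform$OS.type == "windows") 1L else as.integer(Nodes)
  mclapply(X, FUN, mc.cores = cores)
}

X <- list(c("s1_1.fq", "s1_2.fq"), c("s2_1.fq", "s2_2.fq"))
res <- run_parallel(X, Nodes = 2L, FUN = process_pair)
unlist(res)
```

mclapply returns results in the order of X. On Windows, a socket cluster (makeCluster with parLapply) is the usual alternative.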

  • Paired FASTQ files must be named consistently, distinguished by "_1" and "_2" for forward and reverse reads.

  • This function uses the "ShortRead" and "Biostrings" packages for FASTQ processing and quality filtering.

  • Filtered files are written in FASTQ format (compressed when compress = TRUE).

  • Log files containing read counts before and after filtering are written per sample.
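
Length filtering must keep the "_1" and "_2" files in sync. The sketch below assumes a pair is kept only if both mates reach minLen (this page does not state the rule), and returns before/after counts of the kind the per-sample log reports; the actual log layout is not documented here. filter_pairs is a hypothetical name.

```r
# Sketch: keep a read pair only if BOTH mates are at least minLen long,
# so the forward and reverse outputs stay aligned record for record.
filter_pairs <- function(seq1, seq2, minLen) {
  keep <- nchar(seq1) >= minLen & nchar(seq2) >= minLen
  list(seq1 = seq1[keep], seq2 = seq2[keep],
       counts = c(before = length(seq1), after = sum(keep)))
}

res <- filter_pairs(c("ACGTACGT", "ACG", "ACGTAC"),
                    c("TTGCAAGG", "TTGCAAGG", "TT"), minLen = 5)
res$counts  # pair 2 fails on the forward mate, pair 3 on the reverse mate
```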