
Basic4Cseq (version 1.8.0)

checkRestrictionEnzymeSequence: Remove invalid 4C-seq reads from a SAM file

Description

Basic4Cseq offers filter functions for invalid 4C-seq reads. This function removes reads that show mismatches in the restriction enzyme sequence from a provided Sequence Alignment/Map (SAM) file.

Usage

checkRestrictionEnzymeSequence(firstCutter, inputFileName, outputFileName = "output.sam", keepOnlyUniqueReads = TRUE, writeStatistics = TRUE)

Arguments

firstCutter
Sequence of the first (primary) restriction enzyme of the 4C-seq experiment, e.g. "aagctt"
inputFileName
Name of the input SAM file that contains aligned reads for the 4C-seq experiment
outputFileName
Name of the output SAM file that is created to store the filtered 4C-seq reads
keepOnlyUniqueReads
If TRUE, delete non-unique reads. Information in the SAM flag field is used to determine whether a read is unique or not.
writeStatistics
If TRUE, write statistics (e.g. the number of unique reads) to a text file
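The keepOnlyUniqueReads description says only that the SAM flag field decides uniqueness; it does not say which bits are checked. The sketch below is a hypothetical helper, isKeptByFlag, that is not part of Basic4Cseq. It assumes that a read is kept only if the unmapped (0x4), secondary (0x100) and supplementary (0x800) bits are all unset. The actual criterion used by the package may differ.

```r
# Hypothetical helper, NOT part of Basic4Cseq.
# Assumption: a read counts as "unique" when none of the unmapped (0x4),
# secondary (0x100) or supplementary (0x800) SAM flag bits is set.
isKeptByFlag <- function(flag) {
  excludeMask <- bitwOr(bitwOr(0x4L, 0x100L), 0x800L)
  # A read is kept when none of the excluded bits are present in its flag
  bitwAnd(as.integer(flag), excludeMask) == 0L
}

flags <- c(0L, 16L, 4L, 256L, 2048L)
isKeptByFlag(flags)
# returns TRUE TRUE FALSE FALSE FALSE (16 = reverse strand, still kept)
```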

Value

Details

Valid 4C-seq reads start at a primary restriction site and continue with its downstream sequence, so a mismatch in the restriction enzyme sequence of a read indicates an invalid or misaligned read. The mapping information for the restriction enzyme bases of a read (if present in the SAM file) can therefore be used for filtering. checkRestrictionEnzymeSequence tests the first bases of each read for mismatches; the number of bases tested equals the length of the first restriction enzyme sequence (4 or 6 bp). Reads with mismatches in the restriction enzyme sequence are deleted, and the filtered data is written to a new SAM file. The function does not yet differentiate between blind and non-blind fragments, but it removes potential misalignments that may overlap with valid fragment ends and distort the true 4C-seq signal.
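The check described above can be sketched in plain R. The sketch below is not the Basic4Cseq implementation. It assumes that mismatch information comes from the optional MD:Z tag written by the aligner, and that reads are not soft-clipped. The helper name restrictionSiteMatches is hypothetical. For a forward-strand read, the 5' end, which carries the restriction site, is at the start of the alignment. For a reverse-strand read (flag bit 0x10), it is at the end. The helper therefore inspects the leading or trailing run of matches in the MD string.

```r
# Hypothetical sketch, NOT the Basic4Cseq implementation.
# Assumptions: every read has a valid MD:Z string (which always starts and
# ends with a number), and reads are not soft-clipped. Insertions are not
# encoded in MD, so an insertion inside the site would go unnoticed here.
restrictionSiteMatches <- function(md, flag, siteLength) {
  isReverse <- bitwAnd(as.integer(flag), 0x10L) != 0L
  # Leading run of matching bases: drop everything from the first non-digit
  leadingMatches <- as.integer(sub("[^0-9].*$", "", md))
  # Trailing run of matching bases: drop everything up to the last non-digit
  trailingMatches <- as.integer(sub("^.*[^0-9]", "", md))
  # The read's 5' end (restriction site) lies at the alignment end on the
  # reverse strand and at the alignment start on the forward strand
  matchesAtFivePrime <- ifelse(isReverse, trailingMatches, leadingMatches)
  matchesAtFivePrime >= siteLength
}

md   <- c("36", "2A33", "33C2", "33C2")
flag <- c(0L, 0L, 0L, 16L)
restrictionSiteMatches(md, flag, nchar("aagctt"))
# returns TRUE FALSE TRUE FALSE
```

In this example, the second read has a mismatch at its third base, which lies inside the 6 bp site. The last read is on the reverse strand, and its mismatch lies two bases from its 5' end. Both reads would be removed, while the third read's mismatch lies outside the site.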

Examples

  if(interactive()) {
    file <- system.file("extdata", "fetalLiverCutter.sam", package="Basic4Cseq")
    checkRestrictionEnzymeSequence("aagctt", file)
  }
