checkRestrictionEnzymeSequence: Remove invalid 4C-seq reads from a SAM file
Description
Basic4Cseq offers filter functions for invalid 4C-seq reads. This function removes 4C-seq reads from a provided Sequence Alignment/Map (SAM) file that show mismatches in the restriction enzyme sequence.
Arguments
firstCutter
First restriction enzyme sequence of the 4C-seq experiment
inputFileName
Name of the input SAM file that contains aligned reads for the 4C-seq experiment
outputFileName
Name of the output SAM file that is created to store the filtered 4C-seq reads
keepOnlyUniqueReads
If TRUE, delete non-unique reads. Information in the SAM flag field is used to determine whether a read is unique or not.
writeStatistics
If TRUE, write statistics (e.g. the number of unique reads) to a text file
Value
A SAM file that contains the filtered 4C-seq reads
Details
Valid 4C-seq reads start at a primary restriction site and continue with its downstream sequence, so any mismatch in the restriction enzyme sequence of a read indicates a misaligned or otherwise invalid read. The mapping information of the restriction enzyme sequence bases of a read (if present) can therefore be used for filtering. checkRestrictionEnzymeSequence tests the first bases of a read (4 or 6 bp, depending on the length of the first restriction enzyme sequence) for mismatches. Reads with mismatches in the restriction enzyme sequence are removed, and the filtered data are then written to a new SAM file. The function does not yet differentiate between blind and nonblind fragments, but it removes potential misalignments that may overlap with valid fragment ends and distort the true 4C-seq signal.
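The filtering idea described above can be illustrated with a minimal base R sketch. It is not the Basic4Cseq implementation: filterByCutter and revComp are hypothetical helpers, and the sketch simplifies the check by comparing the read sequence itself with the restriction enzyme sequence instead of using the mapping information. It assumes that reverse-strand reads (SAM flag bit 16) are stored reverse-complemented, so their restriction site is looked for at the end of the stored sequence.

```r
# Hypothetical helper (not part of Basic4Cseq): reverse complement of DNA strings.
revComp <- function(s) {
  comp <- chartr("ACGT", "TGCA", toupper(s))
  vapply(strsplit(comp, ""), function(x) paste(rev(x), collapse = ""),
         character(1))
}

# Hypothetical helper (not part of Basic4Cseq): keep header lines and those
# alignment records whose read begins with the restriction enzyme sequence.
filterByCutter <- function(samLines, firstCutter) {
  isHeader <- startsWith(samLines, "@")
  records <- samLines[!isHeader]
  fields <- strsplit(records, "\t", fixed = TRUE)
  # SAM column 2 is the bitwise FLAG, column 10 is the read sequence (SEQ).
  flags <- as.integer(vapply(fields, function(f) f[2], character(1)))
  seqs <- toupper(vapply(fields, function(f) f[10], character(1)))
  n <- nchar(firstCutter)
  len <- nchar(seqs)
  # Assumption: flag bit 16 marks reverse-strand reads, whose SEQ is stored
  # reverse-complemented, so the read start is at the end of SEQ.
  isReverse <- bitwAnd(flags, 16L) != 0
  readStart <- ifelse(isReverse,
                      revComp(substr(seqs, len - n + 1, len)),
                      substr(seqs, 1, n))
  keep <- readStart == toupper(firstCutter)
  c(samLines[isHeader], records[keep])
}

# Toy input: one header line and three reads, the second with a mismatch.
sam <- c(
  "@HD\tVN:1.0",
  "r1\t0\tchr1\t100\t30\t10M\t*\t0\t0\tAAGCTTACGT\tIIIIIIIIII",
  "r2\t0\tchr1\t200\t30\t10M\t*\t0\t0\tAAGGTTACGT\tIIIIIIIIII",
  "r3\t16\tchr1\t300\t30\t10M\t*\t0\t0\tACGTAAGCTT\tIIIIIIIIII"
)
filtered <- filterByCutter(sam, "aagctt")
```

In this toy input, r2 is dropped because its first six bases (AAGGTT) differ from the HindIII site AAGCTT, while r1 and the reverse-strand read r3 are kept. The package function works on SAM file names rather than on in-memory lines.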