primerIDAlignSeqs: Align a short pattern with PrimerID to variable length target sequences.

Description

Align a fixed-length short pattern sequence containing a primerID to variable-length subject sequences using pairwiseAlignment. By default, this function uses type="overlap", gapOpening=-1, and gapExtension=-1 to align patternSeq against subjectSeqs. The search is split into one more piece than there are primerIDs, and each piece is compared against subjectSeqs. For example, patternSeq="AGCATCAGCANNNNNNNNNACGATCTACGCC" launches two search jobs, one for each side of the Ns. For each search, the corresponding quality threshold (qualityThreshold1 or qualityThreshold2) is used to filter out candidate alignments, and the region between the two hits is taken to be the primerID. This strategy is beneficial because indels introduced through homopolymer errors mean that the length of the primerID(s) will most likely not be what you expected.
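The splitting step described above can be illustrated in base R. This is only a sketch of the idea for a pattern with a single primerID, not the package's internal code; the variable names nRun, leftPart, and rightPart are assumptions made for this example.

```r
# Illustrative only: split a pattern into the two flanks around the run of Ns.
patternSeq <- "AGCATCAGCANNNNNNNNNACGATCTACGCC"

# Locate the first run of Ns, which marks the primerID placeholder.
nRun <- regexpr("N+", patternSeq)
nStart <- as.integer(nRun)
nLength <- attr(nRun, "match.length")

# The flanking pieces are what would be aligned; the Ns themselves are not.
leftPart <- substr(patternSeq, 1, nStart - 1)
rightPart <- substr(patternSeq, nStart + nLength, nchar(patternSeq))

leftPart   # searched with qualityThreshold1
rightPart  # searched with qualityThreshold2
nLength    # expected primerID length
```

Whatever falls between the two flank hits in a read is then treated as the candidate primerID, which is why its observed length can differ from nLength.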

Usage

primerIDAlignSeqs(subjectSeqs = NULL, patternSeq = NULL,
  qualityThreshold1 = 0.75, qualityThreshold2 = 0.5, doAnchored = FALSE,
  doRC = TRUE, returnUnmatched = FALSE, returnRejected = FALSE,
  showStats = FALSE, ...)

Arguments

subjectSeqs

DNAStringSet object containing sequences to be searched for the pattern.

patternSeq

DNAString object or a sequence containing the query sequence to search with the primerID.

qualityThreshold1

fraction (between 0 and 1) of the first part of patternSeq that must match. Default is 0.75.

qualityThreshold2

fraction (between 0 and 1) of the second part of patternSeq that must match. Default is 0.50.

doAnchored

for a primerID-based patternSeq, use the bases immediately before and after the primerID in patternSeq as anchors? Default is FALSE.

doRC

perform reverse complement search of the defined pattern. Default is TRUE.

returnUnmatched

return sequences that had no match, or less than a 5% match, to the first part of patternSeq before the primerID. Default is FALSE.

returnRejected

return sequences that match only one side of patternSeq, or whose primerID length is not within +/-2 of the number of Ns in the pattern. Default is FALSE.

showStats

toggle output of search statistics. Default is FALSE.

...

extra parameters for pairwiseAlignment
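The length rule mentioned under returnRejected (primerID length within +/-2 of the number of Ns) can be sketched in base R. The helper primerIDLengthOK and its tolerance argument are hypothetical names introduced for illustration; the package applies its own check internally, and treating the bound as inclusive is an assumption here.

```r
# Hypothetical helper, not part of the package: flag candidate primerIDs
# whose length is within `tolerance` bases of the number of Ns in the pattern.
primerIDLengthOK <- function(primerIDs, patternSeq, tolerance = 2) {
  # Count the Ns by dropping every other character.
  nExpected <- nchar(gsub("[^N]", "", patternSeq))
  abs(nchar(primerIDs) - nExpected) <= tolerance
}

# Pattern with a 10-base primerID placeholder.
patternSeq <- paste0("AAGCGGAGCCC", strrep("N", 10), "TTTTTTTTTTT")

# Candidates of length 10, 9, and 6.
candidates <- c("GGTTCTACGT", "GGTTCTACG", "GGTTCT")
lengthOK <- primerIDLengthOK(candidates, patternSeq)
lengthOK
```

Under this sketch the 10-base and 9-base candidates pass, while the 6-base candidate would be the kind of primerID reported under returnRejected = TRUE.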

Value

- A CompressedIRangesList of length two, where x[["hits"]] contains hits covering the entire patternSeq, and x[["primerIDs"]] contains the potential primerID region.
- If returnUnmatched=TRUE, then x[["Absent"]] is appended, which includes reads not matching the first part of patternSeq.
- If returnRejected=TRUE, then x[["Rejected"]] includes reads that matched only one part of patternSeq or in which no primerID was found between the two parts of patternSeq, and x[["RejectedprimerIDs"]] includes primerIDs that didn't match the correct length.
- If doAnchored=TRUE, then x[["unAnchoredprimerIDs"]] includes reads that didn't match the bases before and after the primerID in patternSeq.

Examples

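library(Biostrings)        # provides xscat() and DNAStringSet()
library(hiReadsProcessor)  # provides primerIDAlignSeqs()

# Six toy reads and the 10-base primerIDs to attach to them.
subjectSeqs <- c("CCTGAATCCTGGCAATGTCATCATC", "ATCCTGGCAATGTCATCATCAATGG",
"ATCAGTTGTCAACGGCTAATACGCG", "ATCAATGGCGATTGCCGCGTCTGCA",
"CCGCGTCTGCAATGTGAGGGCCTAA", "GAAGGATGCCAGTTGAAGTTCACAC")
ids <- c("GGTTCTACGT", "AGGAGTATGA", "TGTCGGTATA", "GTTATAAAAC",
"AGGCTATATC", "ATGGTTTGTT")
# Append left flank + primerID + right flank to each read.
subjectSeqs <- xscat(subjectSeqs, xscat("AAGCGGAGCCC", ids, "TTTTTTTTTTT"))
# The Ns mark where the primerID sits in the pattern.
patternSeq <- "AAGCGGAGCCCNNNNNNNNNNTTTTTTTTTTT"
primerIDAlignSeqs(DNAStringSet(subjectSeqs), patternSeq, doAnchored = TRUE)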
