hiReadsProcessor (version 1.8.2)

primerIDAlignSeqs: Align a short pattern with PrimerID to variable length target sequences.

Description

Align a fixed-length short pattern sequence containing a primerID to variable-length subject sequences using pairwiseAlignment. This function uses defaults of type="overlap", gapOpening=-1, and gapExtension=-1 to align patternSeq against subjectSeqs. The pattern is split at the primerID(s) into one more piece than there are primerIDs, and each piece is then searched against subjectSeqs. For example, patternSeq="AGCATCAGCANNNNNNNNNACGATCTACGCC" will launch two search jobs, one for each side of the Ns. For each search, the corresponding qualityThreshold is used to filter out candidate alignments, and the area in between the two hits is chosen to be the primerID. This strategy is beneficial because of indels introduced through homopolymer errors. Most likely, the length of the primerID(s) won't be the same as you expected!
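
The splitting step described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: split_on_primer_id is a hypothetical helper name, and the pattern is rebuilt with strrep() so that the number of Ns (10, as an assumed example value) is explicit.

```r
# Hypothetical helper (not part of hiReadsProcessor): split a pattern at its
# run(s) of Ns into the flanking pieces that would each be searched separately.
split_on_primer_id <- function(pattern) {
  n_runs <- gregexpr("N+", pattern)[[1]]
  if (n_runs[1] == -1) stop("pattern contains no run of Ns")
  list(
    flanks = strsplit(pattern, "N+")[[1]],               # pieces on either side of each N run
    primer_id_lengths = attr(n_runs, "match.length")     # expected length of each primerID
  )
}

# Assumed example: 11-base left flank, 10 Ns, 11-base right flank.
pattern <- paste0("AAGCGGAGCCC", strrep("N", 10), "TTTTTTTTTTT")
parts <- split_on_primer_id(pattern)

parts$flanks             # the two pieces to search for
parts$primer_id_lengths  # the expected primerID length
```

With one primerID there are two flanks, which matches the "as many pieces + 1" rule in the description.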

Usage

primerIDAlignSeqs(subjectSeqs = NULL, patternSeq = NULL,
  qualityThreshold1 = 0.75, qualityThreshold2 = 0.5, doAnchored = FALSE,
  doRC = TRUE, returnUnmatched = FALSE, returnRejected = FALSE,
  showStats = FALSE, ...)

Arguments

subjectSeqs
DNAStringSet object containing sequences to be searched for the pattern.
patternSeq
DNAString object or a sequence containing the query sequence to search with the primerID.
qualityThreshold1
percent of first part of patternSeq to match. Default is 0.75.
qualityThreshold2
percent of second part of patternSeq to match. Default is 0.50.
doAnchored
for a primerID-based patternSeq, use the bases immediately before and after the primerID in patternSeq as anchors? Default is FALSE.
doRC
perform reverse complement search of the defined pattern. Default is TRUE.
returnUnmatched
return sequences that had no match, or less than a 5% match, to the first part of patternSeq before the primerID. Default is FALSE.
returnRejected
return sequences that match only one side of patternSeq, or whose primerID length does not match the number of Ns in the pattern +/-2. Default is FALSE.
showStats
toggle output of search statistics. Default is FALSE.
...
extra parameters for pairwiseAlignment
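
The "+/-2" length rule mentioned under returnRejected can be pictured with a small base R check. This is a sketch under assumptions: primer_id_length_ok is a hypothetical helper, it assumes a pattern with a single run of Ns, and it is not taken from the package source.

```r
# Hypothetical helper (not part of hiReadsProcessor): is an observed primerID
# length within `tolerance` bases of the number of Ns in the pattern?
primer_id_length_ok <- function(observed_length, pattern, tolerance = 2L) {
  expected <- nchar(gsub("[^N]", "", pattern))  # count the Ns in the pattern
  abs(observed_length - expected) <= tolerance
}

# Assumed example pattern with 10 Ns between two 11-base flanks.
pattern <- paste0("AAGCGGAGCCC", strrep("N", 10), "TTTTTTTTTTT")

# Observed lengths 8 to 12 pass; 7 and 13 would be treated as rejected.
ok <- primer_id_length_ok(c(7, 8, 10, 12, 13), pattern)
```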

Value

    • A CompressedIRangesList of length two, where x[["hits"]] is hits covering the entire patternSeq, and x[["primerIDs"]] is the potential primerID region.
    • If returnUnmatched=TRUE, then x[["Absent"]] is appended, which includes reads not matching the first part of patternSeq.
    • If returnRejected=TRUE, then x[["Rejected"]] includes reads that only matched one part of patternSeq or places where no primerID was found in between the two parts of patternSeq, and x[["RejectedprimerIDs"]] includes primerIDs that didn't match the correct length.
    • If doAnchored=TRUE, then x[["unAnchoredprimerIDs"]] includes reads that didn't match the bases before and after the primerID in patternSeq.
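
To picture how the "hits" range and the "primerIDs" range relate on a single read, here is a dependency-free base R sketch using the second read from the Examples section. Exact string matching stands in for the two pairwiseAlignment searches, which is a simplifying assumption: the real function tolerates mismatches and returns IRanges objects rather than plain integers.

```r
# Rebuild one read: genomic fragment + left flank + primerID + right flank.
left_flank  <- "AAGCGGAGCCC"
right_flank <- "TTTTTTTTTTT"
read <- paste0("ATCCTGGCAATGTCATCATCAATGG", left_flank, "AGGAGTATGA", right_flank)

# Exact matching as a stand-in for the two alignment searches (assumption).
left_hit  <- regexpr(left_flank, read, fixed = TRUE)
right_hit <- regexpr(right_flank, read, fixed = TRUE)

# The primerID is the region between the end of the left hit and the start
# of the right hit.
primer_start <- as.integer(left_hit) + attr(left_hit, "match.length")
primer_end   <- as.integer(right_hit) - 1L
primer_id    <- substr(read, primer_start, primer_end)

# The overall hit spans from the start of the left flank to the end of the
# right flank, so it always contains the primerID range.
hit_range <- c(start = as.integer(left_hit),
               end   = as.integer(right_hit) + nchar(right_flank) - 1L)
```

A primerID that itself ends in T would extend the run of Ts and pull the right-flank match one base earlier, which is one simple way the recovered primerID can come out shorter than expected.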

See Also

vpairwiseAlignSeqs, pairwiseAlignSeqs, doRCtest, blatSeqs, findAndRemoveVector

Examples

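library(hiReadsProcessor)
library(Biostrings)  # provides xscat() and DNAStringSet()

# Six genomic fragments and the 10-base primerIDs to attach to them
subjectSeqs <- c("CCTGAATCCTGGCAATGTCATCATC", "ATCCTGGCAATGTCATCATCAATGG",
                 "ATCAGTTGTCAACGGCTAATACGCG", "ATCAATGGCGATTGCCGCGTCTGCA",
                 "CCGCGTCTGCAATGTGAGGGCCTAA", "GAAGGATGCCAGTTGAAGTTCACAC")
ids <- c("GGTTCTACGT", "AGGAGTATGA", "TGTCGGTATA", "GTTATAAAAC",
         "AGGCTATATC", "ATGGTTTGTT")
# Append left flank + primerID + right flank to each fragment
subjectSeqs <- xscat(subjectSeqs, xscat("AAGCGGAGCCC", ids, "TTTTTTTTTTT"))
# Pattern: the same flanks, with Ns marking the primerID
patternSeq <- "AAGCGGAGCCCNNNNNNNNNNTTTTTTTTTTT"
primerIDAlignSeqs(DNAStringSet(subjectSeqs), patternSeq, doAnchored = TRUE)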
