dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in BLAST.
dustyScore(x, batchSize=NA, ...)
x: A DNAStringSet object, or object derived from ShortRead, containing a collection of reads to be summarized.
batchSize: NA or an integer(1) vector indicating the maximum number of reads to be processed at any one time.
dustyScore returns a vector of numeric scores, with length equal to the length of x.
The following methods are defined:
signature(x = "DNAStringSet"): operating on an object derived from class DNAStringSet.
signature(x = "ShortRead"): operating on the sread of an object derived from class ShortRead.
The dust-like calculations used here are as implemented at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (all triplets unique) to the square of the width of the longest sequence (poly-A, -C, -G, or -T).
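As an illustration of how a score with this range can arise, the following base-R sketch counts the overlapping triplets in a single read and sums the squared excess of each count over one. The helper name dustLikeScore and the exact formula are assumptions made for illustration; this is not the ShortRead implementation, which operates on DNAStringSet objects and may normalize differently.

```r
# Hypothetical helper, not part of ShortRead: a dust-like score for one read,
# computed from overlapping triplet counts (assumed formula, see text above).
dustLikeScore <- function(read) {
    n <- nchar(read)
    if (n < 3L) return(0)                  # no triplets to count
    starts <- seq_len(n - 2L)
    triplets <- substring(read, starts, starts + 2L)
    counts <- table(triplets)              # occurrences of each observed triplet
    # A triplet seen once contributes 0; repeats contribute (count - 1)^2
    sum((as.integer(counts) - 1L)^2)
}

dustLikeScore("ACGTTGCA")    # every triplet is distinct
dustLikeScore("AAAAAAAAAA")  # homopolymer: a single triplet repeated
```

Under this assumed formula a read whose triplets are all unique scores 0, and a homopolymer of width w scores (w - 3)^2, which grows with the square of the width.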
The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
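The batching idea can be sketched in base R as follows. The function scoreInBatches and its scoreFun parameter are hypothetical names introduced for illustration, and the sketch only assumes that x supports length() and subsetting with `[`; it is not how ShortRead implements batchSize internally.

```r
# Hypothetical helper, not part of ShortRead: apply a vectorized scoring
# function to x in chunks of at most batchSize elements.
scoreInBatches <- function(x, scoreFun, batchSize = NA) {
    # NA (or a batch at least as large as x) means one pass over everything
    if (is.na(batchSize) || length(x) <= batchSize) return(scoreFun(x))
    idx <- seq_along(x)
    batches <- split(idx, ceiling(idx / batchSize))   # consecutive index chunks
    # Only one chunk is scored at a time, so intermediate objects stay small
    unlist(lapply(batches, function(i) scoreFun(x[i])), use.names = FALSE)
}

reads <- c("AAAA", "ACGT", "CCCCC")
scoreInBatches(reads, nchar, batchSize = 2L)  # nchar stands in for a real score
```

The result is the same as a single call to the scoring function; the trade-off is more function calls in exchange for a smaller working set per call.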
Morgulis, Getz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.
The WindowMasker supplement defining dust: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf
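library(ShortRead)
# Locate the example Solexa run shipped with the ShortRead package
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
# Read the lane 1 FASTQ file, then report the lowest and highest dusty scores
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
range(dustyScore(rfq))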