dustyScore: Summarize low-complexity sequences

Description

dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in BLAST.

Usage

dustyScore(x, batchSize=NA, ...)

Arguments

x

A DNAStringSet object, or an object derived from ShortRead, containing the collection of reads to be summarized.

batchSize

NA or an integer(1) vector indicating the maximum number of reads to be processed at any one time.

...

Additional arguments, not currently used.

Value

A vector of numeric scores, with length equal to the length of x.

Details

The following methods are defined:

dustyScore

signature(x = "DNAStringSet"): operating on an object derived from class DNAStringSet.

dustyScore

signature(x = "ShortRead"): operating on the sread of an object derived from class ShortRead.

The dust-like calculations used here follow the implementation described at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (every triplet in a read is unique) up to roughly the square of the width of the longest sequence (for homopolymer reads: poly-A, -C, -G, or -T).
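To make the scoring idea concrete, the following is a minimal base R sketch of a dust-like score. It is not the ShortRead implementation: it assumes the score is the sum, over distinct triplets in a read, of the squared number of repeat occurrences (count minus one). The helper name dustLike is hypothetical and used only for illustration.

```r
# Illustrative dust-like score in base R.
# ASSUMPTION: score = sum over distinct triplets of (count - 1)^2.
# 'dustLike' is a hypothetical helper, not part of ShortRead.
dustLike <- function(seqs) {
    vapply(seqs, function(s) {
        n <- nchar(s)
        if (n < 3L) return(0)   # no triplets in very short reads
        # Extract all overlapping triplets
        starts <- seq_len(n - 2L)
        triplets <- substring(s, starts, starts + 2L)
        counts <- table(triplets)
        # The first occurrence of a triplet is free; repeats are
        # penalized quadratically
        sum((as.integer(counts) - 1L)^2)
    }, numeric(1), USE.NAMES = FALSE)
}

dustLike(c("ACGTACGT", "AAAAAAAAAA", "ACGTTGCA"))
# 2 49 0
```

Under this assumption a homopolymer of width w contains w - 2 copies of a single triplet and scores (w - 3)^2, which grows with the square of the width, while a read whose triplets are all distinct scores 0.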

The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
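The batching trade-off can be sketched in base R as follows. This is an illustration of the general pattern only, not the package's code; scoreInBatches and scoreFun are hypothetical names, and any vectorized scoring function can stand in for scoreFun.

```r
# Illustrative batching wrapper (hypothetical helper, not part of ShortRead).
# Only 'batchSize' elements are passed to 'scoreFun' at a time, so
# intermediate results stay small at the cost of more function calls.
scoreInBatches <- function(x, batchSize, scoreFun) {
    # NA (or a batch at least as large as x) means: process everything at once
    if (is.na(batchSize) || length(x) <= batchSize)
        return(scoreFun(x))
    # Group indices 1..length(x) into consecutive batches
    idx <- seq_along(x)
    batches <- split(idx, ceiling(idx / batchSize))
    # Score each batch and concatenate the results in the original order
    unlist(lapply(batches, function(i) scoreFun(x[i])), use.names = FALSE)
}

reads <- c("A", "CC", "GGG", "TTTT", "A")
scoreInBatches(reads, batchSize = 2, scoreFun = nchar)
# 1 2 3 4 1
```

The result is the same whatever the batch size; only peak memory use and the number of calls change.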

References

Morgulis, Getz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.

Examples


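library(ShortRead)
# Read the lane 1 FASTQ file from the example Solexa run shipped with ShortRead
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Report the lowest and highest scores across all reads
range(dustyScore(rfq))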
