ShortRead (version 1.30.0)

dustyScore: Summarize low-complexity sequences

Description

dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in BLAST.

Usage

dustyScore(x, batchSize=NA, ...)

Arguments

x
A DNAStringSet object, or object derived from ShortRead, containing a collection of reads to be summarized.
batchSize
NA or an integer(1) vector indicating the maximum number of reads to be processed at any one time.
...
Additional arguments, not currently used.

Value

A vector of numeric scores, with length equal to the length of x.

Details

The following methods are defined:

dustyScore
signature(x = "DNAStringSet"): operating on an object derived from class DNAStringSet.

dustyScore
signature(x = "ShortRead"): operating on the sread of an object derived from class ShortRead.

The dust-like calculations used here are as implemented at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (all triplets unique) to the square of the width of the longest sequence (poly-A, -C, -G, or -T).
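The idea behind a triplet-based, dust-like score can be sketched in base R. This is a minimal illustration, not the ShortRead implementation: `dustLikeScore` is a hypothetical helper, and its scaling (the sum of count * (count - 1) / 2 over distinct triplets) is an assumption that may differ from what dustyScore returns. It does reproduce the qualitative behaviour described above: 0 when every triplet is unique, and quadratic growth with read width for a homopolymer.

```r
# Hypothetical helper, not part of ShortRead: dust-like score for one read.
# Assumption: score = sum over distinct triplets of count * (count - 1) / 2.
dustLikeScore <- function(read) {
    n <- nchar(read)
    if (n < 3L) return(0)                 # too short to contain a triplet
    starts <- seq_len(n - 2L)
    # All overlapping triplets (3-mers) in the read
    triplets <- substring(read, starts, starts + 2L)
    counts <- as.vector(table(triplets))  # occurrences of each distinct triplet
    sum(counts * (counts - 1) / 2)
}

dustLikeScore("ACGTTGCA")  # all six triplets unique: 0
dustLikeScore("AAAAAAAA")  # six copies of "AAA": 6 * 5 / 2 = 15
```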

The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
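The memory trade-off of batching can be illustrated with a generic base R sketch. `scoreInBatches` and its `scoreFun` argument are hypothetical names used only for illustration; how dustyScore batches internally is not shown here. The point is that only one batch of intermediate results exists at a time, at the cost of one extra function call per batch.

```r
# Hypothetical helper, not part of ShortRead: apply a vectorised scoring
# function to x in consecutive chunks of at most batchSize elements.
scoreInBatches <- function(x, scoreFun, batchSize = NA) {
    if (is.na(batchSize)) return(scoreFun(x))  # NA: process everything at once
    # Batch index for each element: 1,1,...,2,2,...
    batch <- ceiling(seq_along(x) / batchSize)
    # Score each batch separately, then concatenate in the original order
    unlist(lapply(split(x, batch), scoreFun), use.names = FALSE)
}

reads <- c("ACGT", "AAAAAA", "ACG", "TTTTTTTT", "GATTACA")
# Batched and unbatched results agree; only peak memory use differs
identical(scoreInBatches(reads, nchar, batchSize = 2L),
          scoreInBatches(reads, nchar))
```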

References

Morgulis, Getz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.

See Also

The WindowMasker supplement defining dust: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf

Examples

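library(ShortRead)
# Locate the example Solexa run shipped with ShortRead and read lane 1
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Smallest and largest dust-like scores across all reads
range(dustyScore(rfq))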
