ShortRead (version 1.30.0)

FastqFile-class: Sampling and streaming records from fastq files

Description

FastqFile represents a path and connection to a fastq file. FastqFileList is a list of such connections.

FastqSampler draws a subsample from a fastq file. yield is the method used to extract the sample from the FastqSampler instance; a short illustration is in the example below. FastqSamplerList is a list of FastqSampler elements.

FastqStreamer draws successive subsets from a fastq file; a short illustration is in the example below. FastqStreamerList is a list of FastqStreamer elements.

Usage

## FastqFile and FastqFileList
FastqFile(con, ...)
FastqFileList(..., class="FastqFile")
"open"(con, ...)
"close"(con, ...)
"readFastq"(dirPath, pattern=character(), ...)

## FastqSampler and FastqStreamer
FastqSampler(con, n=1e6, readerBlockSize=1e8, verbose=FALSE, ordered = FALSE)
FastqSamplerList(..., n=1e6, readerBlockSize=1e8, verbose=FALSE, ordered = FALSE)
FastqStreamer(con, n, readerBlockSize=1e8, verbose=FALSE)
FastqStreamerList(..., n, readerBlockSize=1e8, verbose=FALSE)
yield(x, ...)

Arguments

con, dirPath
A character string naming a connection, or (for con) an R connection (e.g., file, gzfile).
n
For FastqSampler, the size of the sample (number of records) to be drawn. For FastqStreamer, either a numeric(1) (set to 1e6 when n is missing) giving the number of successive records to be returned on each yield, or an IRanges instance delimiting the (1-based) indices of the records returned by each yield; ranges in n must have non-zero width and must not overlap.
readerBlockSize
The number of bytes or characters to be read at one time; smaller readerBlockSize reduces memory requirements but is less efficient.
verbose
Display progress.
ordered
logical(1) indicating whether sampled reads should be returned in the same order as they were encountered in the file.
x
An instance from the FastqSampler or FastqStreamer class.
...
Additional arguments. For FastqFileList, FastqSamplerList, or FastqStreamerList, this can either be a single character vector of paths to fastq files, or several instances of the corresponding FastqFile, FastqSampler, or FastqStreamer objects.
pattern
Ignored.
class
For developer use, to specify the underlying class contained in the FastqFileList.

Objects from the class

Available classes include:
FastqFile
A file path and connection to a fastq file.
FastqFileList
A list of FastqFile instances.
FastqSampler
Uniformly sample records from a fastq file.
FastqStreamer
Iterate over a fastq file, returning successive parts of the file.
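
The idea of uniformly sampling n records from a file that is read in blocks can be illustrated with reservoir sampling. The sketch below is a conceptual, base-R illustration only; it is not ShortRead's internal implementation, and the helper name reservoir_sample and the toy record names are hypothetical.

```r
## Conceptual sketch (assumption: NOT how ShortRead is implemented).
## Draw a uniform sample of size n from records that arrive block by block,
## without knowing the total number of records in advance.
reservoir_sample <- function(blocks, n) {
    reservoir <- character(n)
    seen <- 0L                        # number of records visited so far
    for (block in blocks) {
        for (rec in block) {
            seen <- seen + 1L
            if (seen <= n) {
                ## fill the reservoir with the first n records
                reservoir[seen] <- rec
            } else {
                ## keep the new record with probability n / seen
                j <- sample.int(seen, 1L)
                if (j <= n) reservoir[j] <- rec
            }
        }
    }
    reservoir[seq_len(min(seen, n))]
}

set.seed(1)
records <- sprintf("read_%04d", 1:1000)                          # toy record ids
blocks <- split(records, ceiling(seq_along(records) / 128))      # toy "reader blocks"
smp <- reservoir_sample(blocks, 50)
length(smp)
```

The whole input is visited once per sample, which is consistent with the note under Methods that drawing a sample may be time consuming for large files.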

Methods

The following methods are available to users:
readFastq,FastqFile-method:
see also ?readFastq.
writeFastq,ShortReadQ,FastqFile-method:
see also ?writeFastq, ?"writeFastq,ShortReadQ,FastqFile-method".
yield:
Draw a single sample from the instance. Operationally this requires that the underlying data (e.g., file) represented by the Sampler instance be visited; this may be time consuming.
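
The streaming behaviour of yield (successive calls returning successive parts, and a zero-length result once the input is exhausted) can be mimicked in base R. This is a rough sketch under stated assumptions: make_streamer is a hypothetical helper, not a ShortRead function, and it assumes every fastq record occupies exactly four lines.

```r
## Conceptual sketch (assumption: NOT ShortRead code).
## Returns a closure that hands back the next n records on each call.
make_streamer <- function(con, n) {
    function() {
        lines <- readLines(con, n = 4L * n)   # assumes 4 lines per record
        matrix(lines, ncol = 4L, byrow = TRUE,
               dimnames = list(NULL, c("id", "seq", "plus", "qual")))
    }
}

## Three toy records held in memory instead of a file
fastq <- c("@r1", "ACGT", "+", "IIII",
           "@r2", "TTGA", "+", "IIII",
           "@r3", "GGCC", "+", "IIII")
con <- textConnection(fastq)
yield_chunk <- make_streamer(con, 2)

sizes <- integer()
while (nrow(chunk <- yield_chunk())) {        # stops on a zero-row chunk
    sizes <- c(sizes, nrow(chunk))
}
close(con)
sizes
```

The loop has the same shape as the `while (length(fq <- yield(f)))` idiom in the Examples section: the connection keeps its position between calls, and an empty result signals the end of the input.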

See Also

readFastq, writeFastq, yield.

Examples

library(ShortRead)

sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

f <- FastqFile(fl)
rfq <- readFastq(f)
close(f)

f <- FastqSampler(fl, 50)
yield(f)    # sample of size n=50
yield(f)    # independent sample of size 50
close(f)

## Return sample as ordered in original file
f <- FastqSampler(fl, 50, ordered=TRUE)
yield(f)
close(f)

f <- FastqStreamer(fl, 50)
yield(f)    # records 1 to 50
yield(f)    # records 51 to 100
close(f)

## iterating over an entire file
f <- FastqStreamer(fl, 50)
while (length(fq <- yield(f))) {
    ## do work here
    print(length(fq))
}
close(f)

## iterating over IRanges
rng <- IRanges(c(50, 100, 200), width=10:8)
f <- FastqStreamer(fl, rng)
while (length(fq <- yield(f))) {
    print(length(fq))
}
close(f)

## Internal fields, methods, and help; for developers
ShortRead:::.FastqSampler_g$methods()
ShortRead:::.FastqSampler_g$fields()
ShortRead:::.FastqSampler_g$help("yield")
