sequenza (version 2.1.1)

read.seqz: Read an seqz or acgt format file

Description

Efficiently reads an seqz or acgt file into R.

Usage

read.seqz(file, nrows = -1, fast = FALSE, gz = TRUE, header = TRUE,
            colClasses = c("character", "integer", "character", "integer",
            "integer", "numeric", "numeric", "numeric", "character",
            "numeric", "numeric", "character", "character", "character"),
            chr.name = NULL, n.lines = NULL, ...)

read.acgt(file, colClasses = c("character", "integer", "character", "integer",
            "integer", "integer", "integer", "integer", "character"), ...)

Arguments

file
name of the seqz or acgt file to read.
nrows
number of rows to read from the file. Default is -1 (all rows).
fast
logical. If TRUE, the file is pre-parsed to count its rows before reading; on some systems this can speed up file reading.
gz
logical. If TRUE (the default) the function expects a gzipped file.
header
logical, indicating whether the file contains the names of the variables as its first line.
colClasses
character. A vector of classes to be assumed for the columns. The defaults match the seqz format (read.seqz) and the acgt format (read.acgt).
chr.name
if specified, only the selected chromosome will be extracted instead of the entire file.
n.lines
vector of length 2 specifying the first and last line to read from the file. If specified, only the selected portion of the file will be used. Requires the sed UNIX utility.
...
any arguments accepted by read.delim. For read.acgt, also any arguments accepted by read.seqz.
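
The header, nrows and colClasses arguments have the same meaning as in read.delim. The following is a minimal base R sketch of that behaviour on a toy gzipped file; it does not call sequenza, and the three columns used here are hypothetical, not the real seqz layout.

```r
# Write a tiny gzipped, tab-delimited file.
# The column names are illustrative only (an assumption, not the seqz format).
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("chromosome\tposition\tbase.ref",
             "1\t100\tA",
             "1\t101\tC",
             "X\t200\tG"), con)
close(con)

# gzfile() decompresses on the fly; colClasses fixes the type of each column
# and nrows limits the number of data rows that are read.
toy <- read.delim(gzfile(tmp), header = TRUE, nrows = 2,
                  colClasses = c("character", "integer", "character"))
str(toy)
```

Declaring the chromosome column as "character" keeps names such as "1" and "X" in a single consistent type, which is why the default colClasses of read.seqz start with "character".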

Details

read.seqz provides efficient access to a file by chromosome (chr.name) or by line range (n.lines). The content of a seqz or acgt file is described in the Value section.
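
The two access modes can be illustrated with base R alone. This is only a sketch of the idea, not the implementation used by read.seqz (which, according to the n.lines argument, relies on sed). The helper read_line_range and the toy two-column file are hypothetical, and the line-numbering convention (data rows counted from 1, header excluded) is an assumption made for this example.

```r
# Toy gzipped file with a header and five data rows (hypothetical columns).
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("chromosome\tposition",
             "1\t100", "1\t101", "X\t200", "X\t201", "X\t202"), con)
close(con)

# Hypothetical helper: read data rows first..last (1-based, header not counted).
# Skipping `first` lines drops the header plus the first - 1 preceding data rows.
read_line_range <- function(path, first, last) {
  read.delim(gzfile(path), header = FALSE, skip = first,
             nrows = last - first + 1,
             col.names = c("chromosome", "position"),
             colClasses = c("character", "integer"))
}

# Access by chromosome: parse the whole file, then filter.
all.rows <- read.delim(gzfile(tmp), colClasses = c("character", "integer"))
by.chr <- all.rows[all.rows$chromosome == "X", ]

# Access by line range: parse only the rows known to hold chromosome X.
x.rows <- range(which(all.rows$chromosome == "X"))
by.lines <- read_line_range(tmp, x.rows[1], x.rows[2])
```

Both approaches return the same rows; the line-range variant avoids parsing rows outside the requested block, which is the motivation for knowing the start and end lines of each chromosome in advance.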

See Also

read.delim.

Examples

data.file <-  system.file("data", "example.seqz.txt.gz", package = "sequenza")

## read chromosome 1 from an seqz file.
seqz.data <- read.seqz(data.file, chr.name = 1)

## Fast access to chromosome X using the file metrics
gc.stats <- gc.sample.stats(data.file)
chrX <- gc.stats$file.metrics[gc.stats$file.metrics$chr == "X", ]
seqz.data <- read.seqz(data.file, n.lines = c(chrX$start, chrX$end))

## Compare the running time of the two different methods.
system.time(read.seqz(data.file, n.lines = c(chrX$start, chrX$end)))
system.time(seqz.data <- read.seqz(data.file, chr.name = "X"))
