Rtwobitlib (version 0.3.8)

twobit_seqstats: Extract sequence lengths and letter counts from a .2bit file

Description

Extract the lengths and letter counts of the DNA sequences stored in a .2bit file.

Usage

twobit_seqstats(filepath)

twobit_seqlengths(filepath)

Value

For twobit_seqstats(): An integer matrix with one row per sequence in the .2bit file and 6 columns. The rownames on the matrix are the sequence names and the colnames are: seqlengths, A, C, G, T, N. Columns A, C, G, T, and N contain the letter count for each sequence.

For twobit_seqlengths(): A named integer vector where the names are the sequence names and the values are the corresponding lengths.
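
To make the shape of these return values concrete, here is a minimal base R sketch that builds a matrix with the same layout from a plain character vector. It is an illustration only, not how Rtwobitlib computes its results: the function seqstats_sketch() and the input vector dna are hypothetical names, no .2bit file is read, and the sequences are assumed to contain only the uppercase letters A, C, G, T, and N.

```r
# Hypothetical input: a named character vector of DNA sequences.
# (Assumption: uppercase A/C/G/T/N only; nothing is read from a .2bit file.)
dna <- c(chrA = "ACGTNNAC", chrB = "GGGTTA")

# Illustrative helper (not part of Rtwobitlib): build an integer matrix with
# one row per sequence and the columns seqlengths, A, C, G, T, N.
seqstats_sketch <- function(dna) {
  cols <- c("A", "C", "G", "T", "N")
  out <- matrix(0L, nrow = length(dna), ncol = 1L + length(cols),
                dimnames = list(names(dna), c("seqlengths", cols)))
  out[, "seqlengths"] <- nchar(dna)
  for (i in seq_along(dna)) {
    # Split the sequence into single characters and count each letter.
    chars <- strsplit(dna[[i]], "", fixed = TRUE)[[1]]
    out[i, cols] <- vapply(cols, function(l) sum(chars == l), integer(1))
  }
  out
}

stats <- seqstats_sketch(dna)

# Extracting the first column yields a named integer vector of lengths,
# analogous to what twobit_seqlengths() is described as returning.
lens <- stats[, "seqlengths"]

# The letter counts in each row add up to that row's sequence length.
stopifnot(all(rowSums(stats[, -1]) == lens))
```

Because every position in such a sequence is one of the five letters, the five count columns always sum to the seqlengths column, which is the same property the sanity check in the Examples section verifies on a real file.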

Arguments

filepath

A single string (character vector of length 1) containing a path to a .2bit file.

Details

twobit_seqlengths(filepath) is a shortcut for twobit_seqstats(filepath)[ , "seqlengths"]. It is also a much more efficient way to get the sequence lengths because it does not need to load the sequence data into memory.

References

A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7

See Also

twobit_read and twobit_write to read/write a character vector representing DNA sequences from/to a file in 2bit format.

Examples

filepath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")

twobit_seqstats(filepath)

twobit_seqlengths(filepath)

## Sanity checks:
sacCer2_seqstats <- twobit_seqstats(filepath)
stopifnot(
  identical(sacCer2_seqstats[ , 1], twobit_seqlengths(filepath)),
  all.equal(rowSums(sacCer2_seqstats[ , -1]), sacCer2_seqstats[ , 1])
)
