h5vc (version 2.6.3)

helpers: helper functions

Description

These functions are helpers for dealing with tally data stored in HDF5 files.

Usage

formatGenomicPosition( x, unit = "Mb", divisor = 1000000, digits = 3,
nsmall = 1 )
encodeDNAString( ds )
defineBlocks( start, stop, blocksize )
getChromSize( tallyFile, group, dataset = "Reference", posDim = 1 )

Arguments

x
Numerical genomic position
unit
Which unit to convert the position to
divisor
Divisor corresponding to the unit, e.g. 1e6 for 'Mb' and 1e3 for 'Kb'
digits
number of digits to keep
nsmall
nsmall parameter to the format function
ds
A DNAString object to be encoded in the nucleotide encoding used by HDF5 tally files.
start
first position
stop
last position
blocksize
size of blocks
tallyFile
Tally file to work on
group
Group within tallyFile that we want to find the chromosome size for
dataset
Dataset to extract the chromosome size from; the default is "Reference"
posDim
Which dimension of the dataset describes the genomic position

Value

  • formatGenomicPosition: A formatted genomic position, e.g. "123.4 Mb".

  • encodeDNAString: A numeric vector encoding the nucleotide sequence provided in ds according to the scheme c("A"=0,"C"=1,"G"=2,"T"=3).

  • defineBlocks: A data.frame with the columns Start and End for blocks of size blocksize spanning the interval [start, stop].

  • getChromSize: A numeric that is the size of the chromosome.
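The encoding scheme listed for encodeDNAString can be illustrated in base R without Biostrings. This is only a sketch of the mapping c("A"=0,"C"=1,"G"=2,"T"=3); the names nucleotide_codes and encode_sequence are hypothetical and are not part of h5vc, and returning NA for other symbols (such as "N") is an assumption of this sketch, not documented package behaviour.

```r
# Hypothetical base-R stand-in for the documented encoding scheme;
# this is not the h5vc implementation.
nucleotide_codes <- c("A" = 0, "C" = 1, "G" = 2, "T" = 3)

encode_sequence <- function(seq_string) {
  # Split the string into single characters
  bases <- strsplit(toupper(seq_string), "")[[1]]
  # Look up each base by name; symbols outside A/C/G/T yield NA here
  unname(nucleotide_codes[bases])
}

encoded <- encode_sequence("GATTACA")
print(encoded)
```

For real data, encodeDNAString applied to a DNAString object remains the function to use; the sketch only makes the numeric scheme concrete.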

Details

formatGenomicPosition: Helps formatting genomic positions for annotating axes in mismatch plots etc.
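The relationship between unit and divisor can be sketched in base R. The function format_position_sketch below is a hypothetical stand-in written for illustration; it assumes that the position is divided by divisor, passed to format with the digits and nsmall arguments, and joined with the unit label, which may differ from what formatGenomicPosition does internally.

```r
# Hypothetical stand-in, not the h5vc implementation: scale by the divisor,
# format the result, and append the unit label.
format_position_sketch <- function(x, unit = "Mb", divisor = 1e6,
                                   digits = 3, nsmall = 1) {
  paste(format(x / divisor, digits = digits, nsmall = nsmall), unit)
}

# Defaults label the position in megabases
label_mb <- format_position_sketch(123456789)

# unit and divisor must be changed together: "Kb" pairs with 1e3
label_kb <- format_position_sketch(123456789, unit = "Kb", divisor = 1e3)

print(c(label_mb, label_kb))
```

The point of the sketch is that unit is only a label, so passing unit = "Kb" without also setting divisor = 1e3 would produce a mislabelled position.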

encodeDNAString: This translates a DNAString object into a compatible encoding that can be written to the Reference dataset of an HDF5-based tally file. Because the Python script that generates tallies only sets the Reference dataset at positions where mismatches exist, the Reference dataset must be updated before running analyses that involve sequence context (GC-bias, mutationSpectrum, etc.).

defineBlocks: This function returns a data.frame with the columns Start and End for blocks of size blocksize spanning the interval [start, stop].

getChromSize: This function is a helper to quickly look up the chromosome size for a given group and tally file.
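The blocking described for defineBlocks can be sketched in base R as follows. The function define_blocks_sketch is a hypothetical illustration, not the package code; it assumes inclusive coordinates and that the last block is truncated at stop when blocksize does not divide the interval evenly.

```r
# Hypothetical sketch of splitting [start, stop] into blocks;
# this is not the h5vc implementation.
define_blocks_sketch <- function(start, stop, blocksize) {
  # Block starts are spaced blocksize apart, beginning at start
  block_starts <- seq(start, stop, by = blocksize)
  # Each block ends blocksize - 1 positions later, but never beyond stop
  block_ends <- pmin(block_starts + blocksize - 1, stop)
  data.frame(Start = block_starts, End = block_ends)
}

# Positions 1..10 in blocks of 4: the final block is shorter
blocks <- define_blocks_sketch(1, 10, 4)
print(blocks)
```

Blocks of this shape are useful for processing a chromosome piece by piece, for example by looping over the rows and reading one [Start, End] range from the tally file at a time.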

Examples

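library(h5vc)
# Format a genomic position using the default unit ("Mb")
formatGenomicPosition(123456789)
# Encode each sequence of a DNAStringSet
library(Biostrings)
lapply( DNAStringSet( c("simple"="ACGT", "movie"="GATTACA") ), encodeDNAString )
# Look up the chromosome size for the group "/ExampleStudy/16" in the h5vcData example tally file
getChromSize( system.file("extdata", "example.tally.hfs5", package="h5vcData"), "/ExampleStudy/16" )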
