prepareTallyFile: prepareTallyFile

Description

Functions for preparing an HDF5 file for storing tally data and/or modifying an existing file.

Usage

prepareTallyFile( filename, study, chrom, chromlength, nsamples, maxsamples = nsamples, chunkSize = 50000, sampleChunkSize = nsamples, compressionLevel = 9, referenceFillValue = 5 )
resizeCohort( filename, study, chrom, newNumberOfSamples, dimmap = .sampleDimMap, force = FALSE )

Arguments

filename

Filename of the HDF5 file that should store the tallies

study

Study identifier which will be used in structuring the file

chrom

Chromosome for which the structure should be generated

chromlength

The length of the chromosome. This will be the size of the genomic position dimension.

nsamples

Number of samples that will be stored in the file

maxsamples

Maximum number of samples that can be stored in the file. This relates to the maxdim property of HDF5 datasets, which specifies how far a dataset can be resized after creation; see http://www.hdfgroup.org for details.

chunkSize

The size of the chunks used in HDF5 storage, specified along the genomic position dimension. By default each chunk contains the data from all samples for the given width along the genomic position dimension.

compressionLevel

Compression level to use in the HDF5 file. Defaults to 9 (highest); use lower numbers to improve access time at the cost of disk space usage.

sampleChunkSize

Size of the HDF5 chunks along the sample dimension. The default value is the whole dataset, i.e. all samples. For larger datasets where the typical use case is to extract only data corresponding to a specific sample and genomic position, smaller values of sampleChunkSize should be used.

referenceFillValue

Default value to be used for the Reference dataset. This is set to 5 by default, which corresponds to the nucleotide N.

newNumberOfSamples

New cohort size. This must be smaller than the value of maxsamples that was provided when the file was created.

dimmap

A list mapping dataset names to the dimension in which the samples are stored (e.g. "Counts" -> 2)

force

Boolean parameter that controls whether a shrinking operation (i.e. newNumberOfSamples is smaller than the current number of samples) should be performed or throw an error. Shrinking will result in data loss.

Value

Returns TRUE on success

Details

prepareTallyFile prepares (and creates if necessary) an HDF5 file for storing the datasets that are associated with a tally. It creates the required groups and datasets (filled with 0's).

resizeCohort resizes the datasets to a new number of samples. This is limited by the value of maxsamples that was provided in the initial call to prepareTallyFile.
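
The effect of the chunkSize and sampleChunkSize arguments can be illustrated with some simple arithmetic. The following is a minimal sketch in base R, not part of the package: the values are assumed examples, and the calculation only considers the sample and genomic position dimensions (the actual datasets may have further dimensions).

```r
# Assumed example values; only the argument names come from the Usage section.
chromlength     <- 1e6       # size of the genomic position dimension
nsamples        <- 20        # size of the sample dimension
chunkSize       <- 50000     # default chunk width along genomic position
sampleChunkSize <- nsamples  # default: one chunk spans all samples

# Number of chunks needed to tile each of the two dimensions.
positionChunks <- ceiling(chromlength / chunkSize)
sampleChunks   <- ceiling(nsamples / sampleChunkSize)

# Cells per chunk (sample x position only). With the defaults, reading a
# single sample at a single position still touches a chunk of this size.
cellsPerChunk <- chunkSize * sampleChunkSize

# With sampleChunkSize = 1 the chunk covers one sample only, which suits
# per-sample access patterns.
cellsPerChunkSmall <- chunkSize * 1
```

Smaller chunks reduce the amount of data that has to be read and decompressed for a narrow query, at the price of more chunks overall.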

Examples

library( h5vc )
prepareTallyFile( file.path( tempdir(), "test.tally.hfs5" ), "SomeStudy", "ChromosomeB", 1e6, 20 )
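
As a further illustration, the sketch below reserves room for additional samples via maxsamples and then grows the cohort with resizeCohort. It relies only on the signatures shown in the Usage section; the file name and the sample counts are assumed example values, and the code is skipped when the h5vc package is not installed.

```r
# Hypothetical file name, chosen for this sketch only.
tallyFile <- file.path(tempdir(), "resize.tally.hfs5")

# Run only when the Bioconductor package h5vc is available.
if (requireNamespace("h5vc", quietly = TRUE)) {
  # Create the structure for 20 samples, reserving room for up to 40.
  h5vc::prepareTallyFile(tallyFile, "SomeStudy", "ChromosomeB",
                         chromlength = 1e6, nsamples = 20, maxsamples = 40)

  # Grow the cohort to 30 samples, which stays below maxsamples = 40.
  # Growing does not need force = TRUE; force only matters when shrinking.
  h5vc::resizeCohort(tallyFile, "SomeStudy", "ChromosomeB",
                     newNumberOfSamples = 30)
}
```

Because maxsamples defaults to nsamples, a file that is expected to grow should be created with an explicit, larger maxsamples.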