
h5vc (version 2.6.3)

prepareTallyFile: prepareTallyFile

Description

Functions for preparing an HDF5 file to store tally data and/or modifying an existing file

Usage

prepareTallyFile( filename, study, chrom, chromlength, nsamples, maxsamples = nsamples, chunkSize = 50000, sampleChunkSize = nsamples, compressionLevel = 9, referenceFillValue = 5 )
resizeCohort( filename, study, chrom, newNumberOfSamples, dimmap = .sampleDimMap, force = FALSE )

Arguments

filename
Filename of the HDF5 file that should store the tallies
study
Study identifier which will be used in structuring the file
chrom
Chromosome for which the structure should be generated
chromlength
The length of the chromosome; this will be the size of the genomic position dimension
nsamples
Number of samples that will be stored in the file
maxsamples
Maximum number of samples that can be stored in the file. This relates to the maxdim property of HDF5 datasets, which specifies how far a dataset can be resized after creation; see http://www.hdfgroup.org for details
chunkSize
The size of the HDF5 chunks along the genomic position dimension. By default, each chunk holds the data from all samples for a window of this width along the genomic position dimension
compressionLevel
Compression level to use in the HDF5 file; defaults to 9 (highest). Use lower values to improve access time at the cost of more disk space
sampleChunkSize
Size of the HDF5 chunks along the sample dimension; the default value is the whole sample dimension, i.e. all samples. For larger datasets where the typical use case is to extract only the data for a specific sample and genomic position, smaller values of sampleChunkSize should be used.
referenceFillValue
Fill value for the Reference dataset; this is 5 by default, which corresponds to the nucleotide N
newNumberOfSamples
New cohort size; this must be smaller than the value of maxsamples that was provided when the file was created
dimmap
A list mapping dataset names to the dimension in which the samples are stored (e.g. "Counts" -> 2)
force
Logical value that controls whether a shrinking operation (i.e. newNumberOfSamples is smaller than the current number of samples) is performed or throws an error. Shrinking will result in data loss.
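To see how chunkSize and sampleChunkSize interact, the sketch below does the chunk arithmetic in plain R. The helper chunkLayout is hypothetical (it is not part of h5vc), and it models only a samples × positions grid; the real tally datasets can have further dimensions, which it ignores.

```r
# Hypothetical helper, not part of h5vc: how many HDF5 chunks tile a
# samples x positions grid, and how many cells each chunk holds
# (i.e. how much is decompressed to read a single sample/position value).
chunkLayout <- function(chromlength, nsamples,
                        chunkSize = 50000, sampleChunkSize = nsamples) {
  chunksAlongPosition <- ceiling(chromlength / chunkSize)
  chunksAlongSamples  <- ceiling(nsamples / sampleChunkSize)
  list(
    nChunks       = chunksAlongPosition * chunksAlongSamples,
    cellsPerChunk = chunkSize * sampleChunkSize
  )
}

# Default layout: every chunk spans all 20 samples
chunkLayout(1e6, 20)

# One sample per chunk: more chunks, but each one is smaller
chunkLayout(1e6, 20, sampleChunkSize = 1)
```

With the defaults, the 1e6 positions × 20 samples grid is covered by 20 chunks of 1e6 cells each, so reading one sample at one position decompresses data for all 20 samples. With sampleChunkSize = 1 there are 400 chunks of 50000 cells, so a single-sample read touches far less data.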

Value

  • Returns TRUE on success

Details

prepareTallyFile prepares (and creates if necessary) an HDF5 file for storing the datasets that are associated with a tally. It creates the required groups and datasets (filled with 0's).

resizeCohort resizes the datasets to a new number of samples; this is limited by the value of maxsamples that was provided in the initial call to prepareTallyFile.
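A minimal sketch of growing a cohort with resizeCohort, using only the signatures listed under Usage. It assumes h5vc (and its dependency rhdf5) is installed; the temporary file, study, and chromosome names are arbitrary placeholders. The file is created with room for up to 10 samples (maxsamples), so the cohort can later grow from 4 to 8 samples without recreating the file.

```r
library(h5vc)

# Use a fresh temporary file so the study/chromosome groups do not already exist
tallyFile <- tempfile(fileext = ".hfs5")

# Create datasets for 4 samples, but allow the sample dimension to grow to 10
prepareTallyFile(tallyFile, "SomeStudy", "ChromosomeB",
                 chromlength = 1e6, nsamples = 4, maxsamples = 10)

# Later, extend the sample dimension of the datasets in the default dimmap to 8
resizeCohort(tallyFile, "SomeStudy", "ChromosomeB", newNumberOfSamples = 8)

# Inspect the resulting dataset dimensions
rhdf5::h5ls(tallyFile)
```

Shrinking the cohort back below 8 samples would require force = TRUE and discards the data of the removed samples.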

Examples

library(h5vc)
prepareTallyFile( file.path( tempdir(), "test.tally.hfs5" ), "SomeStudy", "ChromosomeB", chromlength = 1e6, nsamples = 20 )
