h5dapply: h5dapply

Description

This is the central function of the h5vc package, allows an apply operation along common dimensions of datasets in a tally file.

Usage

## S3 method for class 'numeric':
h5dapply( ..., blocksize, range)
## S3 method for class 'GRanges':
h5dapply( ..., group, range)
## S3 method for class 'IRanges':
h5dapply( ..., range)

Arguments

blocksize

The size of the blocks in which to process the data (integer)

...

Further parameters to be handed over to FUN

range

The range along the specified dimensions which should be processed, this allows for limiting the apply to a specific region or set of samples, etc. - optional (defaults to the whole chromosome); This can be a GRanges, IRanges or numerical vector of length 2 (i.e [start, stop])

group

The group (location) within the HDF5 file, note that when range is numeric or IRanges this has to point to the location of the chromosome, e.g. /ExampleTally/Chr7. When range is a GRanges object, the chromosome information is encoded in the GRanges directly and group should only point to the root-group of the study, i.e. /ExampleTally

Value

A list with one entry per block, which is the result of applying FUN to the datasets specified in the parameter names within the block.

Details

Additional function parameters are: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object] This function applys parameter FUN to blocks along a specified axis within the tally file, group and specified datasets. It creates a list of arrays (one for each dataset) and processes that list with the function FUN.

This is by far the most essential and powerful function within this package since it allows the user to execute their own analysis functions on the tallies stored within the HDF5 tally file.

The supplied function FUN must have a parameter data or ... (the former is the expected behaviour), which will be supplied to FUN from h5dapply for each block. This structure is a list with one slot for each dataset specified in the names argument to h5dapply containing the array corresponding to the current block in the given dataset. Furthemore the slot h5dapplyInfo is reserved and contains another list with the following content:

Blockstart is an integer specifying the starting position of the current block (in the dimension specified by the dims argument to h5dapply)

Blockend is an integer specifying the end position of the current block (in the dimension specified by the dims argument to h5dapply)

Datasets Contains a data.frame as it is returned by h5ls listing all datasets present in the other slots of data with their group, name, dimensions, number of dimensions (DimCount) and the dimension that is used for splitting into blocks (PosDim)

Group contains the name of the group as specified by the group argument to h5dapply

Examples

Run this code

# loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  # check the available samples and sampleData
  print(sampleData)
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = c(29000000,29010000),
    verbose = TRUE
    )
  coverages <- do.call( rbind, data )
  colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
  coverages
  #Subsetting by Sample
  sampleData <- sampleData[sampleData$Patient == "Patient5",]
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = c(29000000,29010000),
    samples = sampleData$Sample,
    verbose = TRUE
    )
  coverages <- do.call( rbind, data )
  colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
  coverages
  #Using GRanges and IRanges
  library(GenomicRanges)
  library(IRanges)
  granges <- GRanges(
  c(rep("16", 10), rep("22", 10)),
  ranges = IRanges(
    start = c(seq(29000000,29009000, 1000), seq(39000000,39009000, 1000)),
    width = 1000
  ))
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = granges,
    verbose = TRUE
    )
  lapply( data, function(x) do.call(rbind, x) )

Run the code above in your browser using DataLab