
BigDataStatMeth (version 1.0.3)

bdcomputeMatrixVector_hdf5: Apply Vector Operations to HDF5 Matrix

Description

Performs element-wise operations between a matrix and a vector stored in HDF5 format. The function supports addition, subtraction, multiplication, division and power operations, with options for row-wise or column-wise application and parallel processing.

Usage

bdcomputeMatrixVector_hdf5(
  filename,
  group,
  dataset,
  vectorgroup,
  vectordataset,
  outdataset,
  func,
  outgroup = NULL,
  byrows = NULL,
  paral = NULL,
  threads = NULL,
  overwrite = FALSE
)

Value

List with components:

fn

Character string with the HDF5 filename

gr

Character string with the HDF5 group

ds

Character string with the full dataset path (group/dataset)

Arguments

filename

String. Path to the HDF5 file containing the datasets.

group

String. Path to the group containing the matrix dataset.

dataset

String. Name of the matrix dataset.

vectorgroup

String. Path to the group containing the vector dataset.

vectordataset

String. Name of the vector dataset.

outdataset

String. Name for the output dataset.

func

String. Operation to perform: "+", "-", "*", "/", or "pow".

outgroup

Optional string. Output group path. If not provided, results are stored in the same group as the input matrix.

byrows

Logical. If TRUE, applies the operation by rows. If FALSE, applies it by columns. The default (NULL) is treated as FALSE.

paral

Logical. If TRUE, enables parallel processing.

threads

Integer. Number of threads for parallel processing. Ignored if paral is FALSE.

overwrite

Logical. If TRUE, allows overwriting existing datasets.
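The `byrows` flag controls which dimension of the matrix the vector is aligned with. The base-R sketch below shows one plausible reading on a small in-memory matrix. It does not call the HDF5 function. It assumes that `byrows = TRUE` applies the vector across every row (one element per column) and that `byrows = FALSE` applies it down every column (one element per row). This page does not confirm that mapping.

```r
# Small non-square matrix, so the two orientations need different vector lengths
M <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

v_rows <- c(10, 20, 30)  # one element per column (assumed byrows = TRUE)
v_cols <- c(100, 200)    # one element per row    (assumed byrows = FALSE)

# Assumed byrows = TRUE: v_rows is added element-wise to every row
by_rows <- sweep(M, MARGIN = 2, STATS = v_rows, FUN = "+")

# Assumed byrows = FALSE: v_cols is added element-wise to every column
by_cols <- sweep(M, MARGIN = 1, STATS = v_cols, FUN = "+")

print(by_rows)
print(by_cols)
```

On a square matrix, such as the 10 x 10 example further down, both orientations accept a vector of the same length. A non-square test matrix is therefore a simple way to confirm which convention the function actually uses.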

Details

This function provides a flexible interface for performing element-wise operations between matrices and vectors stored in HDF5 format. It supports:

  • Five element-wise operations:

    • Addition (+): Adds vector elements to matrix rows/columns

    • Subtraction (-): Subtracts vector elements from matrix rows/columns

    • Multiplication (*): Multiplies matrix rows/columns by vector elements

    • Division (/): Divides matrix rows/columns by vector elements

    • Power (pow): Raises matrix rows/columns to the power of the vector elements

  • Processing options:

    • Row-wise or column-wise operations

    • Parallel processing for improved performance

    • Configurable thread count for parallel execution

    • Memory-efficient processing for large datasets
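For small inputs, you can reproduce the five operations in memory with base R. This is useful for checking an HDF5 result after you read it back. The sketch below defines a hypothetical helper, `matvec_ref()`, built on `sweep()`. It is not part of BigDataStatMeth. Its `byrows` orientation (TRUE pairs the vector with columns, FALSE or NULL pairs it with rows) is an assumption.

```r
# Hypothetical in-memory reference for the five operations (not a package function).
matvec_ref <- function(M, v, func = c("+", "-", "*", "/", "pow"), byrows = FALSE) {
  func <- match.arg(func)
  # "pow" is the only operation whose name differs from R's own operator
  op <- switch(func,
               "+" = `+`, "-" = `-`, "*" = `*`, "/" = `/`, "pow" = `^`)
  # Assumed convention: byrows = TRUE aligns v with the columns (length ncol(M));
  # FALSE or NULL aligns v with the rows (length nrow(M))
  margin <- if (isTRUE(byrows)) 2L else 1L
  sweep(M, MARGIN = margin, STATS = as.vector(v), FUN = op)
}

M <- matrix(c(1, 2, 3, 4), nrow = 2)  # columns (1, 2) and (3, 4)

# Column 1 squared, column 2 cubed
p <- matvec_ref(M, c(2, 3), "pow", byrows = TRUE)

# 10 subtracted from row 1, 20 subtracted from row 2
s <- matvec_ref(M, c(10, 20), "-", byrows = FALSE)
```

`match.arg()` rejects any operation name outside the five supported ones. This mirrors the operation-type check described below.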

The function performs extensive validation:

  • Checks matrix and vector dimensions for compatibility

  • Validates operation type

  • Verifies HDF5 file and dataset accessibility

  • Ensures proper data structures (matrix vs. vector)
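The checks listed above can be approximated in memory before any data is written to HDF5. The sketch below uses a hypothetical function, `check_matvec_inputs()`. It is not the package's own validation code. It accepts a vector stored as an n x 1 or 1 x n matrix, as in the example below. The length rule again assumes that `byrows = TRUE` pairs the vector with the matrix columns.

```r
# Hypothetical pre-flight check mirroring the validation steps (not package code).
check_matvec_inputs <- function(M, v, func, byrows = FALSE) {
  allowed <- c("+", "-", "*", "/", "pow")
  # Validate the operation type
  if (!is.character(func) || length(func) != 1L || !(func %in% allowed)) {
    stop("func must be one of: ", paste(allowed, collapse = ", "))
  }
  # Ensure the first input is a matrix
  if (!is.matrix(M)) stop("M must be a matrix")
  # A vector may be stored as an n x 1 or 1 x n matrix; flatten it
  if (is.matrix(v)) {
    if (min(dim(v)) != 1L) stop("v must have a single row or column")
    v <- as.vector(v)
  }
  # Check dimension compatibility (assumed byrows convention)
  expected <- if (isTRUE(byrows)) ncol(M) else nrow(M)
  if (length(v) != expected) {
    stop("vector length ", length(v), " does not match required length ", expected)
  }
  invisible(TRUE)
}

# Passes: 4 columns, vector of length 4, row-wise
check_matvec_inputs(matrix(0, 3, 4), 1:4, "+", byrows = TRUE)
# Passes: 3 rows, 3 x 1 vector matrix, column-wise (byrows = NULL)
check_matvec_inputs(matrix(0, 3, 4), matrix(1, 3, 1), "pow", byrows = NULL)
```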

References

  • The HDF Group. (2000-2010). HDF5 User's Guide.

  • Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18.

See Also

  • bdCreate_hdf5_matrix for creating HDF5 matrices

Examples

library(BigDataStatMeth)

# Create test data
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
X <- matrix(rnorm(10), 10, 1)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", Y, "data", "Y",
                     overwriteFile = TRUE,
                     overwriteDataset = FALSE,
                     unlimited = FALSE)
bdCreate_hdf5_matrix("test.hdf5", X, "data", "X",
                     overwriteFile = FALSE,
                     overwriteDataset = FALSE,
                     unlimited = FALSE)

# Multiply matrix rows by vector
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "ProdComputed",
                           func = "*",
                           byrows = TRUE,
                           overwrite = TRUE)

# Subtract vector from matrix rows
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "SubsComputed",
                           func = "-",
                           byrows = TRUE,
                           overwrite = TRUE)

# Subtract vector from matrix columns
# (separate output dataset, so the row-wise result above is kept)
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "SubsComputedCols",
                           func = "-",
                           byrows = FALSE,
                           overwrite = TRUE)

# Cleanup
if (file.exists("test.hdf5")) {
  file.remove("test.hdf5")
}
