BigDataStatMeth (version 1.0.3)

bdblockmult_sparse_hdf5: Block matrix multiplication for sparse matrices

Description

Performs optimized block-wise matrix multiplication for sparse matrices stored in HDF5 format. The implementation is specifically designed to handle large sparse matrices efficiently through block operations and parallel processing.

Usage

bdblockmult_sparse_hdf5(
  filename,
  group,
  A,
  B,
  groupB = NULL,
  block_size = NULL,
  mixblock_size = NULL,
  paral = NULL,
  threads = NULL,
  outgroup = NULL,
  outdataset = NULL,
  overwrite = NULL
)

Value

Modifies the HDF5 file in place, writing the multiplication result to the dataset named by outdataset inside the group named by outgroup.

Arguments

filename

String indicating the HDF5 file path

group

String indicating the group path for matrix A

A

String specifying the dataset name for matrix A

B

String specifying the dataset name for matrix B

groupB

Optional string indicating group path for matrix B. If NULL, uses same group as A

block_size

Optional integer specifying block size for processing. If NULL, automatically determined based on matrix dimensions

mixblock_size

Optional integer for memory block size in parallel processing

paral

Optional logical indicating whether to use parallel processing. If NULL, defaults to FALSE

threads

Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads

outgroup

Optional string specifying output group. Default is "OUTPUT"

outdataset

Optional string specifying output dataset name. Default is "A_x_B"

overwrite

Optional logical indicating whether to overwrite existing datasets. If NULL, defaults to FALSE

Details

The function implements optimized sparse matrix multiplication through:

  • Block-wise processing to manage memory usage

  • Automatic block size optimization

  • Parallel processing support

  • Efficient sparse matrix storage

Block size optimization considers:

  • Available system memory

  • Matrix dimensions and sparsity

  • Parallel processing requirements
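
As a rough illustration of how these three factors can interact, the sketch below derives a block size from a memory budget. It is a hypothetical heuristic, not the rule BigDataStatMeth actually applies: the function pick_block_size, its arguments, and the assumption that three dense double-precision blocks (one each from A, B, and the result) must fit in memory per thread are all made up for this example.

```r
# Hypothetical heuristic for illustration only; NOT the package's algorithm.
# Assumes each thread holds three dense b x b blocks of doubles (8 bytes each).
pick_block_size <- function(n_rows, n_cols, mem_budget_bytes, threads = 1L) {
  stopifnot(n_rows >= 1, n_cols >= 1, mem_budget_bytes > 0, threads >= 1)
  # Split the memory budget evenly across threads
  per_thread <- mem_budget_bytes / threads
  # Largest b such that 3 * 8 * b^2 bytes fits in the per-thread budget
  b <- floor(sqrt(per_thread / (3 * 8)))
  # A block never needs to exceed the matrix dimensions, and must be >= 1
  as.integer(max(1, min(b, n_rows, n_cols)))
}

# Example: 1000 x 1000 matrices, 64 MiB budget shared by 4 threads
bs <- pick_block_size(1000, 1000, 64 * 1024^2, threads = 4L)
print(bs)
```

A value chosen this way could be passed as block_size; leaving block_size = NULL lets the function determine it automatically instead.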

Memory efficiency is achieved through:

  • Sparse matrix storage format

  • Block-wise processing

  • Minimal temporary storage

  • Proper resource cleanup
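
The block-wise idea can be sketched in memory with the Matrix package. This is a minimal illustration under stated assumptions, not the package's HDF5 or parallel implementation: the helper block_mult_sparse is hypothetical, and it splits only the inner dimension, accumulating one partial product per block so that just a slice of each operand is used at a time.

```r
library(Matrix)

# Hypothetical helper for illustration; not part of BigDataStatMeth.
# Computes A %*% B as a sum of partial products over blocks of the
# inner dimension (columns of A, rows of B).
block_mult_sparse <- function(A, B, block_size = 256L) {
  stopifnot(ncol(A) == nrow(B), ncol(A) >= 1L, block_size >= 1L)
  # Empty sparse accumulator with the shape of the result
  C <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                    dims = c(nrow(A), ncol(B)))
  for (s in seq(1L, ncol(A), by = block_size)) {
    idx <- s:min(s + block_size - 1L, ncol(A))
    # Multiply one column block of A by the matching row block of B
    C <- C + A[, idx, drop = FALSE] %*% B[idx, , drop = FALSE]
  }
  C
}

# Small sparse test matrices, built the same way as in the Examples section
set.seed(1)
k <- 1000
A <- sparseMatrix(i = sample(k, k), j = sample(k, k), x = rnorm(k))
B <- sparseMatrix(i = sample(k, k), j = sample(k, k), x = rnorm(k))

C_block <- block_mult_sparse(A, B, block_size = 256L)
# Compare against the direct in-memory product
ok <- isTRUE(all.equal(as.matrix(C_block), as.matrix(A %*% B)))
print(ok)
```

Keeping the operands and the accumulator sparse throughout is what keeps temporary storage small; converting to dense happens here only for the final comparison.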

Examples

if (FALSE) {
library(Matrix)
library(BigDataStatMeth)

# Create sparse test matrices
k <- 1e3
set.seed(1)
x_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)

set.seed(2)
y_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", as.matrix(x_sparse), "SPARSE", "x_sparse")
bdCreate_hdf5_matrix("test.hdf5", as.matrix(y_sparse), "SPARSE", "y_sparse")

# Perform multiplication
bdblockmult_sparse_hdf5("test.hdf5", "SPARSE", "x_sparse", "y_sparse",
                        block_size = 1024,
                        paral = TRUE,
                        threads = 4)
}