bdCrossprod_hdf5: Cross product of matrices stored in HDF5

Description

Performs optimized cross product operations on matrices stored in HDF5 format. For a single matrix A, computes A^t * A. For two matrices A and B, computes A^t * B. Uses block-wise processing for memory efficiency.
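For reference, the two operations correspond to what base R's crossprod() computes on ordinary in-memory matrices. The sketch below uses only base R and small made-up matrices; it is meant to show the expected shapes and is not how the HDF5 routine is implemented.

```r
# In-memory reference for the two operations, using base R only.
# The matrices here are small illustrative inputs, not package data.
set.seed(1)
A <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)  # 6 x 3
B <- matrix(rnorm(6 * 2), nrow = 6, ncol = 2)  # 6 x 2, same number of rows as A

AtA <- crossprod(A)      # t(A) %*% A, a 3 x 3 symmetric matrix
AtB <- crossprod(A, B)   # t(A) %*% B, a 3 x 2 matrix

dim(AtA)
dim(AtB)
```

Note that A and B must have the same number of rows for t(A) %*% B to be defined.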

Usage

bdCrossprod_hdf5(
  filename,
  group,
  A,
  B = NULL,
  groupB = NULL,
  block_size = NULL,
  mixblock_size = NULL,
  paral = NULL,
  threads = NULL,
  outgroup = NULL,
  outdataset = NULL,
  overwrite = NULL
)

Value

A list containing the location of the crossproduct result:

fn: Character string. Path to the HDF5 file containing the result
ds: Character string. Full dataset path to the crossproduct result (t(A) %*% A or t(A) %*% B) within the HDF5 file

Arguments

filename: String indicating the HDF5 file path
group: String indicating the input group containing matrix A
A: String specifying the dataset name for matrix A
B: Optional string specifying dataset name for matrix B. If NULL, performs A^t * A
groupB: Optional string indicating group containing matrix B. If NULL, uses same group as A
block_size: Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions
mixblock_size: Optional integer for memory block size in parallel processing
paral: Optional boolean indicating whether to use parallel processing. Default is FALSE
threads: Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads
outgroup: Optional string specifying output group. Default is "OUTPUT"
outdataset: Optional string specifying output dataset name. Default is "CrossProd_A_x_B"
overwrite: Optional boolean indicating whether to overwrite existing datasets. Default is FALSE

Details

The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:

Available memory
Matrix dimensions
Whether parallel processing is enabled
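The idea behind block-wise processing can be sketched in base R. The helper blockwise_crossprod() below is hypothetical and is not part of BigDataStatMeth; it assumes the matrices are split into blocks of rows, which may differ from the package's internal blocking strategy. It relies on the identity that t(A) %*% B equals the sum of the per-block products t(A_k) %*% B_k.

```r
# Hypothetical helper (not a BigDataStatMeth function): accumulate
# t(A) %*% B over consecutive blocks of rows.
blockwise_crossprod <- function(A, B = A, block_rows = 256L) {
  stopifnot(nrow(A) == nrow(B), nrow(A) >= 1L, block_rows >= 1L)
  result <- matrix(0, nrow = ncol(A), ncol = ncol(B))
  starts <- seq(1L, nrow(A), by = block_rows)
  for (s in starts) {
    rows <- s:min(s + block_rows - 1L, nrow(A))
    # Only this slice of rows is needed at any one time.
    result <- result + crossprod(A[rows, , drop = FALSE],
                                 B[rows, , drop = FALSE])
  }
  result
}

set.seed(555)
a <- matrix(rnorm(1000 * 50), 1000, 50)
b <- matrix(rnorm(1000 * 20), 1000, 20)

# Compare against the in-memory result, up to floating-point tolerance.
all.equal(blockwise_crossprod(a, block_rows = 128L), crossprod(a))
all.equal(blockwise_crossprod(a, b, block_rows = 300L), crossprod(a, b))
```

In this sketch the block size only changes how much data is touched per step, not the result, which is why it can be tuned freely to the available memory.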

For parallel processing:

Uses OpenMP for thread management
Implements cache-friendly block operations
Provides automatic thread count optimization
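Because the per-block partial products are independent, they can be computed concurrently and summed afterwards. The sketch below illustrates this at the R level with parallel::mclapply(), which forks processes; it is a stand-in for illustration only and does not use OpenMP. The helper parallel_crossprod() and its workers argument are hypothetical and not part of BigDataStatMeth.

```r
library(parallel)

# Hypothetical helper (not a BigDataStatMeth function): compute the partial
# product of each row block on a separate worker, then sum the partials.
# mclapply() forks processes; this is an R-level analogy, not OpenMP.
parallel_crossprod <- function(A, block_rows = 256L, workers = 2L) {
  # Forking is unavailable on Windows, so fall back to a single worker there.
  if (.Platform$OS.type == "windows") workers <- 1L
  starts <- seq(1L, nrow(A), by = block_rows)
  partials <- mclapply(starts, function(s) {
    rows <- s:min(s + block_rows - 1L, nrow(A))
    crossprod(A[rows, , drop = FALSE])
  }, mc.cores = workers)
  # Sum the independent partial results into the final matrix.
  Reduce(`+`, partials)
}

set.seed(555)
a <- matrix(rnorm(1000 * 50), 1000, 50)
all.equal(parallel_crossprod(a), crossprod(a))
```

Each worker in this sketch holds one block and one partial result, so memory use grows with the number of workers as well as with the block size.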

Memory efficiency is achieved through:

Block-wise reading and writing
Minimal temporary storage
Proper resource cleanup
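A rough back-of-the-envelope calculation shows why block-wise reading matters. The sketch below assumes double-precision storage (8 bytes per value) and row blocks; the helper matrix_mib() and the example sizes are illustrative assumptions, not values taken from the package.

```r
# Rough footprint, in MiB, of a numeric matrix (8 bytes per double).
# matrix_mib() is a hypothetical helper used only for this estimate.
matrix_mib <- function(nrow, ncol) nrow * ncol * 8 / 1024^2

n <- 1e6           # rows of the input matrix (illustrative)
m <- 1000          # columns of the input matrix (illustrative)
block_rows <- 1024 # rows read per block (illustrative)

full_input <- matrix_mib(n, m)           # whole matrix held in memory
one_block  <- matrix_mib(block_rows, m)  # a single block of rows
result     <- matrix_mib(m, m)           # the m x m cross product

round(c(full_input = full_input, one_block = one_block, result = result), 2)
```

Under these assumptions the full input needs several GiB, while one block plus the result needs only a few MiB.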

Examples

if (FALSE) {
  library(BigDataStatMeth)
  library(rhdf5)
  
  # Create test matrix
  N = 1000
  M = 1000
  set.seed(555)
  a <- matrix(rnorm(N*M), N, M)
  
  # Save to HDF5
  bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A", overwriteFile = TRUE)
  
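  # Compute cross product; the returned list holds the result location
  res <- bdCrossprod_hdf5("test.hdf5", "INPUT", "A",
                          outgroup = "OUTPUT",
                          outdataset = "result",
                          block_size = 1024,
                          paral = TRUE,
                          threads = 4)
  res$fn  # path to the HDF5 file containing the result
  res$ds  # dataset path of the result within that file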
}
