BigDataStatMeth (version 1.0.3)

bdtCrossprod_hdf5: Transposed cross product with HDF5 matrices

Description

Computes the transposed cross product of matrices stored in an HDF5 file. For a single matrix A, it computes A %*% t(A). For two matrices A and B, it computes A %*% t(B). The computation is done block by block to limit memory use.

Usage

bdtCrossprod_hdf5(
  filename,
  group,
  A,
  B = NULL,
  groupB = NULL,
  block_size = NULL,
  mixblock_size = NULL,
  paral = NULL,
  threads = NULL,
  outgroup = NULL,
  outdataset = NULL,
  overwrite = NULL
)

Value

A list containing the location of the transposed crossproduct result:

fn

Character string. Path to the HDF5 file containing the result

ds

Character string. Full dataset path to the transposed crossproduct result (A %*% t(A) or A %*% t(B)) within the HDF5 file

Arguments

filename

String indicating the HDF5 file path

group

String indicating the input group containing matrix A

A

String specifying the dataset name for matrix A

B

Optional string specifying the dataset name for matrix B. If NULL, computes A %*% t(A)

groupB

Optional string indicating group containing matrix B. If NULL, uses same group as A

block_size

Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions

mixblock_size

Optional integer for memory block size in parallel processing

paral

Optional logical indicating whether to use parallel processing. Default is FALSE

threads

Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads

outgroup

Optional string specifying output group. Default is "OUTPUT"

outdataset

Optional string specifying output dataset name. Default is "tCrossProd_A_x_B"

overwrite

Optional logical indicating whether to overwrite existing datasets. Default is FALSE

Details

The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:

  • Available memory

  • Matrix dimensions

  • Whether parallel processing is enabled
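The block-wise idea can be illustrated in base R. The sketch below is not the package's implementation: it works on in-memory matrices rather than HDF5 datasets, and the helper name blockwise_tcrossprod and its block_size argument are hypothetical. It splits the shared column dimension into chunks and accumulates the partial products, so only one slice of each input is needed at a time.

```r
# Hypothetical helper, for illustration only (not part of BigDataStatMeth).
# Computes A %*% t(B) by summing partial products over column blocks.
blockwise_tcrossprod <- function(A, B = A, block_size = 256L) {
  stopifnot(ncol(A) == ncol(B), ncol(A) >= 1L, block_size >= 1L)
  result <- matrix(0, nrow(A), nrow(B))
  starts <- seq(1L, ncol(A), by = block_size)
  for (s in starts) {
    # Column indices of the current block (the last block may be shorter)
    idx <- s:min(s + block_size - 1L, ncol(A))
    result <- result +
      tcrossprod(A[, idx, drop = FALSE], B[, idx, drop = FALSE])
  }
  result
}

set.seed(555)
A <- matrix(rnorm(40 * 30), 40, 30)
B <- matrix(rnorm(25 * 30), 25, 30)

# The block-wise result matches the direct product up to floating-point error
stopifnot(isTRUE(all.equal(blockwise_tcrossprod(A, block_size = 7L), A %*% t(A))))
stopifnot(isTRUE(all.equal(blockwise_tcrossprod(A, B, block_size = 7L), A %*% t(B))))
```

In this sketch a smaller block_size lowers the size of each slice at the cost of more loop iterations, which is the same trade-off the block_size argument is described as controlling.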

For parallel processing:

  • Uses OpenMP for thread management

  • Implements cache-friendly block operations

  • Provides automatic thread count optimization

Memory efficiency is achieved through:

  • Block-wise reading and writing

  • Minimal temporary storage

  • Proper resource cleanup

Mathematical operations:

  • For single matrix A: computes A * A^t

  • For two matrices A, B: computes A * B^t

  • Optimized for numerical stability
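For small in-memory matrices, the same two operations are available in base R as tcrossprod(), which can serve as a reference when checking results. The matrices below are made-up examples. Note that A and B must have the same number of columns, and that the result has nrow(A) rows and nrow(B) columns.

```r
# Base R reference for the two operations, using small example matrices
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 x 3
set.seed(555)
B <- matrix(rnorm(12), nrow = 4)             # 4 x 3, same column count as A

AAt <- tcrossprod(A)      # single-matrix case: A %*% t(A)
ABt <- tcrossprod(A, B)   # two-matrix case:    A %*% t(B)

stopifnot(isTRUE(all.equal(AAt, A %*% t(A))))
stopifnot(isTRUE(all.equal(ABt, A %*% t(B))))

# A %*% t(A) is always square and symmetric
stopifnot(identical(dim(AAt), c(2L, 2L)), isSymmetric(AAt))
# A %*% t(B) is nrow(A) x nrow(B)
stopifnot(identical(dim(ABt), c(2L, 4L)))
```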

Examples

if (FALSE) {
library(BigDataStatMeth)
library(rhdf5)

# Create test matrix
N <- 1000
M <- 1000
set.seed(555)
a <- matrix(rnorm(N*M), N, M)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A",
                     overwriteFile = TRUE)

# Compute transposed cross product
bdtCrossprod_hdf5("test.hdf5", "INPUT", "A",
                  outgroup = "OUTPUT",
                  outdataset = "result",
                  block_size = 1024,
                  paral = TRUE,
                  threads = 4)
}
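The returned list can be used to read the result back into R. The following is a minimal sketch under stated assumptions: read_hdf5_result is a hypothetical helper (not part of BigDataStatMeth), it relies on rhdf5::h5read() accepting the ds element as a dataset path, and it assumes the HDF5 file is closed once bdtCrossprod_hdf5() returns. The demo runs only if both packages are installed, and writes to a temporary file.

```r
# Hypothetical helper: read the matrix that a result list with
# elements `fn` (file path) and `ds` (dataset path) points to.
read_hdf5_result <- function(res) {
  stopifnot(is.list(res), is.character(res$fn), is.character(res$ds))
  rhdf5::h5read(res$fn, res$ds)
}

if (requireNamespace("BigDataStatMeth", quietly = TRUE) &&
    requireNamespace("rhdf5", quietly = TRUE)) {
  library(BigDataStatMeth)

  set.seed(555)
  a <- matrix(rnorm(100 * 50), 100, 50)
  h5file <- file.path(tempdir(), "tcrossprod_demo.hdf5")
  bdCreate_hdf5_matrix(h5file, a, "INPUT", "A", overwriteFile = TRUE)

  res <- bdtCrossprod_hdf5(h5file, "INPUT", "A",
                           outgroup = "OUTPUT",
                           outdataset = "result",
                           overwrite = TRUE)

  aat <- read_hdf5_result(res)
  print(dim(aat))
  # A %*% t(A) is symmetric, so this comparison does not depend on
  # whether the reader returns the stored matrix or its transpose
  print(isTRUE(all.equal(aat, tcrossprod(a))))
}
```

For the two-matrix case, the orientation of the matrix returned by the reader should be checked before comparing it with tcrossprod(a, b), since A %*% t(B) is generally not symmetric.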