bdCholesky_hdf5: Cholesky Decomposition for HDF5-Stored Matrices

Description

Computes the Cholesky decomposition of a symmetric positive-definite matrix stored in an HDF5 file. The Cholesky decomposition factors a matrix A into the product A = LL', where L is a lower triangular matrix and L' is its transpose.

Usage

bdCholesky_hdf5(
  filename,
  group,
  dataset,
  outdataset,
  outgroup = NULL,
  fullMatrix = NULL,
  overwrite = NULL,
  threads = NULL,
  elementsBlock = 1000000L
)

Value

A list containing the location of the Cholesky decomposition result:

fn: Character string. Path to the HDF5 file containing the result
ds: Character string. Full dataset path to the Cholesky decomposition result within the HDF5 file

L: The lower triangular Cholesky factor, stored in the dataset referenced by ds

Arguments

filename: Character string. Path to the HDF5 file containing the input matrix.
group: Character string. Path to the group containing the input dataset.
dataset: Character string. Name of the input dataset to decompose.
outdataset: Character string. Name for the output dataset.
outgroup: Character string. Optional output group path. If not provided, results are stored in the input group.
fullMatrix: Logical. If TRUE, stores the complete matrix. If FALSE (default), stores only the lower triangular part to save space.
overwrite: Logical. If TRUE, allows overwriting existing results.
threads: Integer. Number of threads for parallel computation.
elementsBlock: Integer. Maximum number of elements to process in each block (default = 1000000). For matrices larger than 5000x5000, the value is automatically adjusted to twice the number of rows or columns.

Details

The Cholesky decomposition is a specialized factorization for symmetric positive-definite matrices that provides several advantages:

More efficient than LU decomposition for symmetric positive-definite matrices (roughly half the floating-point operations)
Numerically stable without pivoting
Useful for solving linear systems and computing matrix inverses
Important in statistical computing (e.g., for sampling from multivariate normal distributions)
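
The last two uses can be sketched in base R. This is a minimal illustration on a small in-memory matrix, not this package's HDF5 workflow; it relies only on base R's chol(), forwardsolve(), backsolve() and chol2inv(), and the matrix sizes are arbitrary choices for the example.

```r
set.seed(42)
X <- matrix(rnorm(50), 10, 5)
A <- crossprod(X) + diag(5)   # symmetric positive-definite by construction
b <- rnorm(5)

# Base R's chol() returns the upper triangular factor R with A = R'R,
# so the lower triangular factor is L = t(R).
L <- t(chol(A))

# Solve A x = b in two triangular solves: L y = b, then L' x = y.
y <- forwardsolve(L, b)
x <- backsolve(t(L), y)
max(abs(A %*% x - b))         # residual of the linear system

# Matrix inverse from the (upper triangular) Cholesky factor.
A_inv <- chol2inv(chol(A))

# Sampling from N(0, A): if z ~ N(0, I), then L z has covariance L L' = A.
Z <- matrix(rnorm(5 * 10000), nrow = 5)
samples <- L %*% Z            # each column is one draw
```

Working with the triangular factor avoids forming an explicit inverse when only a solve is needed.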

This implementation features:

Block-based computation for large matrices
Optional storage formats (full or triangular)
Parallel processing support
Memory-efficient block algorithm
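
To illustrate what a block-based Cholesky computation looks like in general, here is a minimal in-memory sketch in base R. It is an assumption-laden illustration of the standard blocked (right-looking) scheme, not the algorithm this package uses on HDF5 data; the function name blocked_chol and its block_size argument are hypothetical.

```r
# Hypothetical helper: blocked Cholesky returning the lower triangular factor.
blocked_chol <- function(A, block_size = 2L) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (start in seq(1L, n, by = block_size)) {
    end <- min(start + block_size - 1L, n)
    idx <- start:end
    # 1. Factor the current diagonal block: A11 = L11 L11'
    L[idx, idx] <- t(chol(A[idx, idx, drop = FALSE]))
    if (end < n) {
      rest <- (end + 1L):n
      # 2. Panel below the diagonal block: solve L11 t(L21) = t(A21)
      L[rest, idx] <- t(forwardsolve(L[idx, idx, drop = FALSE],
                                     t(A[rest, idx, drop = FALSE])))
      # 3. Update the trailing submatrix: A22 <- A22 - L21 L21'
      A[rest, rest] <- A[rest, rest, drop = FALSE] -
        tcrossprod(L[rest, idx, drop = FALSE])
    }
  }
  L
}

set.seed(1)
M <- crossprod(matrix(rnorm(70), 10, 7)) + diag(7)
L_blocked <- blocked_chol(M, block_size = 3L)
max(abs(L_blocked - t(chol(M))))   # difference from base R's factor
```

Only one diagonal block, one panel, and the trailing submatrix are touched per step, which is what makes this kind of scheme suitable for data that is read from disk in pieces.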

Mathematical Details: For a symmetric positive-definite matrix A, the decomposition A = LL' has the following properties:

L is lower triangular
L has positive diagonal elements
L is unique

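The elements of L are computed column by column. For each column i, the diagonal element is

$$l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}$$

and the elements below the diagonal (j > i) are

$$l_{ji} = \frac{1}{l_{ii}}\left(a_{ji} - \sum_{k=1}^{i-1} l_{ik}l_{jk}\right)$$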
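
As a small worked illustration, the two formulas can be transcribed directly into base R. The function name chol_lower is hypothetical, and this element-wise version is only a sketch for checking the formulas on small matrices, not the block algorithm used by bdCholesky_hdf5.

```r
# Hypothetical helper: element-wise Cholesky following the formulas for l_ii and l_ji.
chol_lower <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (i in seq_len(n)) {
    k <- seq_len(i - 1L)                 # empty for i = 1, so the sums are 0
    d <- A[i, i] - sum(L[i, k]^2)
    if (d <= 0) stop("matrix is not positive-definite")
    L[i, i] <- sqrt(d)                   # diagonal element l_ii
    for (j in seq_len(n - i) + i) {      # rows below the diagonal, j > i
      L[j, i] <- (A[j, i] - sum(L[i, k] * L[j, k])) / L[i, i]
    }
  }
  L
}

A3 <- matrix(c(  4,  12, -16,
                12,  37, -43,
               -16, -43,  98), 3, 3)
L3 <- chol_lower(A3)
L3
#      [,1] [,2] [,3]
# [1,]    2    0    0
# [2,]    6    1    0
# [3,]   -8    5    3
```

For this matrix, l_11 = sqrt(4) = 2, l_21 = 12/2 = 6, l_31 = -16/2 = -8, l_22 = sqrt(37 - 36) = 1, l_32 = (-43 + 48)/1 = 5, and l_33 = sqrt(98 - 64 - 25) = 3.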

References

Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations, 4th Edition. Johns Hopkins University Press.
Higham, N. J. (2009). Cholesky factorization. Wiley Interdisciplinary Reviews: Computational Statistics, 1(2), 251-254.

Examples

if (FALSE) {
library(BigDataStatMeth)  # provides bdCholesky_hdf5
library(rhdf5)

# Create a symmetric positive-definite matrix
set.seed(1234)
X <- matrix(rnorm(100), 10, 10)
A <- crossprod(X)  # A = X'X is symmetric positive-definite
    
# Save to HDF5 (the "data" group must exist before writing into it)
h5createFile("matrix.h5")
h5createGroup("matrix.h5", "data")
h5write(A, "matrix.h5", "data/matrix")
        
# Compute Cholesky decomposition
bdCholesky_hdf5("matrix.h5", "data", "matrix",
                outdataset = "chol",
                outgroup = "decompositions",
                fullMatrix = FALSE)
       
# Verify the decomposition
L <- h5read("matrix.h5", "decompositions/chol")
max(abs(A - L %*% t(L)))  # Should be very small
}
