BigDataStatMeth (version 1.0.3)

bdCholesky_hdf5: Cholesky Decomposition for HDF5-Stored Matrices

Description

Computes the Cholesky decomposition of a symmetric positive-definite matrix stored in an HDF5 file. The decomposition factors a matrix A into the product A = LL', where L is a lower triangular matrix and L' is its transpose.

Usage

bdCholesky_hdf5(
  filename,
  group,
  dataset,
  outdataset,
  outgroup = NULL,
  fullMatrix = NULL,
  overwrite = NULL,
  threads = NULL,
  elementsBlock = 1000000L
)

Value

A list containing the location of the Cholesky decomposition result:

fn

Character string. Path to the HDF5 file containing the result

ds

Character string. Full dataset path to the Cholesky decomposition result within the HDF5 file

L

The lower triangular Cholesky factor

Arguments

filename

Character string. Path to the HDF5 file containing the input matrix.

group

Character string. Path to the group containing the input dataset.

dataset

Character string. Name of the input dataset to decompose.

outdataset

Character string. Name for the output dataset.

outgroup

Character string. Optional output group path. If not provided, results are stored in the input group.

fullMatrix

Logical. If TRUE, stores the complete matrix. If FALSE (default), stores only the lower triangular part to save space.

overwrite

Logical. If TRUE, allows overwriting existing results.

threads

Integer. Number of threads for parallel computation.

elementsBlock

Integer. Maximum number of elements to process in each block (default = 1,000,000). For matrices larger than 5000 x 5000, the value is adjusted automatically to twice the number of rows or columns.

Details

The Cholesky decomposition is a specialized factorization for symmetric positive-definite matrices that provides several advantages:

  • Roughly half the arithmetic cost of LU decomposition for symmetric positive-definite matrices (about n^3/3 versus 2n^3/3 floating-point operations)

  • Numerically stable without pivoting

  • Useful for solving linear systems and computing matrix inverses

  • Important in statistical computing (e.g., for sampling from multivariate normal distributions)
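Two of the uses listed above, solving a linear system and sampling from a multivariate normal distribution, can be sketched in base R on a small in-memory matrix. This is only an illustration of what the factor L is good for: it uses base R's chol() rather than bdCholesky_hdf5, and the variable names (A, b, n_draws, and so on) are assumptions made for the example.

```r
set.seed(42)
X <- matrix(rnorm(200), 20, 5)
A <- crossprod(X)              # 5 x 5 symmetric positive-definite matrix
b <- rnorm(5)

# Base R chol() returns the upper factor R with A = R'R, so L = R'
L <- t(chol(A))

# Solve A x = b with two triangular solves: L y = b, then L' x = y
y <- forwardsolve(L, b)
x <- backsolve(t(L), y)
solve_err <- max(abs(A %*% x - b))    # residual of the linear system

# Sample from N(0, A): if z ~ N(0, I), then L z has covariance L L' = A
n_draws <- 20000
Z <- matrix(rnorm(5 * n_draws), nrow = 5)
draws <- L %*% Z                      # each column is one draw
cov_err <- max(abs(cov(t(draws)) - A)) / max(abs(A))
```

The triangular solves avoid forming an explicit inverse of A, which is the usual reason to prefer the factor over solve(A) when many right-hand sides share the same matrix.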

This implementation features:

  • Block-based computation for large matrices

  • Optional storage formats (full or triangular)

  • Parallel processing support

  • Memory-efficient block algorithm
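The idea behind block-based computation can be sketched with a standard right-looking blocked Cholesky on an in-memory matrix. This is not the package's actual implementation, which is not shown in this page; chol_blocked and block_size are hypothetical names, and an HDF5-backed version would presumably read and write each block from disk instead of indexing a matrix in memory.

```r
# Right-looking blocked Cholesky on an in-memory matrix (illustrative only).
# chol_blocked and block_size are hypothetical names, not BigDataStatMeth API.
chol_blocked <- function(A, block_size = 4L) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (start in seq(1, n, by = block_size)) {
    idx <- start:min(start + block_size - 1, n)   # current diagonal block
    # 1. Factor the (already updated) diagonal block
    L[idx, idx] <- t(chol(A[idx, idx, drop = FALSE]))
    if (max(idx) < n) {
      rest <- (max(idx) + 1):n                    # rows below the block
      # 2. Panel solve: find L21 with L21 %*% t(L11) = A21
      L[rest, idx] <- t(forwardsolve(L[idx, idx, drop = FALSE],
                                     t(A[rest, idx, drop = FALSE])))
      # 3. Update the trailing submatrix: A22 <- A22 - L21 %*% t(L21)
      A[rest, rest] <- A[rest, rest] - tcrossprod(L[rest, idx, drop = FALSE])
    }
  }
  L
}

set.seed(1234)
X <- matrix(rnorm(200), 20, 10)
A <- crossprod(X)                                 # 10 x 10 symmetric positive-definite
L_blocked <- chol_blocked(A, block_size = 4L)
block_err <- max(abs(L_blocked - t(chol(A))))     # compare with base R
```

Only the current diagonal block, the panel below it, and the trailing submatrix are touched in each iteration, which is what makes this scheme suitable for matrices that do not fit in memory at once.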

Mathematical Details: For a symmetric positive-definite matrix A, the decomposition A = LL' has the following properties:

  • L is lower triangular

  • L has positive diagonal elements

  • L is unique

The elements of L are computed column by column. For the diagonal entries:

$$l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}$$

and for the entries below the diagonal (j > i):

$$l_{ji} = \frac{1}{l_{ii}}\left(a_{ji} - \sum_{k=1}^{i-1} l_{ik}l_{jk}\right)$$
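The two formulas above translate directly into a pair of nested loops. The following base R sketch applies them to a small in-memory matrix and checks the result against base R's chol(); chol_elementwise is a hypothetical helper written for this illustration and is not part of BigDataStatMeth.

```r
# Element-wise Cholesky factorization following the formulas for l_ii and l_ji.
# chol_elementwise is a hypothetical helper, not part of BigDataStatMeth.
chol_elementwise <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (i in seq_len(n)) {
    k <- seq_len(i - 1)                    # indices 1..(i-1); empty when i = 1
    # Diagonal entry: l_ii = sqrt(a_ii - sum_k l_ik^2)
    d <- A[i, i] - sum(L[i, k]^2)
    if (d <= 0) stop("matrix is not positive-definite")
    L[i, i] <- sqrt(d)
    # Entries below the diagonal (j > i): l_ji = (a_ji - sum_k l_ik l_jk) / l_ii
    for (j in seq_len(n)[-seq_len(i)]) {
      L[j, i] <- (A[j, i] - sum(L[i, k] * L[j, k])) / L[i, i]
    }
  }
  L
}

set.seed(1234)
X <- matrix(rnorm(200), 20, 10)
A <- crossprod(X)                          # 10 x 10 symmetric positive-definite
L <- chol_elementwise(A)

err_reconstruct <- max(abs(A - L %*% t(L)))   # should be near machine precision
err_vs_base <- max(abs(L - t(chol(A))))       # base R chol() returns the upper factor
```

The check on d also shows why the decomposition doubles as a positive-definiteness test: the square root fails exactly when the matrix is not positive-definite.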

References

  • Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations, 4th Edition. Johns Hopkins University Press.

  • Higham, N. J. (2009). Cholesky factorization. Wiley Interdisciplinary Reviews: Computational Statistics, 1(2), 251-254.

See Also

  • bdInvCholesky_hdf5 for computing inverse using Cholesky decomposition

  • bdSolve_hdf5 for solving linear systems

Examples

if (FALSE) {
library(rhdf5)

# Create a symmetric positive-definite matrix
set.seed(1234)
X <- matrix(rnorm(100), 10, 10)
A <- crossprod(X)  # A = X'X is symmetric positive-definite
    
# Save to HDF5 (the parent group must exist before writing the dataset)
h5createFile("matrix.h5")
h5createGroup("matrix.h5", "data")
h5write(A, "matrix.h5", "data/matrix")
        
# Compute Cholesky decomposition
bdCholesky_hdf5("matrix.h5", "data", "matrix",
                outdataset = "chol",
                outgroup = "decompositions",
                fullMatrix = FALSE)
       
# Verify the decomposition
L <- h5read("matrix.h5", "decompositions/chol")
max(abs(A - L %*% t(L)))  # Should be very small
}