
BigDataStatMeth (version 1.0.3)

bdgetSDandMean_hdf5: Compute Matrix Standard Deviation and Mean in HDF5

Description

Computes standard deviation and/or mean statistics for a matrix stored in HDF5 format, with support for row-wise or column-wise computations.

Usage

bdgetSDandMean_hdf5(
  filename,
  group,
  dataset,
  outgroup = NULL,
  outdataset = NULL,
  sd = NULL,
  mean = NULL,
  byrows = NULL,
  onmemory = NULL,
  wsize = NULL,
  overwrite = FALSE
)

Value

Depending on the onmemory parameter:

If onmemory = TRUE

List with components:

  • mean: Numeric vector with column/row means (or NULL if not computed)

  • sd: Numeric vector with column/row standard deviations (or NULL if not computed)

If onmemory = FALSE

List with components:

  • fn: Character string with the HDF5 filename

  • mean: Character string with the full dataset path to the means (group/dataset)

  • sd: Character string with the full dataset path to the standard deviations (group/dataset)
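For a point of reference in base R, the in-memory result (onmemory = TRUE) should correspond to the plain row or column means and standard deviations of the matrix. This is only a sketch. The helper name `sd_mean_reference` is hypothetical. It also assumes the sample standard deviation (denominator n - 1, as in `stats::sd()`), and the package may use a different convention.

```r
# Hypothetical helper: base-R reference for the statistics described above.
# Assumption: sample standard deviation (denominator n - 1), as stats::sd().
sd_mean_reference <- function(x, byrows = FALSE, sd = TRUE, mean = TRUE) {
  margin <- if (byrows) 1L else 2L   # 1 = rows, 2 = columns, as in apply()
  # base::mean / stats::sd are named explicitly because the logical
  # arguments `sd` and `mean` shadow the functions inside this body
  list(
    mean = if (mean) apply(x, margin, base::mean) else NULL,
    sd   = if (sd)   apply(x, margin, stats::sd)  else NULL
  )
}

set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
ref <- sd_mean_reference(Y, byrows = TRUE)   # one mean and one sd per row
str(ref)
```

With `byrows = FALSE` the same helper returns one value per column, which matches the row/column switch described under Arguments.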

Arguments

filename

Character string. Path to the HDF5 file.

group

Character string. Path to the group containing the dataset.

dataset

Character string. Name of the dataset to analyze.

outgroup

Character string (optional). Custom output group name. Default is "mean_sd".

outdataset

Character string (optional). Custom output dataset name. By default, the results are stored as mean.<dataset> and sd.<dataset>, where <dataset> is the name of the input dataset.

sd

Logical (optional). Whether to compute sd. Default is TRUE.

mean

Logical (optional). Whether to compute mean. Default is TRUE.

byrows

Logical (optional). Whether to compute by rows (TRUE) or columns (FALSE). Default is FALSE.

onmemory

Logical (optional). If TRUE, results are kept in memory and returned as a list; nothing is written to disk. If FALSE (the default), results are written to the HDF5 file.

wsize

Integer (optional). Block size for processing. Default is 1000.

overwrite

Logical (optional). Whether to overwrite existing results. Default is FALSE.

Details

This function computes the statistics block by block instead of loading the whole matrix at once:

  • Computation options:

    • Standard deviation (sd)

    • Mean (mean)

    • Row-wise or column-wise processing (byrows)

  • Processing features:

    • Block-based computation

    • Memory-efficient processing

    • Configurable block size (wsize)

  • Implementation features:

    • Safe HDF5 file operations

    • Comprehensive error handling
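Block-based computation of column statistics can be illustrated in base R. The following is a hedged sketch and not the package's implementation. It walks the matrix in row blocks of `wsize` rows. For each block it computes the mean and the sum of squared deviations, then merges these partial results with the pairwise update of Chan, Golub and LeVeque. This means the statistics never require a second pass over earlier blocks. The helper name `blockwise_col_stats` is hypothetical, and the sample standard deviation (denominator n - 1) is assumed.

```r
# Hypothetical helper: column means and sds accumulated over row blocks.
# Assumption: blocks of `wsize` rows; partial results merged with the
# Chan-Golub-LeVeque pairwise formula (not necessarily the package's method).
blockwise_col_stats <- function(x, wsize = 1000L) {
  p  <- ncol(x)
  n  <- 0              # rows processed so far
  mu <- numeric(p)     # running column means
  m2 <- numeric(p)     # running sums of squared deviations from the mean
  for (s in seq(1L, nrow(x), by = wsize)) {
    block <- x[s:min(s + wsize - 1L, nrow(x)), , drop = FALSE]
    nb    <- nrow(block)
    mu_b  <- colMeans(block)
    m2_b  <- colSums(sweep(block, 2L, mu_b)^2)   # deviations within block
    delta <- mu_b - mu
    tot   <- n + nb
    mu    <- mu + delta * nb / tot               # merged mean
    m2    <- m2 + m2_b + delta^2 * n * nb / tot  # merged sum of squares
    n     <- tot
  }
  list(mean = mu, sd = sqrt(m2 / (n - 1)))      # sample sd (n - 1)
}

set.seed(1)
X <- matrix(rnorm(2500), 250, 10)
st <- blockwise_col_stats(X, wsize = 64)   # 4 blocks: 64 + 64 + 64 + 58 rows
```

Merging the per-block moments gives the same result as a single pass over the full matrix, up to floating-point rounding. For this reason the block size only affects memory use and speed.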

When onmemory = FALSE, results are stored in a new group within the HDF5 file, named by outgroup ('mean_sd' by default).

References

  • The HDF Group. (2000-2010). HDF5 User's Guide.

  • Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419-420.
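Welford (1962) describes a numerically stable single-pass update of the mean and the sum of squared deviations. Here is a minimal base-R sketch of that update for a single vector. This page does not say whether or how the package applies it per row or column. The helper name `welford_mean_sd` is hypothetical, and it returns the sample standard deviation (denominator n - 1).

```r
# Hypothetical helper: Welford's single-pass mean and sample sd.
welford_mean_sd <- function(values) {
  n  <- 0
  mu <- 0   # running mean
  m2 <- 0   # running sum of squared deviations
  for (v in values) {
    n     <- n + 1
    delta <- v - mu
    mu    <- mu + delta / n
    m2    <- m2 + delta * (v - mu)   # second factor uses the updated mean
  }
  c(mean = mu, sd = sqrt(m2 / (n - 1)))
}

w <- welford_mean_sd(c(2, 4, 4, 4, 5, 5, 7, 9))
# mean is 5; squared deviations sum to 32, so sd is sqrt(32 / 7)
```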

See Also

  • bdCreate_hdf5_matrix for creating HDF5 matrices

Examples

if (FALSE) {
library(BigDataStatMeth)

# Create test matrices
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
X <- matrix(rnorm(10), 10, 1)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", Y, "data", "matrix1",
                     overwriteFile = TRUE)
bdCreate_hdf5_matrix("test.hdf5", X, "data", "vector1",
                     overwriteFile = FALSE)

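# Compute row-wise statistics; with onmemory = FALSE (the default),
# the results are written to the HDF5 file
res <- bdgetSDandMean_hdf5(
  filename = "test.hdf5",
  group = "data",
  dataset = "matrix1",
  sd = TRUE,
  mean = TRUE,
  byrows = TRUE,
  wsize = 500
)
res  # file name and dataset paths of the stored means and sds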

# Cleanup
if (file.exists("test.hdf5")) {
  file.remove("test.hdf5")
}
}
