BigDataStatMeth (version 1.0.3)

bdNormalize_hdf5: Normalize a dataset in an HDF5 file

Description

Performs block-wise normalization of a dataset stored in an HDF5 file by centering and/or scaling. Supports both row-wise and column-wise normalization, processing the data in blocks to limit memory use.
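For column-wise processing (byrows = FALSE) with both bcenter and bscale set to TRUE, the transformation of an n-by-p matrix X can be written as below. The n - 1 denominator in s_j is an assumption that matches R's sd() and scale(); this page does not state which denominator the package uses. Setting bcenter = FALSE drops the subtraction, bscale = FALSE drops the division, and byrows = TRUE swaps the roles of rows and columns.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2}
```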

Usage

bdNormalize_hdf5(
  filename,
  group,
  dataset,
  bcenter = NULL,
  bscale = NULL,
  byrows = NULL,
  wsize = NULL,
  overwrite = FALSE
)

Value

A list with the following components. If an error occurs, every component is returned as an empty string (""):

fn

Character string. Path to the HDF5 file containing the results.

ds

Character string. Full dataset path to the normalized data, stored under "NORMALIZED/[group]/[dataset]".

mean

Character string. Dataset path to the means used for centering (column means, or row means when byrows = TRUE), stored under "NORMALIZED/[group]/mean.[dataset]".

sd

Character string. Dataset path to the standard deviations used for scaling (per column, or per row when byrows = TRUE), stored under "NORMALIZED/[group]/sd.[dataset]".
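As an illustration of the layout described above, the hypothetical helper normalized_paths() below (not a BigDataStatMeth function) builds the three dataset paths from a group and dataset name. It only reproduces the documented naming pattern; the exact strings returned by bdNormalize_hdf5, for example whether they carry a leading slash, may differ.

```r
# Hypothetical helper (not part of BigDataStatMeth): builds the dataset
# paths that the Value section documents for a given group and dataset.
normalized_paths <- function(group, dataset) {
  base <- paste0("NORMALIZED/", group, "/")
  list(
    ds   = paste0(base, dataset),            # normalized data
    mean = paste0(base, "mean.", dataset),   # means used for centering
    sd   = paste0(base, "sd.", dataset)      # SDs used for scaling
  )
}

p <- normalized_paths("data", "matrix")
p$ds    # "NORMALIZED/data/matrix"
p$mean  # "NORMALIZED/data/mean.matrix"
p$sd    # "NORMALIZED/data/sd.matrix"
```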

Arguments

filename

Character string. Path to the HDF5 file.

group

Character string. Name of the group containing the dataset.

dataset

Character string. Name of the dataset to normalize.

bcenter

Optional logical. Whether to center the data. If TRUE (the default), the mean of each column (or of each row when byrows = TRUE) is subtracted.

bscale

Optional logical. Whether to scale the data. If TRUE (the default), each column (or each row when byrows = TRUE) is divided by its standard deviation.

byrows

Optional logical. Whether to operate by rows. If TRUE, normalizes row-wise; if FALSE (the default), column-wise.

wsize

Optional integer. Block size used for processing. Default is 1000.

overwrite

Optional logical. Whether to overwrite existing datasets. Default is FALSE.
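The wsize argument controls how many rows or columns are handled per block. As a rough sketch, assuming blocks are contiguous index ranges of length wsize with a shorter final block, the hypothetical helper block_ranges() (not part of the package) shows how a dimension of length n would be split.

```r
# Hypothetical helper (not part of BigDataStatMeth): contiguous index
# ranges of at most `wsize` elements covering 1..n; the last may be shorter.
block_ranges <- function(n, wsize = 1000L) {
  stopifnot(n >= 1, wsize >= 1)
  starts <- seq(1L, n, by = wsize)
  data.frame(start = starts, end = pmin(starts + wsize - 1L, n))
}

block_ranges(1000L, 300L)
# starts: 1, 301, 601, 901; ends: 300, 600, 900, 1000
```

With the 1000 x 100 matrix used in the Examples and the default wsize of 1000, either dimension fits in a single block under this assumption.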

Details

The function implements block-wise normalization with the following components:

Statistical computations:

  • Mean calculation (for centering)

  • Standard deviation calculation (for scaling)

  • Efficient block-wise updates

Memory efficiency:

  • Block-wise data processing

  • Minimal temporary storage

  • Proper resource cleanup

Processing options:

  • Row-wise or column-wise operations

  • Flexible block size selection

  • Optional centering and scaling

Error handling:

  • Input validation

  • Resource management

  • Exception handling
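The statistical step can be carried out without holding the whole matrix in memory by merging per-block summaries. The sketch below shows one standard way to do this, assumed here for illustration rather than taken from the package source: it visits blocks of wsize rows, computes each block's column means and sums of squared deviations, and merges them with the pairwise combination formula of Chan et al. An in-memory matrix stands in for the HDF5 dataset, and blockwise_col_stats() is a hypothetical name.

```r
# Hypothetical sketch (not BigDataStatMeth code): column means and sample SDs
# computed one block of rows at a time, merging per-block summaries with the
# pairwise update of Chan et al. An in-memory matrix stands in for HDF5.
blockwise_col_stats <- function(x, wsize = 1000L) {
  p <- ncol(x)
  n <- 0            # number of rows merged so far
  mu <- numeric(p)  # running column means
  m2 <- numeric(p)  # running sums of squared deviations from the mean
  for (start in seq(1L, nrow(x), by = wsize)) {
    rows <- start:min(start + wsize - 1L, nrow(x))
    blk <- x[rows, , drop = FALSE]            # only this block is touched
    nb <- nrow(blk)
    mb <- colMeans(blk)                       # block column means
    m2b <- colSums(sweep(blk, 2, mb)^2)       # block sums of squared deviations
    delta <- mb - mu
    ntot <- n + nb
    mu <- mu + delta * nb / ntot              # merged mean
    m2 <- m2 + m2b + delta^2 * n * nb / ntot  # merged sum of squares
    n <- ntot
  }
  # Assumed: sample SD with denominator n - 1, as in R's sd()
  list(mean = mu, sd = sqrt(m2 / (n - 1)))
}

set.seed(1)
x <- matrix(rnorm(1000 * 5), 1000, 5)
st <- blockwise_col_stats(x, wsize = 300L)
all.equal(st$mean, colMeans(x))    # TRUE
all.equal(st$sd, apply(x, 2, sd))  # TRUE
```

Merging (count, mean, sum of squared deviations) summaries avoids the cancellation that the naive "sum of squares minus squared sum" formula can suffer when the data have a large mean.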

Examples

if (FALSE) {
library(BigDataStatMeth)

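# Create a 1000 x 100 matrix of test data
set.seed(123)
data <- matrix(rnorm(1000*100), 1000, 100)

# Save it to HDF5 as dataset "matrix" in group "data"
bdCreate_hdf5_matrix("test.hdf5", data, "data", "matrix",
                     overwriteFile = TRUE)

# Center and scale each column (byrows defaults to FALSE), block size 1000
res <- bdNormalize_hdf5("test.hdf5", "data", "matrix",
                        bcenter = TRUE,
                        bscale = TRUE,
                        wsize = 1000)

# Path of the normalized dataset inside test.hdf5
res$ds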
}
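For intuition about what the example computes, the following base-R sketch performs the same kind of normalization on an in-memory matrix, applying precomputed means and standard deviations one block of rows at a time. It is a stand-in under stated assumptions, not the package implementation: normalize_blockwise() is a hypothetical name, the statistics are computed with colMeans() and sd() for brevity, and the behavior with bscale = TRUE but bcenter = FALSE (dividing by the usual standard deviation) is assumed. The result is checked against R's scale().

```r
# Hypothetical in-memory sketch (not BigDataStatMeth code): normalize a matrix
# by applying per-column (or per-row) means and SDs one block at a time.
normalize_blockwise <- function(x, bcenter = TRUE, bscale = TRUE,
                                byrows = FALSE, wsize = 1000L) {
  # Row-wise normalization is column-wise normalization of the transpose
  if (byrows) x <- t(x)
  # Statistics computed directly for brevity; assumed: SD uses n - 1
  mu <- if (bcenter) colMeans(x) else rep(0, ncol(x))
  s  <- if (bscale) apply(x, 2, sd) else rep(1, ncol(x))
  out <- matrix(NA_real_, nrow(x), ncol(x))
  # Process contiguous blocks of at most wsize rows
  for (start in seq(1L, nrow(x), by = wsize)) {
    rows <- start:min(start + wsize - 1L, nrow(x))
    blk <- x[rows, , drop = FALSE]
    out[rows, ] <- sweep(sweep(blk, 2, mu, "-"), 2, s, "/")
  }
  if (byrows) t(out) else out
}

set.seed(1)
x <- matrix(rnorm(50 * 4), 50, 4)
z <- normalize_blockwise(x, wsize = 16L)
all.equal(z, scale(x), check.attributes = FALSE)          # TRUE
zr <- normalize_blockwise(x, byrows = TRUE, wsize = 16L)
all.equal(zr, t(scale(t(x))), check.attributes = FALSE)   # TRUE
```

Blocks run along rows here because, for column-wise normalization, each row block needs only the already-computed per-column statistics; which dimension the package actually blocks over is not stated on this page.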
