BigDataStatMeth (version 1.0.3)

bdImputeSNPs_hdf5: Impute Missing SNP Values in HDF5 Dataset

Description

Imputes missing values in SNP (single nucleotide polymorphism) data stored in an HDF5 file. The imputed data can replace the input dataset or be written to a separate group and dataset.

Usage

bdImputeSNPs_hdf5(
  filename,
  group,
  dataset,
  outgroup = NULL,
  outdataset = NULL,
  bycols = TRUE,
  paral = NULL,
  threads = NULL,
  overwrite = NULL
)

Value

List with components:

fn

Character string with the HDF5 filename

ds

Character string with the full dataset path to the imputed data (group/dataset)
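As a minimal sketch of how the returned list might be consumed, the following splits `ds` into its group and dataset parts using base R. The list is a hand-built stand-in that mirrors the components described above; its values are illustrative and were not captured from a real call.

```r
# Hand-built stand-in for the documented return value; these strings are
# illustrative, not output captured from a real call.
res <- list(fn = "snp_data.hdf5", ds = "genotype_imputed/snps_complete")

# Split the full dataset path (group/dataset) into its two parts
out_group   <- dirname(res$ds)
out_dataset <- basename(res$ds)
```

The two parts can then be passed as the `group` and `dataset` arguments of a later call that operates on the imputed data.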

Arguments

filename

Character string. Path to the HDF5 file.

group

Character string. Path to the group containing the input dataset.

dataset

Character string. Name of the dataset to impute.

outgroup

Character string (optional). Output group path. If NULL, the input group is used.

outdataset

Character string (optional). Output dataset name. If NULL, the input dataset is overwritten.

bycols

Logical (optional). Whether to impute by columns (TRUE) or rows (FALSE). Default is TRUE.

paral

Logical (optional). Whether to use parallel processing.

threads

Integer (optional). Number of threads for parallel processing.

overwrite

Logical (optional). Whether to overwrite the output dataset if it already exists.

Details

This function provides efficient imputation capabilities for genomic data with support for:

  • Imputation options:

    • Row-wise or column-wise imputation

    • Parallel processing

    • Configurable thread count

  • Output options:

    • Custom output location

    • In-place modification

    • Overwrite protection

  • Implementation features:

    • Memory-efficient processing

    • Safe file operations

    • Error handling

If outgroup and outdataset are both NULL, the input dataset is modified in place; otherwise the imputed data is written to the specified group and dataset.
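To make the row-wise versus column-wise distinction concrete, here is a minimal in-memory sketch in base R. It is not the package's implementation: this page does not state the imputation rule, so the sketch assumes each missing value is replaced by a random draw from the values observed in the same column (or row). The helpers `impute_vector` and `impute_matrix` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper, not part of BigDataStatMeth: fill NAs in a vector
# by sampling from the values observed in that same vector (assumed rule).
impute_vector <- function(x) {
  missing  <- is.na(x)
  observed <- x[!missing]
  if (any(missing) && length(observed) > 0) {
    # Sample indices rather than values to avoid sample()'s special
    # handling of length-one numeric input
    idx <- sample.int(length(observed), sum(missing), replace = TRUE)
    x[missing] <- observed[idx]
  }
  x  # a vector with no observed values is returned unchanged
}

# Hypothetical in-memory analogue of the bycols switch
impute_matrix <- function(m, bycols = TRUE) {
  if (bycols) {
    apply(m, 2, impute_vector)      # each column imputed from itself
  } else {
    t(apply(m, 1, impute_vector))   # each row imputed from itself
  }
}

set.seed(1)
snps   <- matrix(sample(c(0, 1, 2, NA), 100, replace = TRUE), 10, 10)
by_col <- impute_matrix(snps, bycols = TRUE)
by_row <- impute_matrix(snps, bycols = FALSE)
```

Under this assumed rule, the choice of direction decides which observations inform each imputed value: the other entries in the same column when bycols is TRUE, or the other entries in the same row when it is FALSE. Which of these corresponds to markers and which to samples depends on how the matrix is laid out.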

References

  • The HDF Group. (2000-2010). HDF5 User's Guide.

  • Li, Y., et al. (2009). Genotype Imputation. Annual Review of Genomics and Human Genetics, 10, 387-406.

See Also

  • bdCreate_hdf5_matrix for creating HDF5 matrices

Examples

if (FALSE) {
  library(BigDataStatMeth)

  # Create test genotypes (0, 1, 2) with missing values; fix the seed so the
  # example is reproducible
  set.seed(123)
  snp_matrix <- matrix(sample(c(0, 1, 2, NA), 100, replace = TRUE), 10, 10)

  # Save to HDF5
  fn <- "snp_data.hdf5"
  bdCreate_hdf5_matrix(fn, snp_matrix, "genotype", "snps",
                       overwriteFile = TRUE)

  # Impute missing values column-wise and write the result to a new dataset
  res <- bdImputeSNPs_hdf5(
    filename = fn,
    group = "genotype",
    dataset = "snps",
    outgroup = "genotype_imputed",
    outdataset = "snps_complete",
    bycols = TRUE,
    paral = TRUE
  )

  # res$fn holds the file name, res$ds the path to the imputed dataset
  res$ds

  # Cleanup
  if (file.exists(fn)) {
    file.remove(fn)
  }
}