
topolow (version 1.0.0)

generate_synthetic_datasets: Generate Synthetic Distance Matrices with Missing Data

Description

Creates synthetic distance matrices with controlled levels of missingness and noise for testing and validating mapping algorithms. Generates multiple datasets with different dimensionalities and missingness patterns. If output_dir is provided, the generated datasets are saved as RDS files.
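This page does not specify how the matrices are produced internally. As an illustration only, here is a base-R sketch of the general idea. It assumes standard-normal coordinates in `n_dims` dimensions, Euclidean distances, multiplicative Gaussian noise, and a fixed fraction of hidden point pairs. `simulate_masked_distances` and `noise_sd` are hypothetical names and are not part of topolow.

```r
# Hypothetical helper, NOT topolow's implementation: builds one noisy
# Euclidean distance matrix and hides a fixed fraction of point pairs.
simulate_masked_distances <- function(n_points, n_dims, missingness,
                                      noise_sd = 0.05, seed = 1) {
  set.seed(seed)
  # Assumed generator: standard-normal coordinates in n_dims dimensions
  coords <- matrix(rnorm(n_points * n_dims), ncol = n_dims)
  d <- as.matrix(dist(coords))       # symmetric, zero diagonal
  upper <- which(upper.tri(d))       # each unordered pair appears once
  # Assumed noise model: multiplicative Gaussian noise, clamped at zero
  noisy <- d[upper] * (1 + rnorm(length(upper), sd = noise_sd))
  d[upper] <- pmax(noisy, 0)
  # Hide round(missingness * n_pairs) randomly chosen pairs
  hide <- upper[sample.int(length(upper), round(missingness * length(upper)))]
  d[hide] <- NA
  # Copy the upper triangle into the lower one to keep the matrix symmetric
  d[lower.tri(d)] <- t(d)[lower.tri(d)]
  d
}

m <- simulate_masked_distances(n_points = 50, n_dims = 2,
                               missingness = 0.67, seed = 123)
mean(is.na(m[upper.tri(m)]))  # realized fraction of hidden pairs, close to 0.67
```

Masking only the upper triangle and then mirroring it guarantees that a hidden pair is missing in both directions, so the result stays a valid symmetric matrix.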

Usage

generate_synthetic_datasets(
  n_dims_list,
  seeds,
  n_points,
  missingness_levels = list(S = 0.67, M = 0.77, L = 0.87),
  output_dir = NULL,
  prefix = "sim",
  save_plots = FALSE
)

Value

A list containing the generated synthetic data and metadata:

matrices

A list of generated symmetric distance matrices for each dimension.

panels

A list of generated assay panels (non-symmetric matrices) for each dimension.

metadata

A data.frame with the generation parameters for each dataset.

Arguments

n_dims_list

Numeric vector of dimensionalities for which to generate data.

seeds

Integer vector of random seeds, one per element of n_dims_list (must be the same length as n_dims_list).
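Because seeds must match n_dims_list in length, the two vectors are paired element by element. A minimal sketch of that pairing follows. It assumes each dimensionality is seeded independently with its own seed. `make_coords` is a hypothetical stand-in generator, not topolow's.

```r
# Hypothetical illustration of element-wise pairing of dimensions and seeds.
n_dims_list <- c(2, 3)
seeds <- c(123, 456)
stopifnot(length(seeds) == length(n_dims_list))  # one seed per dimensionality

# Assumed stand-in generator: standard-normal coordinates for 50 points
make_coords <- function(n_dims, seed, n_points = 50) {
  set.seed(seed)
  matrix(rnorm(n_points * n_dims), ncol = n_dims)
}

# Map() walks both vectors in parallel: (2, 123), then (3, 456)
coords <- Map(make_coords, n_dims_list, seeds)
sapply(coords, ncol)  # 2 3
```

Seeding per dimensionality means that rerunning one dimension with the same seed reproduces the same coordinates, regardless of which other dimensions are in the list.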

n_points

Integer number of points to generate.

missingness_levels

Named list of missingness proportions between 0 and 1 (default: list(S = 0.67, M = 0.77, L = 0.87)).
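Since each level is a proportion, it corresponds to a count of hidden entries. The sketch below is hypothetical. It assumes missingness is counted over unique off-diagonal pairs, although this page does not say how topolow counts it. `mask_at_levels` is an illustrative name.

```r
# Hypothetical sketch: apply each named proportion to the same complete matrix.
# Assumes missingness is measured over unique off-diagonal pairs.
mask_at_levels <- function(d, missingness_levels, seed = 1) {
  set.seed(seed)
  upper <- which(upper.tri(d))
  lapply(missingness_levels, function(p) {
    masked <- d
    n_hide <- round(p * length(upper))              # proportion -> count
    masked[upper[sample.int(length(upper), n_hide)]] <- NA
    masked[lower.tri(masked)] <- t(masked)[lower.tri(masked)]  # keep symmetric
    masked
  })
}

set.seed(42)
d <- as.matrix(dist(matrix(runif(40), ncol = 2)))  # 20 points in 2-D, 190 pairs
masked <- mask_at_levels(d, list(S = 0.67, M = 0.77, L = 0.87))
# lapply keeps the level names, so results are indexed as masked$S, masked$M, ...
sapply(masked, function(m) mean(is.na(m[upper.tri(m)])))
# about 0.668, 0.768, 0.868 (127, 146, 165 of 190 pairs)
```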

output_dir

Character path to directory for saving outputs. If NULL (the default), no files are saved.

prefix

Character string used as a prefix for output file names (default: "sim").

save_plots

Logical; whether to save network visualization plots. Requires output_dir to be set.

Examples

# Generate datasets without saving to disk
results <- generate_synthetic_datasets(
  n_dims_list = c(2, 3),
  seeds = c(123, 456),
  n_points = 50
)
# \donttest{
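# Generate datasets and save them to a dedicated subdirectory of the
# session's temporary directory
temp_out_dir <- file.path(tempdir(), "topolow_synthetic")
dir.create(temp_out_dir, showWarnings = FALSE)
results_saved <- generate_synthetic_datasets(
  n_dims_list = c(2),
  seeds = c(123),
  n_points = 10,
  missingness_levels = list(low = 0.5, high = 0.8),
  output_dir = temp_out_dir,
  save_plots = TRUE
)
list.files(temp_out_dir)
# Clean up only the subdirectory created above (deleting tempdir() itself
# would break later tempfile() calls in this R session)
unlink(temp_out_dir, recursive = TRUE)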
# }
