SUMO (version 1.2.3)

simulateMultiOmics: Simulate multi-omics datasets with predefined single or multiple latent factors

Description

Simulate two or more omics datasets with predefined sample-level latent factors and corresponding feature-level signal regions. Each omic has its own signal structure, noise profile, and feature space.

Usage

simulateMultiOmics(
  vector_features,
  n_samples,
  n_factors,
  snr = 2,
  signal.samples = c(5, 0.05),
  signal.features = NULL,
  factor_structure = "mixed",
  num.factor = "multiple",
  seed = NULL,
  real_stats = FALSE,
  real_means_vars = NULL
)

Value

A list containing:

  • omics: List of omic matrices.

  • concatenated_datasets: Merged matrix of all omics.

  • list_alphas: Sample-level latent factor values.

  • list_betas: Feature-level loadings for each omic and factor.

  • signal_annotation: List of feature and sample signal blocks.

  • factor_structure: The factor_structure value passed as input.

  • factor_map: Map of which omics each factor affects.
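
As a quick sanity check on a returned object, the sketch below confirms that the components listed above are present and reports the dimensions of each omic matrix. `check_sim_components` is a hypothetical helper, not part of SUMO. It relies only on the component names documented here and on the assumption that each element of `omics` is a matrix or data frame.

```r
# Hypothetical helper (not part of SUMO): checks that a simulation result
# carries the components documented in the Value section.
check_sim_components <- function(sim) {
  expected <- c("omics", "concatenated_datasets", "list_alphas", "list_betas",
                "signal_annotation", "factor_structure", "factor_map")
  missing <- setdiff(expected, names(sim))
  if (length(missing) > 0) {
    stop("Missing components: ", paste(missing, collapse = ", "))
  }
  # Assumes each omic has two dimensions; returns one row of dims per omic.
  t(vapply(sim$omics, dim, integer(2)))
}
```

With the `sim1` object from Example 1, `check_sim_components(sim1)` would return a small matrix with one row per omic.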

Arguments

vector_features

Integer vector giving the number of features in each omic (length k for k omics).

n_samples

Total number of samples.

n_factors

Number of latent factors.

snr

Numeric. Signal-to-noise ratio.

signal.samples

Length-2 vector (mean, sd) for sample-level signal values.

signal.features

List of length k (one element per omic), each a length-2 vector (mean, sd) for that omic's feature-level signal.

factor_structure

Character. One of: "shared", "unique", "mixed", "partial", "custom".

num.factor

Character. Either "multiple" (default) or "single" factor mode.

seed

Integer seed for reproducibility (optional).

real_stats

Logical. If TRUE, noise variance and mean are derived from real_means_vars.

real_means_vars

Optional list of named vectors per omic: c(mean=..., var=...). Required if real_stats = TRUE.
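
When real_stats = TRUE, the per-omic statistics have to be supplied by the caller. The following is a minimal sketch of one way to build them, assuming that a single overall mean and variance per omic is what real_means_vars expects (this page documents only the named-vector shape with `mean` and `var` entries). `summarise_omic` is a hypothetical helper and `real_data` is placeholder data standing in for your own matrices.

```r
# Hypothetical helper: collapse one real omic matrix into a named
# vector with entries "mean" and "var".
summarise_omic <- function(x) {
  values <- as.vector(as.matrix(x))
  c(mean = mean(values, na.rm = TRUE), var = var(values, na.rm = TRUE))
}

# Placeholder data standing in for real measurements (two omics, 20 samples).
set.seed(1)
real_data <- list(
  expression  = matrix(rnorm(20 * 50, mean = 5,   sd = 1),   nrow = 20),
  methylation = matrix(rnorm(20 * 30, mean = 4.5, sd = 0.9), nrow = 20)
)

# unname() drops the list names so the result matches the unnamed list
# used in Example 2; the order must follow vector_features.
real_means_vars <- unname(lapply(real_data, summarise_omic))
```

The resulting list has one element per omic and could then be passed as `real_means_vars` together with `real_stats = TRUE`.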

Details

This function generates synthetic multi-omics datasets for benchmarking integrative methods. Each omic layer has its own feature distribution and noise characteristics.

Key properties:

  • Sample signal blocks for each latent factor are non-overlapping and randomly spaced.

  • Feature signal blocks per omic are also non-overlapping and assigned per factor.

  • Noise can be modeled using either standard SNR scaling (default) or real data statistics (if real_stats = TRUE).

  • Omics without assigned signal factors still receive background noise.
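
To make the block structure concrete, here is a minimal base-R sketch of a single omic with two latent factors. It is an illustration under stated assumptions, not SUMO's implementation: the signal is taken to be a sum of outer products of sample scores and feature loadings, the blocks are fixed by hand rather than randomly spaced, and the noise is scaled so that var(signal) / var(noise) equals snr. This page does not state how SUMO defines the signal-to-noise ratio, so that definition is an assumption.

```r
set.seed(42)
n_samples  <- 60
n_features <- 200
snr        <- 2

# Non-overlapping sample blocks and feature blocks, one pair per factor
# (chosen by hand here; this is only an illustration).
sample_blocks  <- list(1:15, 31:45)
feature_blocks <- list(1:40, 101:150)

signal <- matrix(0, n_samples, n_features)
for (k in seq_along(sample_blocks)) {
  alpha <- numeric(n_samples)   # sample-level factor scores
  beta  <- numeric(n_features)  # feature-level loadings
  alpha[sample_blocks[[k]]] <- rnorm(length(sample_blocks[[k]]), mean = 5, sd = 0.05)
  beta[feature_blocks[[k]]] <- rnorm(length(feature_blocks[[k]]), mean = 3, sd = 0.05)
  signal <- signal + outer(alpha, beta)
}

# Assumed SNR definition: var(signal) / var(noise).
noise_sd <- sqrt(var(as.vector(signal)) / snr)
noise    <- matrix(rnorm(n_samples * n_features, sd = noise_sd),
                   n_samples, n_features)
omic     <- signal + noise
```

Samples and features outside the blocks carry noise only, which mirrors the last property in the list above. Plotting the result with `image(omic)` would show two rectangular blocks against the noise background.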

Examples

# Example 1: Use standard SNR scaling (default)
sim1 <- simulateMultiOmics(
  vector_features = c(3000, 2500, 2000),
  n_samples = 100,
  n_factors = 3,
  snr = 3,
  signal.samples = c(5, 1),
  signal.features = list(
    c(3, 0.05),
    c(2.5, 0.05),
    c(2, 0.05)
  ),
  factor_structure = "mixed",
  num.factor = "multiple",
  seed = 123
)
plot_simData(sim_object = sim1, data = "merged", type = "heatmap")

# Example 2: Use real stats for noise modeling
sim2 <- simulateMultiOmics(
  vector_features = c(3000, 2500, 2000),
  n_samples = 100,
  n_factors = 3,
  snr = 3,
  signal.samples = c(5, 1),
  signal.features = list(
    c(3, 0.05),
    c(2.5, 0.05),
    c(2, 0.05)
  ),
  factor_structure = "mixed",
  num.factor = "multiple",
  real_stats = TRUE,
  real_means_vars = list(
    c(mean = 5, var = 1),
    c(mean = 4.5, var = 0.8),
    c(mean = 4.0, var = 0.6)
  ),
  seed = 123
)
plot_simData(sim_object = sim2, data = "merged", type = "heatmap")
