DataFusionGDM

Machine Learning Solutions for Integrating Partially Overlapped Genetic Datasets.

How to cite

If you use DataFusion-GDM, please cite:

  • Paper: Zhu J., Malmberg M.M., Shinozuka M., Retegan R.M., Cogan N.O., Jacobs J.L., Giri K., Smith K.F. (2025). Machine learning solutions for integrating partially overlapping genetic datasets and modelling host–endophyte effects in ryegrass (Lolium) dry matter yield estimation. Frontiers in Plant Science. https://doi.org/10.3389/fpls.2025.1543956
  • Software: Zhu, J. (2025). DataFusion-GDM. The University of Melbourne. Software. https://doi.org/10.26188/28602953

Author ORCID: https://orcid.org/0000-0002-9916-9732

Install

In R:

if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("jiashuaiz/DataFusion-GDM")

Usage

library(DataFusionGDM)

# Simulate a genetic distance matrix (GDM) in memory and visualize it
res <- run_genetic_scenario("island", n_pops = 40)
res$plots$heatmap()
res$plots$mds()

# Optionally export to CSV if needed (defaults to tempdir)
tmp <- export_simulated_gdm(scenario = "default", n_pops = 40, verbose = FALSE)
# unlink(tmp)  # clean up when finished

# Bundled example script: simulate and visualize
source(system.file("examples/simulate_gdm_quick.R", package = "DataFusionGDM"), echo = TRUE)

# MDS + Procrustes
source(system.file("examples/mds_procrustes_demo.R", package = "DataFusionGDM"), echo = TRUE)

# BESMI batch (small demo)
source(system.file("examples/besmi_batch_quick.R", package = "DataFusionGDM"), echo = TRUE)
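
The `source(system.file(...))` calls above assume the example scripts can be found. `system.file()` returns an empty string when the package or the file is missing, so a small guard can give a clearer message. The helper below is a minimal sketch; `run_bundled_example` is a hypothetical name and is not part of DataFusionGDM.

```r
# Hypothetical helper (not part of DataFusionGDM): run a bundled example
# script only if it can be located.
run_bundled_example <- function(file, package = "DataFusionGDM") {
  # system.file() returns "" when the package or the file is not found.
  path <- system.file("examples", file, package = package)
  if (!nzchar(path)) {
    message("Example '", file, "' not found; is ", package, " installed?")
    return(invisible(FALSE))
  }
  source(path, echo = TRUE)
  invisible(TRUE)
}

ok <- run_bundled_example("simulate_gdm_quick.R")
```

The return value is `TRUE` when the script was found and sourced, and `FALSE` otherwise, so it can be checked in a setup script.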

Vignettes

See the package vignettes for end-to-end guides:

  • Getting started
  • MDS + Procrustes sensitivity
  • BESMI batch imputation

Open vignettes in R:

browseVignettes("DataFusionGDM")
vignette("getting-started", package = "DataFusionGDM")

Contents

  • Simulation and visualization APIs in R/simulate_gdm.R
  • MDS & Procrustes APIs in R/mds_procrustes.R
  • BESMI preparation and imputation APIs in R/besmi*.R
  • Vignettes under vignettes/ (no bundled data; examples use in-memory/temp files)
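
For orientation, the general idea behind combining MDS with Procrustes alignment can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy coordinates, the variable names, and the choice of a rigid (rotation plus translation, no scaling) fit are all assumptions made here. Two distance matrices that share some populations are embedded separately with `cmdscale()`, the second embedding is aligned to the first using the shared populations, and the merged coordinates are mapped back to distances.

```r
# Base-R sketch of MDS + Procrustes alignment for two partially overlapping
# distance matrices. All names here are hypothetical, not package API.
set.seed(1)

# Toy ground-truth 2-D coordinates for 8 populations (illustration only).
truth <- matrix(rnorm(16), ncol = 2,
                dimnames = list(paste0("pop", 1:8), NULL))

# Dataset A covers pop1..pop6; dataset B covers pop3..pop8 (pop3..pop6 shared).
d_a <- as.matrix(dist(truth[1:6, ]))
d_b <- as.matrix(dist(truth[3:8, ]))

# Classical MDS embeds each matrix in its own arbitrary orientation.
x_a <- cmdscale(d_a, k = 2)
x_b <- cmdscale(d_b, k = 2)
shared <- intersect(rownames(x_a), rownames(x_b))

# Orthogonal Procrustes on the shared points: centre both sets, then take
# the rotation/reflection from the SVD of the cross-product matrix.
mu_a <- colMeans(x_a[shared, ])
mu_b <- colMeans(x_b[shared, ])
a_c <- sweep(x_a[shared, ], 2, mu_a)
b_c <- sweep(x_b[shared, ], 2, mu_b)
s <- svd(t(b_c) %*% a_c)
rot <- s$u %*% t(s$v)

# Apply the fitted transform to every point in B.
x_b_aligned <- sweep(sweep(x_b, 2, mu_b) %*% rot, 2, mu_a, "+")

# Merge coordinates (A's version is kept for shared populations) and
# convert back to a single distance matrix.
only_b <- setdiff(rownames(x_b), shared)
merged <- rbind(x_a, x_b_aligned[only_b, , drop = FALSE])
d_merged <- as.matrix(dist(merged))

# Compare against the distances computed from the toy ground truth.
d_truth <- as.matrix(dist(truth))[rownames(merged), rownames(merged)]
max_err <- max(abs(d_merged - d_truth))
```

Because the toy data are exactly two-dimensional and noise-free, the merged distances reproduce the ground truth up to floating-point error. Real genetic distance matrices are not exactly Euclidean, so some residual error should be expected there.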

License

GPL-3.0

Install

install.packages('DataFusionGDM')

Version

1.3.2

Maintainer

Jiashuai Zhu

Last Published

November 4th, 2025

Functions in DataFusionGDM (1.3.2)

  • perform_mds: Perform MDS on a pair of distance matrices
  • create_distance_heatmap: Create a heatmap of genetic distances (ggplot2)
  • .besmi_calculate_distance: Distance metrics
  • export_simulated_gdm: Export a simulated GDM to CSV
  • run_genetic_simulation: Run a high-level genetic simulation with configurable model
  • run_genetic_scenario: Run simulation with predefined biological scenarios
  • .double_center: Double-center a distance matrix
  • .besmi_initialize_M: Initialize matrix by column means
  • .besmi_determine_sampling_sizes: Determine bootstrap sample count for a given k
  • simulate_genetic_distances: Simulate genetic distances using realistic population structure
  • visualize_results: Create plotting handles for simulation results
  • besmi_batch_impute: Run BESMI imputation for a list of dataset paths
  • apply_procrustes: Procrustes alignment and mapping back to distances
  • besmi_create_masked_matrices: Create masked matrices for BESMI
  • besmi_prepare_full_dataset: Prepare full GDM dataset from CSV or RData
  • besmi_impute_single_dataset: Impute a single dataset from masked matrix path
  • besmi_iterative_imputation: Iterative imputation with MICE (tails-chain)
  • coords_to_distances: Convert coordinate matrix to distance matrix
  • create_mds_plot: Create MDS plot of genetic distances
  • besmi_knn_impute: KNN imputation sweep (uses VIM::kNN)
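
Two of the helper ideas named in the list, double-centering a distance matrix and initializing missing entries by column means, are standard operations that can be sketched in base R. The functions below are hypothetical stand-ins written for illustration; they are not the package internals, whose signatures and details are not shown here.

```r
# Base-R sketch of two helper ideas named in the function list.
# Function names are hypothetical stand-ins, not the package internals.

# Double-centering: B = -1/2 * J %*% D^2 %*% J, with J = I - (1/n) * 11^T.
double_center <- function(d) {
  n <- nrow(d)
  j <- diag(n) - matrix(1 / n, n, n)
  -0.5 * j %*% (d^2) %*% j
}

# Column-mean initialization: replace each NA with the mean of the
# observed values in its column.
init_by_col_means <- function(m) {
  col_mu <- colMeans(m, na.rm = TRUE)
  idx <- which(is.na(m), arr.ind = TRUE)
  m[idx] <- col_mu[idx[, "col"]]
  m
}

set.seed(42)
pts <- matrix(rnorm(10), ncol = 2)   # 5 toy points in 2-D
d <- as.matrix(dist(pts))

# The double-centered matrix has rows summing to zero, and its leading
# eigenvalues are the ones classical MDS (cmdscale) works with.
b <- double_center(d)
ev <- eigen(b, symmetric = TRUE)$values
cm <- cmdscale(d, k = 2, eig = TRUE)

# Mask one symmetric pair of entries, then fill them with column means.
masked <- d
masked[1, 2] <- masked[2, 1] <- NA
filled <- init_by_col_means(masked)
```

Column-mean filling does not by itself preserve symmetry (`filled[1, 2]` and `filled[2, 1]` generally differ), so it is best read as a starting point for an iterative imputation rather than a final estimate.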