mrIML: multi-Response (Multivariate) Interpretable Machine Learning

Overview

mrIML enables users to build and interpret multivariate machine learning models using the tidyverse (tidymodels syntax in particular). It builds on ideas from Gradient Forests (Ellis et al., 2012), ecological genomic approaches (Fitzpatrick & Keller, 2015), and multi-response stacking algorithms (Xing et al., 2020).

mrIML can be applied to any multi-response machine learning problem, but it was designed for data common in community ecology (site-by-species data) and ecological genomics (individual or population by SNP loci).

How to Install

You can install the released version of mrIML from CRAN, or the development version from GitHub using devtools:

# Install released version from CRAN
install.packages("mrIML")

# Install development version
devtools::install_github('nickfountainjones/mrIML')

Using mrIML

To get started, load mrIML and tidymodels:

library(mrIML)
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
#> ✔ broom        1.0.8     ✔ recipes      1.3.0
#> ✔ dials        1.4.0     ✔ rsample      1.3.0
#> ✔ dplyr        1.1.4     ✔ tibble       3.2.1
#> ✔ ggplot2      3.5.2     ✔ tidyr        1.3.1
#> ✔ infer        1.0.8     ✔ tune         1.3.0
#> ✔ modeldata    1.4.0     ✔ workflows    1.2.0
#> ✔ parsnip      1.3.1     ✔ workflowsets 1.1.0
#> ✔ purrr        1.0.4     ✔ yardstick    1.3.2
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()

Many functions in mrIML benefit from parallel processing, which can be set up with the future package:

future::plan("multisession", workers = 2)
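
As a small sketch of what the plan above does, the following uses only functions from the future package itself (nothing here is mrIML-specific). It assumes the future package is installed; the toy computation is purely illustrative.

```r
library(future)

# Start two background R sessions, as in the example above
plan(multisession, workers = 2)

# Confirm how many workers the current plan provides
n_workers <- nbrOfWorkers()

# Evaluate a toy expression on a worker and collect the result
f <- future(sum(1:10))
result <- value(f)

# Return to sequential processing and release the background sessions
plan(sequential)
```

Resetting the plan to sequential at the end of a session is optional, but it shuts down the background workers explicitly.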

The core function of mrIML is mrIMLpredicts(), which is a wrapper around the tidymodels workflow that fits a provided model to each response variable in a multi-response data set.

# Load example multi-response data
data <- MRFcov::Bird.parasites
# Split into response and predictor data
Y <- data %>%
  select(-c("scale.prop.zos"))
X <- data %>%
  select(scale.prop.zos)

# Define tidymodel
model <- rand_forest(
  trees = 100,
  mode = "classification",
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("randomForest")

# Fit multi-response model
mrIML_model <- mrIMLpredicts(
  X = X,
  Y = Y,
  Model = model,
  prop = 0.7,
  k = 5
)
#> |======================================================================| 100%
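
To illustrate the general idea of fitting one model per response variable, here is a minimal base R sketch. It is not mrIML's implementation: it uses simulated data, a plain logistic regression in place of a tuned tidymodels specification, and a single train/test split without cross-validation. All object names (env, sp1, sp2, sp3) are hypothetical.

```r
# Conceptual sketch only: NOT how mrIMLpredicts() is implemented
set.seed(42)
n <- 200

# Hypothetical predictor and three hypothetical binary responses
X <- data.frame(env = rnorm(n))
Y <- data.frame(
  sp1 = rbinom(n, 1, plogis(1.5 * X$env)),
  sp2 = rbinom(n, 1, plogis(-1.0 * X$env)),
  sp3 = rbinom(n, 1, 0.5)
)

# Keep 70% of rows for training (loosely analogous to prop = 0.7)
train_idx <- sample(n, size = floor(0.7 * n))

# Fit one logistic regression per response column
fits <- lapply(names(Y), function(resp) {
  train <- data.frame(y = Y[[resp]], X)[train_idx, , drop = FALSE]
  glm(y ~ ., data = train, family = binomial())
})
names(fits) <- names(Y)

# Test-set accuracy for each response
accuracy <- vapply(names(Y), function(resp) {
  p <- predict(fits[[resp]],
               newdata = X[-train_idx, , drop = FALSE],
               type = "response")
  mean((p > 0.5) == Y[[resp]][-train_idx])
}, numeric(1))
accuracy
```

The result is one fitted model and one performance value per response, which is the structure the interpretation functions described below operate on.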

The object mrIML_model can be investigated using:

  • mrIMLperformance() to get performance metrics for each response variable,
  • mrVip() to get variable importance for each response variable,
  • mrFlashlight() to get partial dependence plots for each response variable,
  • mrCovar() to get covariate importance for each predictor variable, and
  • mrInteractions() to get interaction importance for each predictor variable in the response models.
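
As a rough illustration of what a partial dependence curve for each response looks like, the base R sketch below computes one by hand. It does not show how mrFlashlight() or any other mrIML function works internally; the data are simulated, the models are plain logistic regressions, and the helper partial_dependence() and the variable names (temp, rain, sp1, sp2) are hypothetical.

```r
set.seed(1)
n <- 300

# Hypothetical predictors and two hypothetical binary responses
X <- data.frame(temp = rnorm(n), rain = rnorm(n))
Y <- data.frame(
  sp1 = rbinom(n, 1, plogis(2 * X$temp)),
  sp2 = rbinom(n, 1, plogis(-2 * X$temp + X$rain))
)

# One logistic regression per response
fits <- lapply(Y, function(y) {
  glm(y ~ temp + rain, data = data.frame(y = y, X), family = binomial())
})

# Partial dependence: fix one predictor at each grid value for every row,
# then average the predictions over the observed values of the others
partial_dependence <- function(fit, data, var, grid) {
  vapply(grid, function(v) {
    data[[var]] <- v
    mean(predict(fit, newdata = data, type = "response"))
  }, numeric(1))
}

grid <- seq(-2, 2, length.out = 9)

# Matrix with one row per grid value and one column per response
pd <- sapply(fits, partial_dependence, data = X, var = "temp", grid = grid)
round(pd, 2)
```

Comparing the columns of pd shows how the same predictor can have opposite average effects on different responses.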

Two multi-response models can be compared using mrPerformancePlot().

Bootstrapping can be implemented using mrBootstrap(). Its output can then be used to quantify uncertainty around partial dependence plots (mrPdPlotBootstrap()) and variable importance (mrvipBootstrap()), as well as to build co-occurrence networks using mrCoOccurNet().
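
The general idea behind bootstrapped uncertainty can be sketched in base R as follows. This is a generic nonparametric bootstrap on simulated data, not the procedure mrBootstrap() uses; the variable names and the choice of a logistic regression coefficient as the quantity of interest are assumptions made for illustration.

```r
set.seed(123)
n <- 200

# Hypothetical predictor and one hypothetical binary response
dat <- data.frame(env = rnorm(n))
dat$sp1 <- rbinom(n, 1, plogis(1.5 * dat$env))

# Refit the model on rows resampled with replacement and keep the
# estimated effect of env from each refit
n_boot <- 200
boot_coef <- vapply(seq_len(n_boot), function(b) {
  idx <- sample(n, replace = TRUE)
  coef(glm(sp1 ~ env, data = dat[idx, ], family = binomial()))[["env"]]
}, numeric(1))

# Percentile interval summarising the uncertainty in the effect
ci <- quantile(boot_coef, c(0.025, 0.975))
ci
```

The same resample-and-refit loop applies to any model summary, such as a variable importance score or a partial dependence curve.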

Recent mrIML publications

  1. Fountain-Jones, N. M., Kozakiewicz, C. P., Forester, B. R., Landguth, E. L., Carver, S., Charleston, M., Gagne, R. B., Greenwell, B., Kraberger, S., Trumbo, D. R., Mayer, M., Clark, N. J., & Machado, G. (2021). MrIML: Multi-response interpretable machine learning to model genomic landscapes. Molecular Ecology Resources, 21, 2766–2781. https://doi.org/10.1111/1755-0998.13495

  2. Sykes, A. L., Silva, G. S., Holtkamp, D. J., Mauch, B. W., Osemeke, O., Linhares, D. C. L., & Machado, G. (2021). Interpretable machine learning applied to on-farm biosecurity and porcine reproductive and respiratory syndrome virus. Transboundary and Emerging Diseases, 00, 1–15. https://doi.org/10.1111/tbed.14369

  3. Fountain-Jones, N. M., Appaw, R., Alkhamis, M., Baker, S., Clark, N., Powell-Romero, F., Mayer, M., Machado, G., & Videvall, E. (2024). Advancing ecological community analysis with MrIML 2.0: Unravelling taxa associations through interpretable machine learning. Authorea [preprint]. https://doi.org/10.22541/au.172676147.77148600/v1

References

Ellis, N., Smith, S. J., & Pitcher, C. R. (2012). Gradient forests: Calculating importance gradients on physical predictors. Ecology, 93, 156–168. https://doi.org/10.1890/11-0252.1

Fitzpatrick, M. C., & Keller, S. R. (2015). Ecological genomics meets community-level modelling of biodiversity: Mapping the genomic landscape of current and future environmental adaptation. Ecology Letters, 18, 1–16. https://doi.org/10.1111/ele.12376

Xing, L., Lesperance, M. L., & Zhang, X. (2020). Simultaneous prediction of multiple outcomes using revised stacking algorithms. Bioinformatics, 36, 65–72. https://doi.org/10.1093/bioinformatics/btz531

Version: 2.2.0
License: MIT + file LICENSE
Maintainer: Nick Fountain-Jones
Last Published: November 21st, 2025
Monthly Downloads: 177

Functions in mrIML (2.2.0)

  • mrCoOccurNet: Generate a MrIML co-occurrence network
  • mrIMLpredicts: Generates a multi-response predictive model
  • mrInteractions: Calculate and visualize feature interactions
  • cbind: A cbind variation that ignores objects with zero dimension
  • readSnpsPed: Conversion to single column per locus from plink file via LEA functionality
  • mrFlashlight: Convert mrIML object into a flashlight object
  • mrCovar: Investigate partial dependencies of a covariate for mrIML JSDMs (Joint Species Distribution Models)
  • resist_components: Calculates resistance components from a list of pairwise resistance surfaces
  • mrIMLperformance: Calculate general performance metrics of a mrIML model
  • mrIML_bird_parasites_RF: An example mrIML model fit to MRFcov::Bird.parasites
  • mrShapely: Generate SHAP (SHapley Additive exPlanations) Plots for Multiple Models and Responses
  • mrVip: Calculates and helps interpret variable importance for mrIML models
  • filterRareCommon: Filter rare response variables from the data
  • mrPdPlotBootstrap: Bootstrap Partial Dependence Plots
  • mrPerformancePlot: Plot Model Performance Comparison
  • mrVipPCA: Principal Component Analysis of mrIML variable importance
  • mrIML_bird_parasites_LM: An example mrIML model fit to MRFcov::Bird.parasites
  • mrIML-package: mrIML: Multi-Response (Multivariate) Interpretable Machine Learning
  • %>%: Pipe operator
  • mrBootstrap: Bootstrap mrIML model predictions