Spec.interact: Species Interaction Inference

Description

This function infers interspecies interactions based on the discrete-time Lotka-Volterra model.

Usage

Spec.interact(
  Data,
  metadata,
  Group_var,
  abund_centered_method = "median",
  num_iterations = 10,
  error_threshold = 0.001,
  pre_error = 10000,
  seed = NULL
)

Value

An S3 object with an element for each group defined by Group_var. Each element is a list containing:

interaction_matrices: A three-dimensional array of estimated interaction coefficients with dimensions corresponding to features $\times$ features $\times$ iterations.
final_interaction_matrix: A two-dimensional matrix of interaction coefficients obtained by taking the median over the iterations.
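The relationship between the two elements can be illustrated with base R. This is a minimal sketch on a made-up array, not the package's internal code: the array values, the feature names, and the use of apply() are assumptions chosen only to show what "median over the iterations" means for a features x features x iterations array.

```r
# Hypothetical 3 x 3 x 4 array standing in for interaction_matrices
# (features x features x iterations); the values are random placeholders.
set.seed(1)
features <- paste0("Feature", 1:3)
interaction_matrices <- array(
  rnorm(3 * 3 * 4),
  dim = c(3, 3, 4),
  dimnames = list(features, features, NULL)
)

# Element-wise median over the third (iteration) dimension
final_interaction_matrix <- apply(interaction_matrices, c(1, 2), median)

dim(final_interaction_matrix)
```

Each entry of the resulting matrix is the median of the corresponding entries across all iterations, so the result keeps the features x features shape.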

Arguments

Data: A matrix or data frame of the transformed species abundance data.
metadata: A data frame containing information about all samples, including at least the grouping of the samples and the individual each sample belongs to (Group and ID), the sampling Time point of each sample, and any other relevant information.
Group_var: A character string specifying the column name in metadata that defines the groups for analysis.
abund_centered_method: A character string indicating the method to compute species equilibrium abundance. Accepted values are median (default) and mean.
num_iterations: An integer specifying the number of bagging iterations for the iterative variable selection process. Default is 10.
error_threshold: A numeric value representing the relative error improvement threshold for adding new predictors during bagging iteration. Default is 1e-3.
pre_error: A numeric value specifying the initial (large) error used for comparison in the iterative procedure. Default is 10000.
seed: Random seed. Default is NULL.
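To make the abund_centered_method choice concrete, the sketch below centers a small abundance matrix by either the per-species median or the per-species mean. The helper center_abundance() and the example values are hypothetical and are not part of the package; how Spec.interact performs the centering internally is not documented here, so treat this as an illustration of the idea only.

```r
# Hypothetical transformed abundances: 6 samples x 2 species for one subject
x <- matrix(c(1.0, 2.0, 3.0, 4.0, 5.0, 12.0,
              0.5, 0.7, 0.6, 0.9, 0.8, 0.4),
            ncol = 2,
            dimnames = list(NULL, c("sp1", "sp2")))

# Illustrative helper (not a package function): subtract the per-species
# equilibrium abundance, estimated by the median or the mean.
center_abundance <- function(x, method = c("median", "mean")) {
  method <- match.arg(method)
  equilibrium <- apply(x, 2, if (method == "median") median else mean)
  sweep(x, 2, equilibrium, "-")
}

centered_median <- center_abundance(x, "median")
centered_mean   <- center_abundance(x, "mean")
```

The median is less sensitive to an outlying sample such as the value 12 in sp1, which is one reason it may be preferred as the default.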

Author

Shijia Li

Details

This function implements the discrete-time Lotka-Volterra model to characterize species interactions in microbiome time-series data. The model describes the MCLR-transformed abundance $x_{ni}$ of species $i$ for subject $n$ at time $t+\Delta t$ as:

$$x_{ni} (t+\Delta t) = \eta_{ni} (t) x_{ni} (t) \exp\left(\Delta t \sum_j c_{nij} (x_{nj} (t) - \langle x_{nj} \rangle) \right)$$

where $\langle x_{nj} \rangle$ represents the equilibrium abundance of species $j$, typically defined as the median abundance across samples from the same subject; $c_{nij}$ denotes the interaction coefficient of species $j$ on species $i$; and $\eta_{ni} (t)$ accounts for log-normally distributed stochastic effects. For computational simplicity, stochastic effects are ignored and $\Delta t$ is set to 1. Taking the natural logarithm of both sides yields:

$$\ln x_{ni} (t+1) - \ln x_{ni} (t) = \sum_j c_{nij} (x_{nj} (t) - \langle x_{nj} \rangle)$$

To improve sparsity and interpretability, the LIMITS algorithm is applied, which combines stepwise regression with bagging. First, 50% of the samples are randomly selected as the training set, while the rest serve as the test set. The initial regression model includes only the self-interaction term:

$$\ln x_{ni} (t+1) - \ln x_{ni} (t) = c_{nii} (x_{ni} (t) - \langle x_{ni} \rangle)$$

Stepwise regression then iteratively adds species interaction terms from a candidate set $S$, forming:

$$\ln x_{ni} (t+1) - \ln x_{ni} (t) = c_{nii} (x_{ni} (t) - \langle x_{ni} \rangle) + \sum_{j \in S} c_{nij} (x_{nj} (t) - \langle x_{nj} \rangle)$$

Whether a new term is included depends on the relative improvement in mean squared error (MSE) on the test set:

$$\theta = \frac{\text{MSE}_{\text{before}} - \text{MSE}_{\text{after}}}{\text{MSE}_{\text{before}}}$$

If $\theta$ exceeds a predefined threshold (error_threshold, default $10^{-3}$), the species is included. Bagging is performed over $B$ iterations (num_iterations) by repeating the random splitting and stepwise regression, which makes the estimates more robust.

The final interaction coefficient matrix is computed as:

$$c_{nij} = \text{median}(c_{nij}^{(1)}, c_{nij}^{(2)}, \ldots, c_{nij}^{(B)})$$

This approach refines the inferred species interactions while ensuring sparsity.
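The procedure described above (random 50/50 split, forward stepwise selection driven by the relative test-set MSE improvement, and a median over bagging iterations) can be sketched in base R for a single target species. This is an illustrative sketch under stated assumptions, not the package's actual implementation: the data are simulated directly as centered predictors and log-abundance changes, the helper names fit_subset() and stepwise_once() are hypothetical, the model is fitted without an intercept to match the equations, and the greedy rule of trying every remaining candidate and keeping the one with the lowest test MSE is one plausible reading of the stepwise step.

```r
set.seed(42)

# Simulated inputs for one target species (all values hypothetical):
# X holds centered abundances x_j(t) - <x_j>, y holds ln x_i(t+1) - ln x_i(t).
n_obs <- 200
n_species <- 5
X <- matrix(rnorm(n_obs * n_species), n_obs, n_species,
            dimnames = list(NULL, paste0("sp", 1:n_species)))
true_coef <- c(-0.8, 0.5, 0, 0, 0)   # target is sp1; only sp2 acts on it
y <- as.vector(X %*% true_coef + rnorm(n_obs, sd = 0.1))

# Fit a no-intercept least-squares model on the training rows and
# return the coefficients together with the test-set MSE.
fit_subset <- function(X, y, train, test, active) {
  beta <- qr.solve(X[train, active, drop = FALSE], y[train])
  pred <- X[test, active, drop = FALSE] %*% beta
  list(beta = beta, mse = mean((y[test] - pred)^2))
}

# One stepwise pass: start from the self-interaction term, then greedily add
# the candidate with the lowest test MSE while theta exceeds the threshold.
stepwise_once <- function(X, y, self, error_threshold = 1e-3) {
  n <- nrow(X)
  train <- sample(n, floor(n / 2))          # random 50/50 split
  test <- setdiff(seq_len(n), train)
  active <- self
  current <- fit_subset(X, y, train, test, active)
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), active)
    if (length(candidates) == 0) break
    fits <- lapply(candidates, function(j) {
      fit_subset(X, y, train, test, c(active, j))
    })
    mses <- vapply(fits, function(f) f$mse, numeric(1))
    best <- which.min(mses)
    theta <- (current$mse - mses[best]) / current$mse
    if (theta <= error_threshold) break
    active <- c(active, candidates[best])
    current <- fits[[best]]
  }
  coefs <- numeric(ncol(X))                 # unselected species stay at 0
  coefs[active] <- current$beta
  coefs
}

# Bagging: repeat the split-and-select procedure B times, then take the
# coefficient-wise median.
B <- 25
bagged <- replicate(B, stepwise_once(X, y, self = 1))
final_coef <- apply(bagged, 1, median)
names(final_coef) <- colnames(X)
round(final_coef, 2)
```

In this simulation the coefficients for sp1 and sp2 should land close to their true values, while a species that is selected in fewer than half of the iterations ends up with a median of exactly zero. That is how the median step contributes to sparsity.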

Examples

# \donttest{
# Example usage:
set.seed(123)
Data <- matrix(sample(1:100, 50, replace = TRUE), nrow = 5)
rownames(Data) <- paste0("Feature", 1:5)
colnames(Data) <- paste0("Sample", 1:10)

# Create example metadata with a grouping variable
metadata <- data.frame(Group = rep(c("A", "B"), each = 5))
rownames(metadata) <- paste0("Sample", 1:10)
metadata$Time <- rep(1:5, times = 2)       # five time points per group
metadata$ID <- paste0("ID", 1:10)          # one identifier per sample

results <- Spec.interact(Data = as.data.frame(t(Data)),
                         metadata = metadata,
                         Group_var = "Group",
                         abund_centered_method = "median",
                         num_iterations = 5,
                         error_threshold = 1e-3,
                         pre_error = 10000)
# }
