Reg.MESR: Fit Mixed-Effects Spline Regression Models with GAM for Microbial Features on Group-Level

Description

This function fits mixed-effect spline regression models using GAM to capture nonlinear temporal trends in microbial community data at the group level. It incorporates random effects for individual IDs within each group and uses natural splines to model the relationship between time and microbial feature abundances.

Usage

Reg.MESR(
  Data_for_Reg,
  pre_processed_data,
  unique_values = 5,
  z_score = NA,
  Knots = NULL,
  max_Knots = 3,
  seed = 123
)

Value

An object of class RegMESR with two elements: fitted_model is a nested list containing the fitted mixed-effects GAM models for each design dummy variable and OTU; knots_info_each_model is a corresponding nested list with the knots used for each model.

Arguments

Data_for_Reg: The data frame output from the Design.
pre_processed_data: The transformed data output from the Data.trans function. A pre-processed OTU data frame with sample IDs as row names and OTU IDs as column names.
unique_values: An integer specifying the minimum number of unique values required in an OTU for spline fitting (default: 5).
z_score: A numeric value specifying the z-score threshold for outlier filtering; if NA, no outlier filtering is performed (default: NA).
Knots: An optional numeric vector specifying the knots to use in the spline regression. If NULL, the optimal number of knots is determined by minimizing the GCV criterion (default: NULL).
max_Knots: An integer indicating the maximum number of knots to consider when selecting the optimal spline model (default: 3).
seed: Random seed.

Author

Shijia Li

Details

This function fits mixed-effects spline regression models. Unlike Reg.SPLR, this function captures the overall temporal dynamics of microbial features across different groups. Assuming the dataset contains $\textit{m}$ groups, each with $\textit{n}$ subjects, the model is formulated as: $$x_{(mi)}(t) = \beta_{m0} + \sum_{k=1}^{K} \beta_{mk} \cdot N_{mk}(t) + b_{n(m)} + \epsilon$$ where $x_{(mi)}(t)$ represents the abundance of microbial feature $\textit{i}$ at time point $\textit{t}$ in group $\textit{m}$. The random effect $b_{n(m)}$ reflects the departure of individual $\textit{n}$ in group $\textit{m}$ from overall population average effects. The parameter $K$ refers to the number of basis functions, which is equal to the number of knots plus one. (i.e., $K = number of knots + 1$).

The Reg.MESR function first extracts independent variables from a design matrix (produced by the Design) by removing columns corresponding to the pre-processed OTU data. For each design dummy variable (excluding the first and last columns), the function subsets the data to include only the observations where the dummy variable equals 1. Then, for each OTU in the pre-processed data, it optionally filters out outliers based on a specified z-score threshold. If the number of unique transformed OTU values exceeds a given threshold (unique_values), a mixed-effects spline regression model is fitted using a natural spline on the time variable along with a random effect for the sample ID. When the Knots parameter is not provided, the function iterates over a range of knot numbers (from 1 to max_Knots), selects the optimal model by minimizing the Generalized Cross-Validation (GCV) criterion, and extracts the corresponding knot locations. Alternatively, if Knots is provided, it is directly used in model fitting. The resulting fitted models and associated knots information are organized into nested lists and returned.

Examples

Run this code

# \donttest{
metadata <- data.frame(
  TimePoint = c(1, 2, 3, 4),
  Sample = c('S1', 'S2', 'S3', 'S4'),
  GroupA = c('A', 'A', 'B', 'B'),
  GroupB = c('X', 'Y', 'X', 'Y')
)

# Example pre-processed data (e.g., transformed abundance data)
Pre_processed_Data <- data.frame(
  Feature1 = rnorm(4),
  Feature2 = rnorm(4)
)

# Create design matrix using grouping variables
design_data <- Design(metadata, Group_var = c('GroupA', 'GroupB'), Pre_processed_Data,
                      Sample_Time = 'TimePoint', Sample_ID = 'Sample')

fit_result <- Reg.MESR(Data_for_Reg = design_data,
                       pre_processed_data = Pre_processed_Data,
                       unique_values = 5,
                       z_score = 2,
                       Knots = NULL,
                       max_Knots = 5)
# }

Run the code above in your browser using DataLab