Data.interpolate: Interpolate Time-Series Data Based on Sample Time

Description

This function performs interpolation on a data frame or matrix (e.g., OTU/ASV counts or other time-series measurements) using corresponding metadata time points. For each unique subject (as defined by a subject ID), the function constructs a full time series between the minimum and maximum time points and applies interpolation (defaulting to cubic interpolation) to generate data for missing time points. The function returns both the interpolated time-series data and the associated updated metadata.

Usage

Data.interpolate(
  Data,
  metadata,
  Sample_Time,
  Sample_ID,
  interp_method = "cubic",
  Group_var
)

Value

An object of class "MicrobTiSDA.interpolate" containing:

Interpolated_Data: A data frame of interpolated abundance data.
Interpolated_Data_metadata: A data frame of corresponding interpolated metadata.

Arguments

Data: A data frame where rows represent OTUs/ASVs and columns represent samples, or the output of the function Data.filter.
metadata: A data frame containing information about all samples, including at least group membership and subject identity (Group and ID), the sampling time point for each sample, and other relevant information.
Sample_Time: A character string specifying the column name in metadata that contains time information.
Sample_ID: A character string specifying the column name in metadata that identifies the subject each sample belongs to.
interp_method: A character string specifying the interpolation method to be used by interp1. Default is 'cubic'. Other methods accepted by interp1 (e.g., 'linear') can also be used.
Group_var: A character string specifying the column name in metadata that indicates group membership.

Author

Shijia Li

Details

This function processes the input data and metadata by iterating over each unique subject ID defined in Sample_ID. For each subject, it subsets the metadata, sorts it by Sample_Time, and constructs a complete time series from the minimum to the maximum time value with a step of 1. It then extracts the corresponding data columns and interpolates each feature across the full time series using the specified interp_method (cubic by default). At the same time, updated metadata is generated for the interpolated time points, preserving the subject ID and the group information indicated by Group_var. The function returns a list object containing the interpolated data matrix and the corresponding updated metadata.
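The per-subject step described above can be illustrated with a minimal sketch for a single feature of a single subject. This is not the package's implementation: it swaps in base R's approx() with linear interpolation in place of interp1 and the cubic default, and the values in obs_time and obs_value are made up for illustration.

```r
# Hypothetical observations for one feature of one subject.
obs_time  <- c(1, 3, 5, 7)
obs_value <- c(10, 30, 20, 40)

# Complete time series from the minimum to the maximum time point, step 1.
full_time <- seq(min(obs_time), max(obs_time), by = 1)

# Linear interpolation with base R (assumption: a stand-in for interp1,
# which the function itself uses with method "cubic" by default).
interp_value <- approx(x = obs_time, y = obs_value,
                       xout = full_time, method = "linear")$y

# Observed time points keep their values; times 2, 4 and 6 are filled in.
data.frame(Time = full_time, Value = interp_value)
```

With a cubic method the filled-in values would generally differ from these linear ones, while the time grid itself would be the same.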

Examples

# \donttest{
# Example data: 5 features across 8 samples with time points from two subjects.
set.seed(123)
Data <- matrix(sample(1:100, 40, replace = TRUE), nrow = 5)
rownames(Data) <- paste0("Feature", 1:5)
colnames(Data) <- paste0("Sample", 1:8)

# Create metadata with time points, sample IDs, and group assignments.
metadata <- data.frame(
  Time = c(1, 3, 5, 7, 2, 4, 6, 8),
  ID = c(rep("Subject1", 4), rep("Subject2", 4)),
  Group = c(rep("A", 4), rep("B", 4)),
  row.names = paste0("Sample", 1:8)
)

# Interpolate the data using cubic interpolation.
interp_results <- Data.interpolate(Data = Data,
                                   metadata = metadata,
                                   Sample_Time = "Time",
                                   Sample_ID = "ID",
                                   interp_method = "cubic",
                                   Group_var = "Group")
# }
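As a rough check on what the example should produce, the sketch below rebuilds the per-subject time grids from the example metadata using base R only. It assumes, following the Details section, that each subject's grid runs from its minimum to its maximum time point in steps of 1; it does not call Data.interpolate, and the name grids is introduced here for illustration.

```r
# Same time points and subject IDs as the example metadata above.
metadata <- data.frame(
  Time = c(1, 3, 5, 7, 2, 4, 6, 8),
  ID = c(rep("Subject1", 4), rep("Subject2", 4))
)

# One complete time grid per subject (assumed step of 1, per the Details).
grids <- lapply(split(metadata$Time, metadata$ID),
                function(t) seq(min(t), max(t), by = 1))

# Number of time points per subject after filling the gaps.
lengths(grids)
```

Under that assumption, each subject goes from 4 observed time points to 7, so the interpolated data would be expected to hold 14 samples in total.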