
santaR

Interactive package for Short AsyNchronous Time-series Analysis (SANTA), implemented in R and Shiny

Overview

santaR is an R package that implements functions for the analysis of short asynchronous time-series.

santaR can deal with challenges not simultaneously addressed by current time-series statistical methods:

  • missing observations
  • asynchronous sampling
  • measurement error
  • low number of time points (e.g. 4 to 10)
  • high number of variables
  • biological variability
  • nonlinearity
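Missing observations and asynchronous sampling become concrete once long-format measurements are laid out as an individual-by-time matrix. The base R sketch below uses hypothetical toy data and `tapply()` to build such a matrix, where every cell without a measurement is `NA`. It is not santaR's own `get_ind_time_matrix`, whose output format may differ.

```r
# Hypothetical long-format data: one row per (individual, time-point) measurement.
# Individuals are not sampled at the same times (asynchronous sampling).
long <- data.frame(
  ind   = c("ind_1", "ind_1", "ind_1", "ind_2", "ind_2", "ind_3", "ind_3", "ind_3"),
  time  = c(0, 4, 8, 0, 6, 4, 6, 8),
  value = c(1.2, 0.8, 0.3, 1.0, 0.5, 0.9, 0.7, 0.2)
)

# Individual x time matrix: rows = individuals, columns = every observed time-point.
# Cells with no measurement for that individual are filled with NA.
ind_time <- tapply(long$value, list(long$ind, long$time), mean)
print(ind_time)
```

Individuals measured at different times leave different cells empty, which is why methods requiring a complete, aligned matrix struggle with this kind of design.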

The reference version of santaR is available on CRAN. Active development and issue tracking take place on the GitHub page, while an overview of the package, vignettes and documentation are available on the supporting website.

To address the challenges of time-series in Systems Biology, santaR (Short AsyNchronous Time-series Analysis) provides a Functional Data Analysis (FDA) approach (in which the fundamental units of analysis are curves representing each individual across time) within a graphical and automated pipeline for the robust analysis of short time-series studies.

Analyte levels are descriptive of the underlying biological state and evolve smoothly through time. For a single analyte, the time trajectory of each individual is described by a smooth curve estimated with smoothing splines. For a group of individuals, a curve representing the group mean trajectory is also calculated. These individual and group mean curves become the new observational units for subsequent data analysis, that is, the estimation of the intra-class variability and the identification of trajectories significantly altered between groups.
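As an illustration of individual curves and a group mean curve, here is a minimal base R sketch that fits `stats::smooth.spline()` to each individual and averages the fitted curves on a common time grid. The toy data and the choice of `df = 3` are assumptions; `santaR_fit` performs its own fitting and group mean estimation, which may differ in detail.

```r
# Hypothetical long-format data: 3 individuals, 5 points each,
# sampled at different (asynchronous) times
toy <- data.frame(
  ind  = rep(c("ind_1", "ind_2", "ind_3"), each = 5),
  time = c(0, 2, 4, 6, 8,
           1, 3, 4, 7, 9,
           0, 1, 5, 6, 8)
)
set.seed(42)
toy$value <- sin(toy$time / 3) + rnorm(nrow(toy), sd = 0.1)

# Common time grid over the interval covered by every individual (1 to 8)
grid <- seq(1, 8, length.out = 50)

# One smoothing spline per individual (df = 3 chosen arbitrarily here),
# evaluated on the common grid: the result is a 50 x 3 matrix
ind_curves <- sapply(split(toy, toy$ind), function(d) {
  fit <- smooth.spline(d$time, d$value, df = 3)
  predict(fit, grid)$y
})

# Group mean curve: point-wise mean of the individual curves
group_mean <- rowMeans(ind_curves)
```

Because every individual is evaluated on the same grid, curves from individuals sampled at different times can be averaged directly.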

Designed initially for metabolomics, santaR is also suited to other Systems Biology disciplines. Implemented in R and Shiny, santaR is developed as a complete and easy-to-use statistical software package that enables both command line and GUI analysis, with fast, parallel, automated analysis and reporting. Comprehensive plotting options and automated summaries allow non-specialist users to clearly identify significantly altered analytes.

Installation

Install the CRAN release of santaR with:

install.packages("santaR")

The development version can be obtained from GitHub:

# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("adwolfer/santaR", ref="master")

If the dependency pcaMethods is not successfully installed, it can be installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("pcaMethods")

Usage

To get started, santaR’s graphical user interface implements all the functions for short asynchronous time-series analysis:

library(santaR)

santaR_start_GUI(browser = TRUE)
#  To exit press ESC in the command line

The graphical user interface is divided into 4 sections, corresponding to the main steps of analysis: Import, DF search, Analysis and Export.

  • The Import tab manages input data in comma-separated values (csv) format or as an RData file containing a previous analysis. Once data is imported, the DF search and Analysis tabs become available.
  • The DF search tab implements the tools for selecting an optimal number of degrees of freedom (df).
  • With the data imported and a pertinent df selected, the Analysis tab groups the tools to visualise and identify variables significantly altered over time. A plotting interface enables the interactive visualisation of the raw data points, individual trajectories, group mean curves and confidence bands for all variables, and the resulting plots can be saved. Finally, if inter-group differential trajectories have been characterised, all significance testing results (with correction for multiple testing) are presented in interactive tables.
  • The Export tab manages the saving of results and automated reporting. Fitted data can be saved as an RData file for future analysis or reproduction of results. csv tables containing significance testing results can also be generated, and a summary plot for each significantly altered variable can be saved for rapid evaluation.
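The DF search step chooses how flexible the splines should be. As a generic illustration only (not the santaR procedure, which according to its function index works on eigenSplines and combines several fitting metrics), the sketch below fits `stats::smooth.spline()` to one hypothetical trajectory at a few candidate df and keeps the df with the lowest Gaussian AIC, computed from the residual sum of squares and the spline's equivalent degrees of freedom.

```r
# Hypothetical single trajectory with 7 time-points
tp <- c(0, 1, 2, 4, 6, 8, 10)
set.seed(7)
value <- cos(tp / 4) + rnorm(length(tp), sd = 0.05)

candidate_df <- 3:5
n <- length(value)

# Gaussian AIC: n * log(RSS / n) + 2 * (equivalent degrees of freedom)
aic <- sapply(candidate_df, function(k) {
  fit <- smooth.spline(tp, value, df = k)
  rss <- sum((value - predict(fit, tp)$y)^2)
  n * log(rss / n) + 2 * fit$df
})

# Keep the candidate df with the lowest AIC
best_df <- candidate_df[which.min(aic)]
```

With so few time-points, higher df quickly approach interpolation of the noise, which is why a penalised criterion is used rather than raw goodness of fit; a small-sample correction such as AICc is often preferred in this setting.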

Vignettes and Demo data

More information is available in the graphical user interface as well as in the following vignettes:

A dataset containing the concentrations of 22 mediators of inflammation over an episode of acute inflammation is also available. The mediators were measured at 7 time-points on 8 subjects, and concentration values were unit-variance scaled for each variable. A subset of the data is presented below:

## Metadata
acuteInflammation$meta
  time   ind  group
     4 ind_6 Group2
     4 ind_7 Group1
     4 ind_8 Group2
     8 ind_1 Group1
     8 ind_2 Group2
     8 ind_3 Group1
## Data
acuteInflammation$data
    var_1    var_2    var_3    var_4
    2.668    2.464    1.365    1.743
  -0.3002  0.05366   0.4509  0.01572
    3.777    2.543    1.858    2.213
  -0.3275   0.1564    0.585  0.03299
    0.708   0.4893 -0.08219   0.9345
  -0.4101 -0.03727  -0.2914  -0.7239
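Unit-variance scaling centres each variable on its mean and divides it by its standard deviation. The sketch below shows this in base R with `scale()` on hypothetical raw values; santaR's own `scaling_UV` is not used here.

```r
# Hypothetical raw concentrations for two variables
raw <- data.frame(var_1 = c(10, 12, 15, 11, 9),
                  var_2 = c(200, 180, 260, 240, 220))

# Centre each column on its mean and divide by its standard deviation
uv <- scale(raw, center = TRUE, scale = TRUE)

# Each column now has mean 0 and standard deviation 1
colMeans(uv)
apply(uv, 2, sd)
```

After scaling, variables measured on very different ranges (here roughly 10 versus 200) become directly comparable.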

Other tips

The GUI is preferable for understanding the methodology, for selecting the best parameters on a subset of the data before running the command line, or for visually exploring results.

If a very high number of variables is to be processed, santaR’s command line functions are more efficient, as they can be integrated into scripts and the reporting can be automated.

Copyright

santaR is licensed under the GPLv3

In summary, the GPLv3 license requires attribution, inclusion of copyright and license information, and disclosure of source code and changes. Derivative works must be made available under the same terms.

© Arnaud Wolfer (2024)

Install: install.packages('santaR')
Monthly Downloads: 260
Version: 1.2.4
License: GPL-3
Maintainer: Arnaud Wolfer
Last Published: March 7th, 2024

Functions in santaR (1.2.4)

  • plot_nbTP_histogram: Plot a histogram of the number of time-trajectories with a given number of time-points
  • santaR_auto_fit: Automate all steps of santaR fitting, confidence band estimation and p-value calculation for one or multiple variables
  • santaR_fit: Generate a SANTAObj for a variable
  • santaR_CBand: Compute group mean curve confidence bands
  • santaR: A package for Short AsyNchronous Time-series Analysis in R
  • santaR_plot: Plot a SANTAObj
  • plot_param_evolution: Plot the evolution of different fitting parameters across all possible df for each eigenSpline
  • get_param_evolution: Compute the value of different fitting metrics over all possible df for each eigenSpline
  • loglik_smooth_spline: Calculate the penalised log-likelihood of a smooth.spline
  • scaling_mean: Mean scaling of each column
  • santaR_pvalue_dist: Evaluate the difference in group trajectories based on the comparison of distance between group mean curves
  • santaR_pvalue_dist_within: Evaluate the difference between a group mean curve and a constant model
  • santaR_auto_summary: Summarise, report and save the results of a santaR analysis
  • santaR_pvalue_fit_within: Evaluate the difference between a group mean curve and a constant model using the comparison of model fit (F-test)
  • santaR_pvalue_fit: Evaluate the difference in group trajectories based on the comparison of model fit (F-test) between one and two groups
  • santaR_start_GUI: santaR Graphical User Interface
  • scaling_UV: Unit-Variance scaling of each column
  • BIC_smooth_spline: Calculate the Bayesian Information Criterion for a smooth.spline
  • acuteInflammation: Measurement of 22 inflammatory mediators across time
  • AICc_smooth_spline: Calculate the Akaike Information Criterion corrected for small observation numbers for a smooth.spline
  • get_eigen_spline_matrix: Generate an Ind x Time + Var data.frame concatenating all variables from input variable
  • get_ind_time_matrix: Generate an Ind x Time data.frame from input data
  • get_eigen_DFoverlay_list: Plot for each eigenSpline the automatically fitted spline, splines for all df and a spline at a chosen df
  • AIC_smooth_spline: Calculate the Akaike Information Criterion for a smooth.spline
  • get_grouping: Generate a matrix of group membership for all individuals
  • get_eigen_DF: Compute the optimal df and weighted-df using 5 spline fitting metrics
  • get_eigen_spline: Compute eigenSplines across a dataset
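santaR_pvalue_dist compares groups through the distance between their mean curves. A simplified, hypothetical analogue of that idea is sketched below in base R: the statistic is the approximate area between two group mean curves, and its null distribution is obtained by permuting group labels. The toy data, `df = 3`, the grid and the number of permutations are all assumptions, and santaR's actual statistic and resampling scheme may differ.

```r
set.seed(1)
tp   <- c(0, 2, 4, 6, 8)
grid <- seq(0, 8, length.out = 41)

# Hypothetical toy data: 8 individuals (4 per group); group B gets an extra bump
simulate_ind <- function(bump) {
  sin(tp / 3) + bump * exp(-(tp - 4)^2 / 4) + rnorm(length(tp), sd = 0.1)
}
values <- rbind(t(replicate(4, simulate_ind(0))),
                t(replicate(4, simulate_ind(1))))
groups <- rep(c("A", "B"), each = 4)

# Smooth each individual (df = 3, arbitrary) and evaluate on the grid: 8 x 41
ind_curves <- t(apply(values, 1, function(y) {
  predict(smooth.spline(tp, y, df = 3), grid)$y
}))

# Statistic: approximate area between the two group mean curves (rectangle rule)
area_between <- function(curves, g) {
  mean_a <- colMeans(curves[g == "A", , drop = FALSE])
  mean_b <- colMeans(curves[g == "B", , drop = FALSE])
  sum(abs(mean_a - mean_b)) * (grid[2] - grid[1])
}

observed <- area_between(ind_curves, groups)

# Null distribution: recompute the statistic with shuffled group labels
n_perm <- 500
perm   <- replicate(n_perm, area_between(ind_curves, sample(groups)))

# Permutation p-value with the +1 correction so it is never exactly 0
p_value <- (sum(perm >= observed) + 1) / (n_perm + 1)
```

With few individuals per group the number of distinct label assignments is small (70 for 4 versus 4), which bounds how small the p-value can get.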