estimate_virus_parameters_SPT: Estimate SPT virus parameters using Bayesian inference with rstan

Description

Runs MCMC sampling using a precompiled Stan model for SPT plant virus. Analyses SPT access period data directly estimating 4 parameters: (c1[1], c3[1], c2[1],bD[1], the first 3 are composite and the 4th corresponds to insect lab mortality rate). The composite parameters can be unpacked into al[1], be[1], mu[1] - acquisition, inoculation, and insect recovery rates.

Usage

estimate_virus_parameters_SPT(
  data,
  D_LSin = 50,
  D_numPtsPdin = 1,
  mcmcOptions = c(500, 1000),
  numChainsIn = 4,
  mc.parallel = 0
)

Value

A list with seven elements:

array0: First set of MCMC chains for estimated parameters (c1[1], c3[1], c2[1],bD[1] - and additionally lp__ for reference only).
array1: Second set of MCMC chains for virus parameters (al[1], be[1], mu[1]).
array2: summary_stats (rhat, ess_bulk, ess_tail). [1]: R-hat statistic for convergence (should be close to 1); [2-3]: statistics for n-eff.
array3: Validation list, AAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
array4: Validation list, IAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
array5: Mean and sd Bayesian R-squared values for model fit assessment, for AAP and IAP assays.
converge_results: A list containing key sampler diagnostics: divergent transitions, maximum tree depth exceedances, See also screen print for acceptability of energy Bayesian fraction of missing information (E-BFMI).).

Arguments

data: A list containing the AP experiment data for Stan (required).
D_LSin: Upper bound on insect vector lifespan in days, sets the vector survival discretisation (optional, default = 50).
D_numPtsPdin: Number of points per day of insect vector lifespan, sets the vector survival discretisation (optional, default = 1).
mcmcOptions: A numeric vector of length 2: The first element specifies the number of warm-up iterations (optional, default = 500), and the second element specifies the number of post-warm-up iterations (optional, default = 1000).
numChainsIn: Numeric: number of Markov chains (optional, default = 4).
mc.parallel: Binary: whether to use parallelisation for sampling (optional, default = 0, i.e. 1 core only).

Details

Interpreting model output - check stability first

Warnings will be printed to the screen if there are issues with model fitting. Do not suppress warnings (e.g., via suppressWarnings()), as they contain essential information about potential convergence problems.

Before interpreting any model results, always check the following core diagnostics:

R-hat - Should be close to 1 for all parameters; larger values indicate non-convergence (Stan R-hat documentation).
Effective Sample Size (n_eff) - Very low values suggest high autocorrelation or insufficient sampling (Stan ESS documentation).
Divergent transitions - Count should be 0; any non-zero count requires investigation (Stan divergences documentation).
Posterior predictive fit (simulated_data) - Compare model-simulated data with the observed data:
- array3: AAP forward simulation 95% credible intervals and original data
- array4: IAP forward simulation 95% credible intervals and original data (Stan posterior predictive checks).

Additional diagnostics for troubleshooting

These are useful when core checks indicate problems, or for deeper exploration of model behaviour:

Bayesian R^2 (r2_bayesA, r2_bayesI) - Measures explanatory power; low values suggest poor fit to the data (i.e., 2 values for AAP assay, IAP assay).
Max treedepth exceeded - Number of times the sampler hit the maximum tree depth; should be 0 (Stan max treedepth documentation).
EBFMI (Energy Bayesian Fraction of Missing Information) - Low values indicate poor exploration of the posterior.

See the package vignette for worked examples of checking and interpreting these diagnostics.

Preprint reference

The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.

References

Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, tools:::Rd_expr_doi("10.32942/X29K9P").

When available, please cite the published version.

Examples

Run this code

data("ap_data_sim_SPT", package = "EpiPvr")
# run low warm-up and iterations (mcmcOptions) for quick example only
EVSPT_pub=estimate_virus_parameters_SPT(
  data=ap_data_sim_SPT,D_LSin=5,D_numPtsPdin=1,
  mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0
)

Run the code above in your browser using DataLab