estimate_virus_parameters_PT: Estimate PT virus parameters using Bayesian inference with rstan

Description

Runs MCMC sampling using a precompiled Stan model for PT plant virus. Analyses PT access period data estimating 5 parameters: (alpha[1], beta[1], gamma[1],mu[1],bd[1] - acquisition, inoculation, latent progression, insect recovery and insect lab mortality rates, respectively).

Usage

estimate_virus_parameters_PT(
  data,
  D_LSin = 50,
  D_numPtsPdin = 1,
  mcmcOptions = c(500, 1000),
  numChainsIn = 4,
  mc.parallel = 0
)

Value

A list with seven elements:

array0: MCMC chains for estimated parameters (alpha[1], beta[1], gamma[1]/mu[1],bd[1] - and additionally lp__ for reference only).
array1: MCMC chains for estimated parameters (alpha[1], beta[1], gamma[1],mu[1]).
array2: summary_stats (rhat, ess_bulk, ess_tail). [1]: R-hat statistic for convergence (should be close to 1); [2-3]: statistics for n-eff.
array3: Validation dataset: AAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
array4: Validation dataset: LAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
array5: Validation dataset: IAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
array6: Mean and sd Bayesian R-squared values for model fit assessment, for AAP, LAP and IAP assays.
converge_results: A list containing 2 sampler diagnostics, the number of overall divergent transitions and if maximum tree depth has been exceeded. See also screen print for acceptability of energy Bayesian fraction of missing information (E-BFMI).

Arguments

data: A list containing the AP experiment data for Stan (required).
D_LSin: Upper bound on insect vector lifespan in days sets the vector survival discretization (optional, default = 50).
D_numPtsPdin: Number of points per day of insect vector sets the vector survival discretization (optional, default = 1).
mcmcOptions: A numeric vector of length 2: The first element specifies the number of warm-up iterations (optional, default = 500), and the second element specifies the total number of iterations (optional, default = 1000).
numChainsIn: Numeric: number of Markov chains (optional, default = 4).
mc.parallel: Binary: whether to use parallelisation for sampling (optional, default = 0, i.e. 1 core only).

Details

Interpreting model output - check stability first

Warnings will be printed to the screen if there are issues with model fitting. Do not suppress warnings (e.g., via suppressWarnings()), as they contain essential information about potential convergence problems.

Before interpreting any model results, always check the following core diagnostics:

R-hat - Should be close to 1 for all parameters; larger values indicate non-convergence (Stan R-hat documentation).
Effective Sample Size (n_eff) - Very low values suggest high autocorrelation or insufficient sampling (Stan ESS documentation).
Divergent transitions - Count should be 0; any non-zero count requires investigation (Stan divergences documentation).
Forward Simulation from Fitted Model (simulated_data) - Presented conveniently for plotting to assess posterior predictive fit.
- array3: AAP forward simulation 95% credible intervals and original data
- array4: LAP forward simulation 95% credible intervals and original data
- array5: IAP forward simulation 95% credible intervals and original data (Stan posterior predictive checks).

If any of these core diagnostics fail, the model fit may not be trustworthy.

Additional diagnostics for troubleshooting

These are useful when core checks indicate problems, or for deeper exploration of model behaviour:

Bayesian R^2 (r2_bayesA, r2_bayesL, r2_bayesI) - Measures explanatory power; low values suggest poor fit to the data (i.e., 3 values for AAP assay, LAP assay,IAP assay).
Max treedepth exceeded - Number of times the sampler hit the maximum tree depth; should be 0 (Stan max treedepth documentation).
EBFMI (Energy Bayesian Fraction of Missing Information) - Low values indicate poor exploration of the posterior.

See the package vignette for worked examples of checking and interpreting these diagnostics.

Preprint reference

The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.

References

Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, tools:::Rd_expr_doi("10.32942/X29K9P").

When available, please cite the published version.

Examples

Run this code

data("ap_data_sim_PT", package = "EpiPvr")
# run low warm-up and iterations (mcmcOptions) for quick example only
EVPT_pub=estimate_virus_parameters_PT(
  data=ap_data_sim_PT,D_LSin=5,D_numPtsPdin=1,
  mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0
)

Run the code above in your browser using DataLab