Learn R Programming

EpiPvr (version 0.0.1)

estimate_virus_parameters_SPT: Estimate SPT virus parameters using Bayesian inference with rstan

Description

Runs MCMC sampling using a precompiled Stan model for SPT plant virus. Analyses SPT access period data directly estimating 4 parameters: (c1[1], c3[1], c2[1],bD[1], the first 3 are composite and the 4th corresponds to insect lab mortality rate). The composite parameters can be unpacked into al[1], be[1], mu[1] - acquisition, inoculation, and insect recovery rates.

Usage

estimate_virus_parameters_SPT(
  data,
  D_LSin = 50,
  D_numPtsPdin = 1,
  mcmcOptions = c(500, 1000),
  numChainsIn = 4,
  mc.parallel = 0
)

Value

A list with seven elements:

array0

First set of MCMC chains for estimated parameters (c1[1], c3[1], c2[1],bD[1] - and additionally lp__ for reference only).

array1

Second set of MCMC chains for virus parameters (al[1], be[1], mu[1]).

array2

summary_stats (rhat, ess_bulk, ess_tail). [1]: R-hat statistic for convergence (should be close to 1); [2-3]: statistics for n-eff.

array3

Validation list, AAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.

array4

Validation list, IAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.

array5

Mean and sd Bayesian R-squared values for model fit assessment, for AAP and IAP assays.

converge_results

A list containing key sampler diagnostics: divergent transitions, maximum tree depth exceedances, See also screen print for acceptability of energy Bayesian fraction of missing information (E-BFMI).).

Arguments

data

A list containing the AP experiment data for Stan (required).

D_LSin

Upper bound on insect vector lifespan in days, sets the vector survival discretisation (optional, default = 50).

D_numPtsPdin

Number of points per day of insect vector lifespan, sets the vector survival discretisation (optional, default = 1).

mcmcOptions

A numeric vector of length 2: The first element specifies the number of warm-up iterations (optional, default = 500), and the second element specifies the number of post-warm-up iterations (optional, default = 1000).

numChainsIn

Numeric: number of Markov chains (optional, default = 4).

mc.parallel

Binary: whether to use parallelisation for sampling (optional, default = 0, i.e. 1 core only).

Details

Interpreting model output - check stability first

Warnings will be printed to the screen if there are issues with model fitting. Do not suppress warnings (e.g., via suppressWarnings()), as they contain essential information about potential convergence problems.

Before interpreting any model results, always check the following core diagnostics:

  1. R-hat - Should be close to 1 for all parameters; larger values indicate non-convergence (Stan R-hat documentation).

  2. Effective Sample Size (n_eff) - Very low values suggest high autocorrelation or insufficient sampling (Stan ESS documentation).

  3. Divergent transitions - Count should be 0; any non-zero count requires investigation (Stan divergences documentation).

  4. Posterior predictive fit (simulated_data) - Compare model-simulated data with the observed data:

    • array3: AAP forward simulation 95% credible intervals and original data

    • array4: IAP forward simulation 95% credible intervals and original data (Stan posterior predictive checks).

Additional diagnostics for troubleshooting

These are useful when core checks indicate problems, or for deeper exploration of model behaviour:

  • Bayesian R^2 (r2_bayesA, r2_bayesI) - Measures explanatory power; low values suggest poor fit to the data (i.e., 2 values for AAP assay, IAP assay).

  • Max treedepth exceeded - Number of times the sampler hit the maximum tree depth; should be 0 (Stan max treedepth documentation).

  • EBFMI (Energy Bayesian Fraction of Missing Information) - Low values indicate poor exploration of the posterior.

See the package vignette for worked examples of checking and interpreting these diagnostics.

Preprint reference

The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.

References

Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, tools:::Rd_expr_doi("10.32942/X29K9P").

When available, please cite the published version.

Examples

Run this code
data("ap_data_sim_SPT", package = "EpiPvr")
# run low warm-up and iterations (mcmcOptions) for quick example only
EVSPT_pub=estimate_virus_parameters_SPT(
  data=ap_data_sim_SPT,D_LSin=5,D_numPtsPdin=1,
  mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0
)

Run the code above in your browser using DataLab