A discrepancy statistic for the fitted model is obtained using a
posterior predictive checking procedure.
The following statistics are currently available:
- Freeman-Tukey statistics (default)
\(T_{\textrm{FT}} = \sum_{i,j,k}\left(\sqrt{y_{ijk}} - \sqrt{E(y_{ijk} \mid \pi_{ijk})}\right)^2\)
- Deviance statistics
\(T_{\textrm{deviance}} = -2 \sum_{j,k} \log \textrm{Multinomial}(\boldsymbol{y}_{jk} \mid \boldsymbol{\pi}_{jk})\)
- Chi-squared statistics
\(T_{\chi^2} = \sum_{i,j,k}\frac{\left(y_{ijk} - E(y_{ijk} \mid \pi_{ijk})\right)^2}{E(y_{ijk} \mid \pi_{ijk})}\)
where \(i\), \(j\), and \(k\) are the indices for species, site, and
replicate, respectively,
\(y_{ijk}\) is the sequence read count,
\(\pi_{ijk}\) is the multinomial cell probability of the sequence read
count,
\(E(y_{ijk} \mid \pi_{ijk})\) is the expected
value of the sequence read count conditional on its cell probability,
and \(\log \textrm{Multinomial}(\boldsymbol{y}_{jk} \mid \boldsymbol{\pi}_{jk})\)
is the multinomial log-likelihood of the sequence read counts in replicate
\(k\) of site \(j\) conditional on their cell probabilities.
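As a rough illustration of the three formulas above, the following Python sketch evaluates each statistic for one set of cell probabilities. It is not the package's implementation. It assumes that \(E(y_{ijk} \mid \pi_{ijk}) = N_{jk}\pi_{ijk}\), where \(N_{jk} = \sum_i y_{ijk}\) is the total read count in replicate \(k\) of site \(j\), and that the data are held as a list of (counts, probabilities) pairs, one pair per site-replicate combination. All function and variable names are hypothetical.

```python
import math


def freeman_tukey(samples):
    """Sum of squared differences between square roots of observed and expected counts."""
    total = 0.0
    for y, pi in samples:
        n = sum(y)  # assumed total read count N_jk of this site-replicate
        total += sum((math.sqrt(c) - math.sqrt(n * p)) ** 2 for c, p in zip(y, pi))
    return total


def chi_squared(samples):
    """Pearson chi-squared discrepancy; requires every expected count to be positive."""
    total = 0.0
    for y, pi in samples:
        n = sum(y)
        total += sum((c - n * p) ** 2 / (n * p) for c, p in zip(y, pi))
    return total


def multinomial_log_pmf(y, pi):
    """Multinomial log-likelihood of one count vector given its cell probabilities."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in y)
    # Cells with zero counts contribute nothing, which also avoids log(0).
    return log_coef + sum(c * math.log(p) for c, p in zip(y, pi) if c > 0)


def deviance(samples):
    """Minus twice the multinomial log-likelihood summed over site-replicates."""
    return -2.0 * sum(multinomial_log_pmf(y, pi) for y, pi in samples)


# Toy input: one site-replicate with three species (illustrative values only).
samples = [([4, 0, 6], [0.4, 0.1, 0.5])]
print(freeman_tukey(samples), chi_squared(samples), deviance(samples))
```

In this toy input the expected counts are (4, 1, 5), so the chi-squared statistic is 0 + 1 + 0.2 = 1.2.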
The Bayesian \(p\)-value is estimated as the probability that the value of
the discrepancy statistic for the replicated dataset is more extreme than that
for the observed dataset.
An extreme Bayesian \(p\)-value (close to 0 or 1) may indicate inadequate model fit.
See Gelman et al. (2014), Kéry and Royle (2016), and
Conn et al. (2018) for further details on the procedures used for posterior
predictive checking.
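The estimation of the Bayesian \(p\)-value can be sketched as follows for a single site-replicate combination and the Freeman-Tukey statistic. This is a minimal illustration under stated assumptions, not the package's code: "more extreme" is taken to mean that the replicated statistic exceeds the observed one, each replicated dataset is drawn from a multinomial distribution with the observed total read count, and pi_draws stands in for posterior draws of the cell probabilities. All function and variable names are hypothetical.

```python
import math
import random


def freeman_tukey(y, pi):
    """Freeman-Tukey discrepancy for one count vector, assuming E(y_i) = N * pi_i."""
    n = sum(y)
    return sum((math.sqrt(c) - math.sqrt(n * p)) ** 2 for c, p in zip(y, pi))


def draw_multinomial(n, pi, rng):
    """Draw one multinomial count vector with n trials and cell probabilities pi."""
    counts = [0] * len(pi)
    for idx in rng.choices(range(len(pi)), weights=pi, k=n):
        counts[idx] += 1
    return counts


def bayesian_p_value(y_obs, pi_draws, rng):
    """Share of draws in which the replicated statistic exceeds the observed one."""
    if not pi_draws:
        raise ValueError("pi_draws must contain at least one draw")
    n = sum(y_obs)
    exceed = 0
    for pi in pi_draws:
        y_rep = draw_multinomial(n, pi, rng)  # replicated dataset for this draw
        if freeman_tukey(y_rep, pi) > freeman_tukey(y_obs, pi):
            exceed += 1
    return exceed / len(pi_draws)


# Toy input: the same probabilities repeated in place of real posterior draws.
rng = random.Random(1)
pi_draws = [[0.2, 0.3, 0.5]] * 200
print(bayesian_p_value([12, 14, 24], pi_draws, rng))
```

Both statistics are evaluated under the same draw of the cell probabilities, so the comparison reflects sampling variation in the data rather than differences between draws.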
Computations can be run in parallel on multiple CPU cores; the cores
argument controls the degree of parallelization.