This document explains the models used in the PandemicLP package in some detail.
The count data for the number of cases or deaths are modeled according to an epidemiological growth model. In particular, the mean count \(\mu(t)\) is modeled with a generalized logistic curve: $$\mu(t) = a c f \frac{e^{-c t}}{(b+e^{-c t})^{f+1}}.$$ All parameters, \(a\), \(b\), \(c\) and \(f\), are positive.
Parameter \(c\) is interpreted as the infection rate. Parameter \(f\) controls the asymmetry: if it is equal to 1, the curve is symmetric around its peak. If it is less than 1, the cases grow more slowly before the peak than they decrease after it. The behavior is inverted when \(f\) is greater than 1.
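The role of \(f\) can be checked numerically. The following is a minimal base R sketch of the curve, not the package's internal code; the function names `mu_curve` and `peak_time` and all parameter values are illustrative assumptions. Setting the derivative of \(\log\mu(t)\) to zero gives \(f e^{-ct} = b\), so the peak is at \(t^* = \log(f/b)/c\).

```r
# Generalized logistic mean curve mu(t), transcribed from the formula above.
# Function names and parameter values are illustrative, not part of PandemicLP.
mu_curve <- function(t, a, b, c, f) {
  a * c * f * exp(-c * t) / (b + exp(-c * t))^(f + 1)
}

# d/dt log(mu) = 0 gives f * exp(-c * t) = b, so the peak is at log(f / b) / c.
peak_time <- function(b, c, f) log(f / b) / c

t_grid <- seq(-100, 400, by = 0.5)
pars <- list(a = 10, b = 0.01, c = 0.1)

for (f in c(0.5, 1, 1.5)) {
  mu <- mu_curve(t_grid, pars$a, pars$b, pars$c, f)
  t_star <- peak_time(pars$b, pars$c, f)
  # Compare the curve 20 time units before and after the peak.
  before <- mu_curve(t_star - 20, pars$a, pars$b, pars$c, f)
  after <- mu_curve(t_star + 20, pars$a, pars$b, pars$c, f)
  cat(sprintf("f = %.1f: grid peak at t = %.1f, analytical peak at t = %.2f, before/after ratio = %.3f\n",
              f, t_grid[which.max(mu)], t_star, before / after))
}
```

A before/after ratio of 1 indicates symmetry. A ratio below 1 means the curve rises more steeply than it falls, which is the case for \(f > 1\).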
The counts for the Covid-19 pandemic typically displayed positive asymmetry, so by default the package functions truncate \(f\) to values greater than 1.
In the early stages of the Covid-19 pandemic, predictions commonly resulted in absurdly high values for the total number of cases (TNC). It is straightforward to show that $$TNC = \frac{a}{b^f}.$$ Since the total number of cases never exceeded 5% of the population in any location, another truncation is applied, \(a\le 0.08\, b^f Pop\), where \(Pop\) is the location's population, which caps the TNC at 8% of the population. This is why the model requires the region's population for estimation.
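The TNC expression follows because \(\mu(t)\) is the derivative of the cumulative curve \(a/(b+e^{-ct})^f\), which tends to 0 as \(t\to-\infty\) and to \(a/b^f\) as \(t\to\infty\). The sketch below checks this numerically in base R and computes the largest \(a\) allowed by the 0.08 bound. The parameter values and the population of one million are hypothetical, and the integration range is a finite stand-in for the whole real line.

```r
# Mean curve as defined earlier in this document (names are illustrative).
mu_curve <- function(t, a, b, c, f) {
  a * c * f * exp(-c * t) / (b + exp(-c * t))^(f + 1)
}

a <- 10; b <- 0.01; c_rate <- 0.1; f <- 1.5

# Closed form for the total number of cases.
tnc_closed <- a / b^f

# Numerical integral of mu(t) over a range wide enough to cover both tails.
tnc_numeric <- integrate(function(t) mu_curve(t, a, b, c_rate, f),
                         lower = -100, upper = 400)$value

# Truncation: a <= 0.08 * b^f * Pop, i.e. TNC <= 0.08 * Pop.
pop <- 1e6                       # hypothetical population
a_max <- 0.08 * b^f * pop        # largest admissible value of a
within_bound <- a <= a_max

print(c(closed = tnc_closed, numeric = tnc_numeric, a_max = a_max))
print(within_bound)
```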
The simplest probabilistic model for the counts is the Poisson model. If \(y_t\) is the count at time \(t\), then $$y_t | \theta \sim Poisson(\mu(t)),$$ where \(\theta\) represents the model parameters.
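As a rough illustration of the Poisson model, counts can be simulated in base R by drawing each \(y_t\) independently with mean \(\mu(t)\). This is only a data-generating sketch with made-up parameter values; it does not perform the Bayesian estimation done by the package.

```r
# Mean curve as defined earlier in this document (names are illustrative).
mu_curve <- function(t, a, b, c, f) {
  a * c * f * exp(-c * t) / (b + exp(-c * t))^(f + 1)
}

set.seed(123)                     # for reproducibility of the simulated counts
t_obs <- 1:150                    # observation times (days)
mu_t <- mu_curve(t_obs, a = 10, b = 0.01, c = 0.1, f = 1.5)

# y_t | theta ~ Poisson(mu(t)), independently across t.
y <- rpois(length(t_obs), lambda = mu_t)

# Simulated total against the sum of the means over the observed window.
print(c(simulated_total = sum(y), expected_total = sum(mu_t)))
```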
Here we present some other forms for the growth curve in the mean. The
respective parameters can be adjusted in the pandemic_model
function.
A weekly seasonal effect can be added. This is done by multiplying \(\mu(t)\) by a positive effect \(d\) whenever \(t\) falls on the chosen day of the week. If \(d < 1\), that day is subject to under-reporting; if \(d > 1\), to over-reporting. Currently, only days of the week are accepted as seasonal effects.
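The multiplication described above can be sketched in base R as follows. The dates, the effect \(d = 0.6\) and the choice of Sunday are assumptions made for illustration; the day of the week is taken from `as.POSIXlt()`, whose `wday` field is 0 for Sunday and does not depend on the locale.

```r
# Mean curve as defined earlier in this document (names are illustrative).
mu_curve <- function(t, a, b, c, f) {
  a * c * f * exp(-c * t) / (b + exp(-c * t))^(f + 1)
}

# Four full weeks of hypothetical daily dates.
dates <- seq(as.Date("2020-03-01"), by = "day", length.out = 28)
t_obs <- seq_along(dates)
mu_t <- mu_curve(t_obs, a = 10, b = 0.01, c = 0.1, f = 1.5)

# wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
is_sunday <- as.POSIXlt(dates)$wday == 0

d <- 0.6                                      # d < 1: under-reporting on Sundays
mu_seasonal <- mu_t * ifelse(is_sunday, d, 1) # other days are left unchanged

print(head(data.frame(date = dates, mu = mu_t, mu_seasonal = mu_seasonal), 8))
```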
Additionally, two or more curves can be fitted, to accommodate the multiple waves observed in many locations during the Covid-19 pandemic. In this case the model is slightly different: $$\mu(t) = \mu_1(t)+\dots+\mu_K(t)$$ $$\mu_j(t) = a_j c_j \frac{e^{-c_j t}}{(b_j+e^{-c_j t})^2}\Phi(\alpha_j (t-\delta_j)), \quad j = 1, \dots, K,$$ where \(\Phi(.)\) is the standard normal cumulative distribution function (the inverse of the probit function). The factor \(\Phi\) induces asymmetry in the curve, similarly to parameter \(f\), which is thus excluded in this case.
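A minimal base R sketch of the multiple-waves mean is given below, taking \(\Phi\) to be the standard normal CDF (`pnorm`). The helper names `wave_mean` and `multiwave_mean` and the two sets of wave parameters are hypothetical and chosen only so that the waves peak at clearly different times.

```r
# Mean of a single wave: symmetric logistic term times a normal-CDF skewing factor.
wave_mean <- function(t, a, b, c, alpha, delta) {
  a * c * exp(-c * t) / (b + exp(-c * t))^2 * pnorm(alpha * (t - delta))
}

# Total mean: sum of the K wave means, mu(t) = mu_1(t) + ... + mu_K(t).
multiwave_mean <- function(t, waves) {
  Reduce(`+`, lapply(waves, function(w) {
    wave_mean(t, w$a, w$b, w$c, w$alpha, w$delta)
  }))
}

# Two illustrative waves (K = 2); values are made up.
waves <- list(
  list(a = 100,   b = 0.01, c = 0.10, alpha = 0.05, delta = 40),
  list(a = 0.002, b = 1e-7, c = 0.08, alpha = 0.05, delta = 190)
)

t_obs <- 0:300
mu_t <- multiwave_mean(t_obs, waves)

# Location of the overall maximum of the combined curve.
print(t_obs[which.max(mu_t)])
```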
In addition to the Poisson family, it is possible to fit a Negative Binomial model. The model is parameterized so that the overdispersion does not depend on the mean. This particular parameterization has shown the best results when combined with the multiple waves and seasonal effects described above. The model is $$y_t | \lambda_t \sim Poisson(\lambda_t)$$ $$\lambda_t | \theta \sim Gamma(\phi \mu(t), \phi).$$
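Under this hierarchy, reading the Gamma distribution as shape and rate, \(\lambda_t\) has mean \(\mu(t)\) and variance \(\mu(t)/\phi\), so marginally \(E(y_t) = \mu(t)\) and \(Var(y_t) = \mu(t)(1 + 1/\phi)\). The variance-to-mean ratio \(1 + 1/\phi\) is therefore free of the mean. The simulation below is a sketch that checks this for one arbitrary choice of \(\mu\) and \(\phi\); the values are assumptions for illustration.

```r
set.seed(42)        # for reproducibility
n <- 1e5            # number of simulated counts
mu <- 50            # illustrative mean at a fixed time t
phi <- 2            # illustrative overdispersion parameter

# lambda | theta ~ Gamma(shape = phi * mu, rate = phi)
lambda <- rgamma(n, shape = phi * mu, rate = phi)

# y | lambda ~ Poisson(lambda)
y <- rpois(n, lambda)

theoretical_var <- mu * (1 + 1 / phi)
print(c(sample_mean = mean(y), sample_var = var(y),
        theoretical_var = theoretical_var))
```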
Apart from the truncations mentioned above, the parameters are assigned
independent priors, detailed below. The format is as follows:
\(p\sim D(h1, h2): def1, def2\), where \(p\) is the parameter, \(D\)
is the distribution family, and \(h1\) and \(h2\) are the names of the
hyperparameters, which can be changed through the prior_parameters
argument of the pandemic_model function. Finally, \(def1\)
and \(def2\) are the default values used if the user does not change them.
Note that each model available in the pandemic_model
function uses only a subset of these parameters. The distributions are
parameterized so that the values are passed directly to the Stan
code.
$$a_j\sim Gamma(a\_alpha, a\_beta), j = 1, ..., K: 0.1, 0.1$$
$$b_j\sim LogNormal(mu\_b\_1, sigma2\_b\_1), j = 1, ..., K: 0, 20$$
$$c_j\sim Gamma(c\_alpha, c\_beta), j = 1, ..., K: 2, 9$$
$$f\sim Gamma(f\_alpha, f\_beta): 0.01, 0.01$$
$$d_j\sim Gamma(d\_j\_alpha, d\_j\_beta), j = 1, 2, 3: 2, 1$$
$$\delta_j\sim Normal(mu\_delta, sigma2\_delta), j = 1, ..., K: 0, 100$$
$$\alpha_j\sim Gamma(alpha\_alpha, alpha\_beta), j = 1, ..., K: 0.01, 0.01$$
$$\phi\sim Gamma(phi\_alpha, phi\_beta): 0.1, 0.1$$
Note that the priors for the wave parameters are the same for all waves. However,
it is possible to use a specific prior for each seasonal effect. For example,
to change mu_b_1 and d_2_beta in a model
with at least two seasonal effects, the user would include the argument
prior_parameters = list(mu_b_1 = 1, d_2_beta = 0.001) in the call to the
pandemic_model function.
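The effect of such a list can be pictured as overriding a few entries of a table of defaults. The sketch below illustrates this with `utils::modifyList`; it is not the package's internal code, and `default_priors` is a hand-copied subset of the defaults listed above rather than an object exported by PandemicLP.

```r
# Hand-copied subset of the default hyperparameters listed above (illustrative).
default_priors <- list(
  a_alpha = 0.1, a_beta = 0.1,
  mu_b_1 = 0, sigma2_b_1 = 20,
  c_alpha = 2, c_beta = 9,
  d_1_alpha = 2, d_1_beta = 1,
  d_2_alpha = 2, d_2_beta = 1
)

# Values the user wants to change, as in the example in the text.
prior_parameters <- list(mu_b_1 = 1, d_2_beta = 0.001)

# Entries named in prior_parameters replace the defaults; the rest are kept.
effective_priors <- utils::modifyList(default_priors, prior_parameters)

str(effective_priors)
```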
Four arguments in the function change the fitted model, as described below:
'seasonal_effect': By leaving this argument NULL, the standard
model is fitted. By supplying it with a vector of up to three weekdays, the
desired seasonal effects are added to the model.
'n_waves': By leaving this argument equal to 1, the standard model is fitted. Setting it to 2 or more fits a multiple waves model.
'family': The standard model is fitted with the default value of "poisson". When changed to "negbin", the negative binomial model is used.
'prior_parameters': If left as NULL, the default prior values
are used. By supplying a list with any of the hyperparameters described above,
the provided values are used instead.
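The four arguments can be combined in a single call. The sketch below only assembles them in a list so that it runs without the package or data; the lowercase weekday spelling and the commented call, including the hypothetical data object `covid_data`, are assumptions to be checked against the package documentation.

```r
# Arguments that change the fitted model, gathered in one list.
model_args <- list(
  seasonal_effect = c("sunday", "monday"),  # up to three days; spelling assumed
  n_waves = 2,                              # 2 or more: multiple waves model
  family = "negbin",                        # negative binomial instead of Poisson
  prior_parameters = list(mu_b_1 = 1, d_2_beta = 0.001)
)

str(model_args)

# Hypothetical call, left commented because it needs PandemicLP, Stan and data:
# fit <- do.call(PandemicLP::pandemic_model, c(list(covid_data), model_args))
```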
Dani Gamerman, Marcos O. Prates, Thais Paiva and Vinicius D. Mayrink (2021). Building a Platform for Data-Driven Pandemic Prediction: From Data Modelling to Visualisation - The CovidLP Project. CRC Press