Learn R Programming

⚠️There's a newer version (1.1.51) of this package.Take me there.

mvgam

The goal of mvgam is to fit Bayesian Dynamic Generalized Additive Models to time series data. The motivation for the package is described in Clark & Wells 2022 (published in Methods in Ecology and Evolution), with additional inspiration on the use of Bayesian probabilistic modelling coming from Michael Betancourt, Michael Dietze and Emily Fox, among many others.

Resources

A series of vignettes cover data formatting, forecasting and several extended case studies of DGAMs. A number of other examples have also been compiled:

  • Ecological Forecasting with Dynamic Generalized Additive Models
  • Distributed lags (and hierarchical distributed lags) using mgcv and mvgam
  • Ecological Forecasting with Dynamic GAMs; a tutorial and detailed case study
  • How to interpret and report nonlinear effects from Generalized Additive Models
  • Phylogenetic smoothing using mgcv
  • Incorporating time-varying seasonality in forecast models

Installation

Install the stable version from CRAN using: install.packages('mvgam'), or install the development version from GitHub using: devtools::install_github("nicholasjclark/mvgam"). Note that to condition models on observed data, either JAGS (along with packages rjags and runjags) or Stan must be installed (along with either rstan and/or cmdstanr). Please refer to installation links for JAGS here, for Stan with rstan here, or for Stan with cmdstandr here. You will need a fairly recent version of Stan to ensure all syntax is recognized. If you see warnings such as variable "array" does not exist, this is usually a sign that you need to update Stan.

We highly recommend you use Cmdstan through the cmdstanr interface. This is because Cmdstan is easier to install, is more up to date with new features, and uses less memory than rstan. See this documentation from the Cmdstan team for more information.

Citing mvgam and related software

When using any software please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work, so we highly encourage you to cite any packages that you rely on for your research.

When using mvgam, please cite the following:

  • Clark, N.J. and Wells, K. (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. DOI: https://doi.org/10.1111/2041-210X.13974

As mvgam acts as an interface to Stan and JAGS, please additionally cite whichever software you use for parameter estimation:

  • Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software. 76(1). 10.18637/jss.v076.i01
  • Plummer, M. (2013). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing. 124(125.10).

mvgam relies on several other R packages and, of course, on R itself. To find out how to cite R and its packages, use the citation function. There are some features of mvgam which specifically rely on certain packages. The most important of these is the generation of data necessary to estimate smoothing splines, which entirely rely on mgcv. The rstan and cmdstanr packages together with Rcpp makes Stan conveniently accessible in R, while the rjags and runjags packages together with the coda package make JAGS accessible in R. If you use some of these features, please also consider citing the related packages.

Cheatsheet

Introducing mvgam for fitting Generalized Additive Models to time series

We can explore the model’s primary functions using a dataset that is available with all R installations. Load the lynx data and plot the series as well as its autocorrelation function

data(lynx)
lynx_full <- data.frame(year = 1821:1934, 
                        population = as.numeric(lynx))
plot(lynx_full$population, type = 'l', ylab = 'Lynx trappings',
     xlab = 'Time', bty = 'l', lwd = 2)
box(bty = 'l', lwd  = 2)
acf(lynx_full$population, main = '', bty = 'l', lwd = 2,
    ci.col = 'darkred')
box(bty = 'l', lwd  = 2)

Along with serial autocorrelation, there is a clear ~19-year cyclic pattern. Create a season term that can be used to model this effect and give a better representation of the data generating process than we would likely get with a linear model

plot(stl(ts(lynx_full$population, frequency = 19), s.window = 'periodic'),
     lwd = 2, col.range = 'darkred')
lynx_full$season <- (lynx_full$year%%19) + 1

For mvgam models, we need an indicator of the series name as a factor (if the column series is missing, this will be added automatically by assuming that all observations are from a single time series). A time column is needed to index time

lynx_full$time <- 1:NROW(lynx_full)
lynx_full$series <- factor('series1')

Split the data into training (first 50 years) and testing (next 10 years of data) to evaluate forecasts

lynx_train = lynx_full[1:50, ]
lynx_test = lynx_full[51:60, ]

Inspect the series in a bit more detail using mvgam’s plotting utility

plot_mvgam_series(data = lynx_train, y = 'population')

Formulate an mvgam model; this model fits a GAM in which a cyclic smooth function for season is estimated jointly with a full time series model for the temporal process (in this case an AR1 process). We assume the outcome follows a Poisson distribution and will condition the model in Stan using MCMC sampling with the Cmdstan interface:

lynx_mvgam <- mvgam(population ~ s(season, bs = 'cc', k = 12),
                    knots = list(season = c(0.5, 19.5)),
                    data = lynx_train,
                    newdata = lynx_test,
                    family = poisson(),
                    trend_model = AR(p = 1),
                    backend = 'cmdstanr')

Have a look at this model’s summary to see what is being estimated. Note that no pathological behaviours have been detected and we achieve good effective sample sizes / mixing for all parameters

summary(lynx_mvgam)
#> GAM formula:
#> population ~ s(season, bs = "cc", k = 12)
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(p = 1)
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 60 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                2.5%   50%  97.5% Rhat n_eff
#> (Intercept)   6.400  6.60  6.900 1.00   961
#> s(season).1  -0.650 -0.14  0.380 1.00   984
#> s(season).2   0.730  1.30  1.900 1.00  1113
#> s(season).3   1.300  1.90  2.500 1.00   727
#> s(season).4  -0.054  0.53  1.100 1.00   727
#> s(season).5  -1.300 -0.70 -0.069 1.00   919
#> s(season).6  -1.200 -0.55  0.170 1.00  1012
#> s(season).7   0.012  0.73  1.400 1.00  1134
#> s(season).8   0.600  1.40  2.100 1.01   756
#> s(season).9  -0.410  0.22  0.810 1.00   710
#> s(season).10 -1.400 -0.87 -0.350 1.00  1164
#> 
#> Approximate significance of GAM smooths:
#>            edf Ref.df Chi.sq p-value    
#> s(season) 9.95     10   53.4  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Latent trend parameter AR estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> ar1[1]   0.59 0.83  0.99 1.00   519
#> sigma[1] 0.38 0.48  0.60 1.01   485
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Tue Sep 03 1:56:36 PM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)

As with any MCMC software, we can inspect traceplots. Here for the GAM smoothing parameters, using mvgam’s reliance on the excellent bayesplot library:

mcmc_plot(lynx_mvgam, variable = 'rho', regex = TRUE, type = 'trace')
#> No divergences to plot.

and for the latent trend parameters

mcmc_plot(lynx_mvgam, variable = 'trend_params', regex = TRUE, type = 'trace')
#> No divergences to plot.

Use posterior predictive checks, which capitalize on the extensive functionality of the bayesplot package, to see if the model can simulate data that looks realistic and unbiased. First, examine histograms for posterior retrodictions (yhat) and compare to the histogram of the observations (y)

pp_check(lynx_mvgam, type = "hist", ndraws = 5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions

pp_check(lynx_mvgam, type = "ecdf_overlay", ndraws = 25)

Rootograms are popular graphical tools for checking a discrete model’s ability to capture dispersion properties of the response variable. Posterior predictive hanging rootograms can be displayed using the ppc() function. In the plot below, we bin the unique observed values into 25 bins to prevent overplotting and help with interpretation. This plot compares the frequencies of observed vs predicted values for each bin. For example, if the gray bars (representing observed frequencies) tend to stretch below zero, this suggests the model’s simulations predict the values in that particular bin less frequently than they are observed in the data. A well-fitting model that can generate realistic simulated data will provide a rootogram in which the lower boundaries of the grey bars are generally near zero. For this plot we use the S3 function ppc.mvgam(), which is not as versatile as pp_check() but allows us to bin rootograms to avoid overplotting

ppc(lynx_mvgam, type = "rootogram", n_bins = 25)

All plots indicate the model is well calibrated against the training data. Inspect the estimated cyclic smooth, which is shown as a ribbon plot of posterior empirical quantiles. We can also overlay posterior quantiles of partial residuals (shown in red), which represent the leftover variation that the model expects would remain if this smooth term was dropped but all other parameters remained unchanged. A strong pattern in the partial residuals suggests there would be strong patterns left unexplained in the model if we were to drop this term, giving us further confidence that this function is important in the model

plot(lynx_mvgam, type = 'smooths', residuals = TRUE)

First derivatives of smooths can be plotted to inspect how the slope of the function changes. To plot these we use the more flexible plot_mvgam_smooth() function

plot_mvgam_smooth(lynx_mvgam, series = 1, 
                  smooth = 'season', 
                  derivatives = TRUE)

If you have the gratia package installed, it can also be used to plot partial effects of smooths on the link scale

require(gratia)
#> Loading required package: gratia
#> Warning: package 'gratia' was built under R version 4.2.3
#> 
#> Attaching package: 'gratia'
#> The following object is masked from 'package:mvgam':
#> 
#>     add_residuals
draw(lynx_mvgam)

As for many types of regression models, it is often more useful to plot model effects on the outcome scale. mvgam has support for the wonderful marginaleffects package, allowing a wide variety of posterior contrasts, averages, conditional and marginal predictions to be calculated and plotted. Below is the conditional effect of season plotted on the outcome scale, for example:

require(ggplot2); require(marginaleffects)
#> Loading required package: marginaleffects
plot_predictions(lynx_mvgam, condition = 'season', points = 0.5) +
  theme_classic()

We can also view the mvgam’s posterior predictions for the entire series (testing and training)

plot(lynx_mvgam, type = 'forecast', newdata = lynx_test)
#> Out of sample DRPS:
#> 2356.32008125

And the estimated latent trend component, again using the more flexible plot_mvgam_...() option to show first derivatives of the estimated trend

plot_mvgam_trend(lynx_mvgam, newdata = lynx_test, derivatives = TRUE)

A key aspect of ecological forecasting is to understand how different components of a model contribute to forecast uncertainty. We can estimate relative contributions to forecast uncertainty for the GAM component and the latent trend component using mvgam

plot_mvgam_uncertainty(lynx_mvgam, newdata = lynx_test, legend_position = 'none')
text(1, 0.2, cex = 1.5, label="GAM component", 
     pos = 4, col="white", family = 'serif')
text(1, 0.8, cex = 1.5, label="Trend component", 
     pos = 4, col="#7C0000", family = 'serif')

Both components contribute to forecast uncertainty. Diagnostics of the model can also be performed using mvgam. Have a look at the model’s residuals, which are posterior empirical quantiles of Dunn-Smyth randomised quantile residuals so should follow approximate normality. We are primarily looking for a lack of autocorrelation, which would suggest our AR1 model is appropriate for the latent trend

plot(lynx_mvgam, type = 'residuals')

Extended observation families

mvgam was originally designed to analyse and forecast non-negative integer-valued data. These data are traditionally challenging to analyse with existing time-series analysis packages. But further development of mvgam has resulted in support for a growing number of observation families. Currently, the package can handle data for the following:

  • gaussian() for real-valued data
  • student_t() for heavy-tailed real-valued data
  • lognormal() for non-negative real-valued data
  • Gamma() for non-negative real-valued data
  • betar() for proportional data on (0,1)
  • bernoulli() for binary data
  • poisson() for count data
  • nb() for overdispersed count data
  • binomial() for count data with known number of trials
  • beta_binomial() for overdispersed count data with known number of trials
  • nmix() for count data with imperfect detection (unknown number of trials)
  • tweedie() for overdispersed count data

Note that only poisson(), nb(), and tweedie() are available if using JAGS. All families, apart from tweedie(), are supported if using Stan. See ??mvgam_families for more information. Below is a simple example for simulating and modelling proportional data with Beta observations over a set of seasonal series with independent Gaussian Process dynamic trends:

set.seed(100)
data <- sim_mvgam(family = betar(),
                  T = 80,
                  trend_model = 'GP',
                  prop_trend = 0.5, 
                  seasonality = 'shared')
plot_mvgam_series(data = data$data_train, series = 'all')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 7) +
               s(season, by = series, m = 1, k = 5),
             trend_model = 'GP',
             data = data$data_train,
             newdata = data$data_test,
             family = betar())

Inspect the summary to see that the posterior now also contains estimates for the Beta precision parameters $\phi$. We can suppress a summary of the $\beta$ coefficients, which is useful when there are many spline coefficients to report:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 7) + s(season, by = series, m = 1, 
#>     k = 5)
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> GP
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]  5.4 8.3    12    1  2019
#> phi[2]  5.7 8.7    13    1  1795
#> phi[3]  5.5 8.4    12    1  1680
#> 
#> GAM coefficient (beta) estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -0.18 0.19  0.45 1.01   757
#> 
#> Approximate significance of GAM smooths:
#>                            edf Ref.df Chi.sq p-value    
#> s(season)                4.231      5  28.89 2.6e-06 ***
#> s(season):seriesseries_1 0.687      4   0.69    0.98    
#> s(season):seriesseries_2 0.664      4   0.60    0.99    
#> s(season):seriesseries_3 1.286      4   1.39    0.82    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Latent trend marginal deviation (alpha) and length scale (rho) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> alpha_gp[1] 0.06 0.41  0.96 1.00   823
#> alpha_gp[2] 0.38 0.73  1.30 1.00  1073
#> alpha_gp[3] 0.15 0.45  0.98 1.00   903
#> rho_gp[1]   1.10 3.90 14.00 1.01   302
#> rho_gp[2]   1.70 7.00 32.00 1.01   364
#> rho_gp[3]   1.30 4.80 22.00 1.01   373
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 18 of 2000 iterations ended with a divergence (0.9%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Tue Sep 03 1:57:57 PM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)

Plot the hindcast and forecast distributions for each series

layout(matrix(1:4, nrow = 2, byrow = TRUE))
for(i in 1:3){
  plot(mod, type = 'forecast', series = i)
}

There are many more extended uses of mvgam, including the ability to fit hierarchical GAMs that include dynamic coefficient models, dynamic factor and Vector Autoregressive processes. See the package documentation for more details. The package can also be used to generate all necessary data structures, initial value functions and modelling code necessary to fit DGAMs using Stan or JAGS. This can be helpful if users wish to make changes to the model to better suit their own bespoke research / analysis goals. The following resources can be helpful to troubleshoot:

  • Stan Discourse
  • JAGS Discourse

License

This project is licensed under an MIT open source license

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please reach out if you are interested (n.clark’at’uq.edu.au)

Copy Link

Version

Install

install.packages('mvgam')

Monthly Downloads

565

Version

1.1.3

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Nicholas J Clark

Last Published

September 4th, 2024

Functions in mvgam (1.1.3)

logLik.mvgam

Compute pointwise Log-Likelihoods from fitted mvgam objects
loo.mvgam

LOO information criteria for mvgam models
hindcast.mvgam

Extract hindcasts for a fitted mvgam object
index-mvgam

Index mvgam objects
formula.mvgam

Extract formulae from mvgam objects
forecast.mvgam

Extract or compute hindcasts and forecasts for a fitted mvgam object
irf.mvgam

Calculate latent VAR impulse response functions
lfo_cv.mvgam

Approximate leave-future-out cross-validation of fitted mvgam objects
gratia_mvgam_enhancements

Enhance mvgam post-processing using gratia functionality
get_mvgam_priors

Extract information on default prior distributions for an mvgam model
mvgam_draws

Extract posterior draws from fitted mvgam objects
monotonic

Monotonic splines in mvgam
mvgam_diagnostics

Extract diagnostic quantities of mvgam models
mvgam-class

Fitted mvgam object description
lv_correlations

Calculate trend correlations based on mvgam latent factor loadings
model.frame.mvgam

Extract model.frame from a fitted mvgam object
mcmc_plot.mvgam

MCMC plots as implemented in bayesplot
mvgam_families

Supported mvgam families
mvgam_forecast-class

mvgam_forecast object description
mvgam

Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series
plot.mvgam_lfo

Plot Pareto-k and ELPD values from a leave-future-out object
mvgam_trends

Supported mvgam trend models
mvgam_formulae

Details of formula specifications in mvgam
%>%

Pipe operator
plot.mvgam_irf

Plot impulse responses from an mvgam_irf object This function takes an mvgam_irf object and produces plots of Impulse Response Functions
pairs.mvgam

Create a matrix of output plots from a mvgam object
mvgam_irf-class

mvgam_irf object description
PW

Specify piecewise linear or logistic trends
plot.mvgam

Default mvgam plots
mvgam_marginaleffects

Helper functions for mvgam marginaleffects calculations
plot_mvgam_series

Plot observed time series used for mvgam modelling
plot_mvgam_randomeffects

Plot mvgam random effect terms
plot_mvgam_smooth

Plot mvgam smooth terms
plot_mvgam_pterms

Plot mvgam parametric term partial effects
plot_mvgam_uncertainty

Plot mvgam forecast uncertainty contributions for a specified series
plot_mvgam_resids

Residual diagnostics for a fitted mvgam object
plot_mvgam_factors

Latent factor summaries for a fitted mvgam object
portal_data

Portal Project rodent capture survey data
plot_mvgam_forecasts

Plot mvgam posterior predictions for a specified series
plot_mvgam_trend

Plot mvgam latent trend for a specified series
posterior_predict.mvgam

Draws from the Posterior Predictive Distribution
residuals.mvgam

Posterior draws of mvgam residuals
ppc.mvgam

Plot mvgam posterior predictive checks for a specified series
predict.mvgam

Predict from the GAM component of an mvgam model
print.mvgam

Summary for a fitted mvgam object
reexports

Objects exported from other packages
series_to_mvgam

This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam
score.mvgam_forecast

Compute probabilistic forecast scores for mvgam objects
posterior_linpred.mvgam

Posterior Draws of the Linear Predictor
sim_mvgam

Simulate a set of time series for mvgam modelling
posterior_epred.mvgam

Draws from the Expected Value of the Posterior Predictive Distribution
pp_check.mvgam

Posterior Predictive Checks for mvgam Objects
summary.mvgam

Summary for a fitted mvgam object
update.mvgam

Update an existing mvgam object
evaluate_mvgams

Evaluate forecasts from fitted mvgam objects
fitted.mvgam

Expected Values of the Posterior Predictive Distribution
RW

Specify autoregressive dynamic processes
dynamic

Defining dynamic coefficients in mvgam formulae
conditional_effects.mvgam

Display Conditional Effects of Predictors
code

Stan code and data objects for mvgam models
all_neon_tick_data

NEON Amblyomma and Ixodes tick abundance survey data
ensemble.mvgam_forecast

Combine mvgam forecasts into evenly weighted ensembles
GP

Specify dynamic Gaussian processes
add_residuals.mvgam

Calculate randomized quantile residuals for mvgam objects