autorun.jags: Run a User Specified Bayesian MCMC Model in JAGS with Automatically Calculated Run Length and Convergence Diagnostics

Description

Runs a user specified JAGS (similar to WinBUGS) model from within R, returning a list of the MCMC chain(s) along with convergence diagnostics, autocorrelation diagnostics and monitored variable summaries. Chain convergence over the first run of the simulation is assessed using the Gelman and Rubin's convergence diagnostic. If necessary, the simulation is extended to improve chain convergence (up to a user-specified maximum time limit), before the required sample size of the Markov chain is calculated using the Raftery and Lewis's diagnostic. The simulation is extended to the required sample size dependant on autocorrelation and the number of chains.

This function is provided as an educational tool only, and is not a replacement for manually assessing convergence and Monte Carlo error for real-world applications. For more complex models, the use of run.jags directly with manual assessment of necessary run length is recommended. JAGS is called using the lower level function run.jags.

Usage

autorun.jags(model=stop("No model supplied"), 
   monitor = stop("No monitored variables supplied"), 
   data=NA, n.chains=2, inits = replicate(n.chains, NA), 
   startburnin = 5000, startsample = 10000,
   psrf.target = 1.05, normalise.mcmc = TRUE,
   check.stochastic = TRUE, raftery.options = list(),
   crash.retry = 1, plots = TRUE, thin.sample = TRUE,
   jags = findjags(), silent.jags = FALSE, 
   interactive=TRUE, max.time=Inf, 
   adaptive=list(type="burnin", length=200), modules=c(""), 
   factories=c(""), thin = 1, monitor.deviance = FALSE, 
   monitor.pd = FALSE, monitor.pd.i = FALSE,
   monitor.popt = FALSE, keep.jags.files = FALSE, tempdir=TRUE, 
   method=if(.Platform$OS.type=='unix' & .Platform$GUI!="AQUA" &
   Sys.info()['user']!='nobody') 'interruptible' else 'simple', 
   batch.jags=silent.jags)

Arguments

model

a character string of the model in the JAGS language. No default.

monitor

a character vector of the names of variables to monitor. For all models, specifying 'deviance' as a monitored variable will calculate the model deviance, and 'dic' will calculate the Deviance Information Criterion (implies monitor.deviance etc, and requi

data

either a named list or a character string in the R dump format containing the data. If left as NA, the model will be run without external data.

n.chains

the number of chains to use with the simulation. More chains will improve the sensitivity of the convergence diagnostic, but will cause the simulation to run more slowly. The minimum (and default) number of chains is 2.

inits

either a character vector with length equal to the number of chains the model will be run using, or a list of named lists representing names and corresponding values of inits for each chain. If a vector, each element of the vector must be a character str

startburnin

the number of initial updates to discard before sampling. Only used on the initial run before checking convergence. Default 5000 iterations.

startsample

the number of samples on which to assess convergence. More samples will give a better chance of allowing the chain to converge, but will take longer to achieve. Also controls the length of the pilot chain used to assess the required sampling length. Th

psrf.target

the value of the point estimate for the potential scale reduction factor of the Gelman Rubin statistic below which the chains are deemed to have converged (must be greater than 1). Default 1.05.

normalise.mcmc

the Gelman Rubin statistic is based on the assumption that the posterior distribution of monitored variables is roughly normal. For very skewed posterior distributions, it may help to log/logit transform the posterior before calculating the Gelman Rubin

check.stochastic

non-stochastic monitored variables will cause errors when calculating the Gelman-Rubin statistic, if check.stochastic==TRUE then all monitored variables will be checked to ensure they are stochastic beforehand. This has a computational cost, and can be b

raftery.options

a named list which is passed as additional arguments to raftery.diag. Default none (default arguments to raftery.diag are used).

crash.retry

the number of times to re-attempt a simulation if the model returns an error. Default 1 retry (simulation will be aborted after the second crash).

plots

should traceplots and density plots be produced for each monitored variable? If TRUE, the returned list will include elements 'trace' and 'density' which consist of a list of lattice objects. The alternative is to use plot(results$mcmc) to look at the d

thin.sample

option to thin the final MCMC chain(s) before calculating summary statistics and returning the chains. Thinning very long chains allows summary statistics to be calculated more quickly. If TRUE, the chain is thinned to as close to a minimum of startsamp

jags

the system call or path for activating JAGS. Default calls findjags() to attempt to locate JAGS on your system.

silent.jags

should the JAGS output be suppressed? (logical) If TRUE, no indication of the progress of individual models is supplied. Default FALSE.

interactive

option to allow the simulation to be interactive, in which case the user is asked if the simulation should be extended when run length and convergence calculations are performed and the extended simulation will take more than 1 minute. The function will

max.time

the maximum time for which the function is allowed to extend the chains to improve convergence, as a character string including units or as an integer in which case units are taken as seconds. Ignored if interactive==TRUE. If the function thinks that th

adaptive

a list of advanced options controlling the length of the adaptive mode of each simulation. Extended simulations do not require an adaptive phase, but JAGS prints a warning if one is not performed. Reduce the length of the adpative phase for very time co

modules

external modules to be loaded into JAGS. More than 1 module can be used. Default none.

factories

factory modules to be loaded into JAGS. More than 1 factory can be used. Entried should be in the format '"" , type()', for example: factories='"mix::TemperedMix" off, type(sampler)'. Default none.

thin

the thinning interval to be used in JAGS. Increasing the thinning interval may reduce autocorrelation, and therefore reduce the number of samples required, but will increase the time required to run the simulation. Using this option thinning is performe

monitor.deviance

option to monitor the total deviance of the model using the DIC module in JAGS. If TRUE, an additional monitor called 'deviance' is added to the MCMC objects returned, representing the deviance of the model for each iteration and each chain. This option

monitor.pd

option to monitor the total effective number of parameters in the model using the DIC module for JAGS. If TRUE, a 'pd' element is returned representing the total effective number of parameters at each iteration. This option requires JAGS version 2 or gr

monitor.pd.i

option to monitor the contribution of each parameter towards the total effective number of parameters using the DIC module for JAGS. If TRUE, a 'pd.i' element is returned representing the mean value for each parameter. This option requires JAGS version

monitor.popt

option to monitor the optimism of the expected deviance using the DIC module for JAGS. If TRUE, a 'popt' element is returned representing the mean value for each parameter. This option requires JAGS version 2 or greater and at least 2 chains. For more

keep.jags.files

option to keep the folder with files needed to call JAGS, rather than deleting it. May be useful for attempting to bug fix models. Since autorun.jags typically makes several calls to JAGS, all folders are kept - the order in which the folders were used

tempdir

option to use the temporary directory as specified by the system rather than creating files in the working directory. If keep.jags.files==TRUE then the folder is copied to the working directory after the job has finished (with a unique folder name based

method

the method with which to call JAGS; one of 'simple', 'interruptible' or 'parallel'. The former runs JAGS as a foreground process (the default behaviour for runjags < 0.9.6), 'interruptible' allows the JAGS process to be terminated immediately using the i

batch.jags

option to call JAGS in batch mode, rather than using input redirection. On JAGS >= 3.0.0, this suppresses output of the status which may be useful in some situations. Default TRUE if silent.jags is TRUE, or FALSE otherwise.

Value

A list including the following elements:
mcmcan MCMC list of MCMC objects representing the chains. Each MCMC object represents the value of each monitored variable (including the deviance, if specified) for that chain at each iteration. Renamed pilot.mcmc if the simulation is aborted before convergence or before the required sampling length is achieved
end.statethe end state of the last simulation extension performed. Can be used as initial values to extend the simulation further if required
req.samplesthe minimum sample size required for the Markov chain as calculated using the Raftery and Lewis's diagnostic. This sample size is dependent on the thinning interval in JAGS
samples.to.convthe number of sampled iterations discarded due to poor convergence. The total number of iterations performed for the simulation is equal to req.samples + req.burnin + samples.to.conv (unless the simulation was aborted before reaching the required sampling length)
thinthe thinning interval in JAGS used for the chains.
summarythe summary statistics for the monitored variables from the combined chains. Renamed pilot.summary if pilot.only==TRUE or if the simulation is aborted before the required sampling length is achieved
HPDthe 95% highest posterior density and median value of each monitored variable from the combined chains.
psrfthe Gelman Rubin statistic for the monitored variables (similar to output of gelman.diag())
autocorrthe autocorrelation diagnostic for the monitored variables (output of autocorr.diag())
tracea list of lattice objects representing traceplots for each monitored variable, of class 'plotindpages'. Calling each individual element will result in the traceplot for that variable being shown, calling the entire list will result in all traceplots being shown in new windows (which may cause problems with your R session if there are several monitored variables). To override the individual window plotting behaviour (to combine plots and/or save the plots to a file), either change the class of each object using for(i in 1:nvar(results$mcmc)) class(results$trace[[i]]) <- 'trellis' and then combine using the c.trellis method in the latticeExtra package, or simply re-generate the plots using the raw mcmc output. Not produced if plots==FALSE.
densitya list of lattice objects representing density plots for each monitored variable, of class 'plotindpages'. Calling each individual element will result in the density plot for that variable being shown, calling the entire list will result in all density plots being shown in new windows (which may cause problems with your R session if there are several monitored variables). To override the individual window plotting behaviour (to combine plots and/or save the plots to a file), either change the class of each object using for(i in 1:nvar(results$mcmc)) class(results$density[[i]]) <- 'trellis' and then combine using the c.trellis method in the latticeExtra package, or simply re-generate the plots using the raw mcmc output. Not produced if plots==FALSE.
pdthe total effective number of parameters in the model calculated using the DIC module for JAGS. Returned only if monitor.pd=TRUE is supplied to the function (or if 'dic' is specified as a monitored variable).
pd.ithe contribution of each parameter towards the total effective number of parameters calculated using the DIC module for JAGS. Returned only if monitor.pd.i=TRUE is supplied to the function.
poptthe optimism of the expected deviance calculated using the DIC module for JAGS. Returned only if monitor.popt=TRUE is supplied to the function (or if 'dic' is specified as a monitored variable).
dicThe Deviance Information Criterion (DIC) model fit statistics. Returned only if 'dic' is specified as a monitored variable. The default print method displays the DIC calculated using both pd (dic) and popt (ped), using the deviance from the combined chains and for each individual chain. Separate values for the mean pd, sum popt and mean deviance is also listed. For more information on the types of DIC calculated, see the JAGS manual.

Details

This function runs the specified model until the point estimate of the potential scale reduction factor of the Gelman-Rubin statistic for each monitored parameter is less than psrf.target (default 1.05). This is intended to make sure that the chains have converged before sampling from them (although this does not guarantee convergence so manual checking of traceplots is recommended). If convergence is not achieved within the time limit, then the function returns pilot.mcmc rather than mcmc, which will always be of length startsample since it is the unconverged mcmc object returned from the last run of the model. This chain is not suitable for making inference from, so a different name is used to avoid confusion. If convergence is achieved, the Raftery and Lewis's diagnostic is used to calculate the required number of samples based on the most heavily autocorrelated monitored variable. The chains are then extended to increase the sample size of each variable as necessary (so that sufficient samples are obtained from the COMBINED chains to satisfy the Raftery and Lewis's diagnostic) before being summarised and returned. Heavily autocorrelated models with large numbers of monitored variables may result in a required sample size larger than the available memory in R. If this is the case, try using the thin option to reduce autocorrelation (and therefore the required sample size) or monitor less variables.

Examples

Run this code

# run a model to calculate the intercept and slope of the expression 
# y = m x + c, assuming normal observation errors for y:

# Simulate the data
x <- 1:100
y <- rnorm(length(x), 2*x + 10, 1)

# Model in the JAGS format
model <- "model {
for(i in 1 : N){
Y[i] ~ dnorm(true.y[i], precision);
true.y[i] <- (m * X[i]) + c;
}
m ~ dunif(-1000,1000);
c ~ dunif(-1000,1000);
precision ~ dexp(1);
}"

# Convert the data to a named list
data <- list(X=x, Y=y, N=length(x))

# Run the model
results <- autorun.jags(model=model, monitor=c("m", "c", "precision"), 
	data=data)

# Analyse traceplots of the results to assess convergence:
results$trace

# Summary of monitored variables:
results$summary

Run the code above in your browser using DataLab