autorun.jags: Run or Extend a User Specified Bayesian MCMC Model in JAGS with Automatically Calculated Run Length and Convergence Diagnostics

Description

Runs or extends a user specified JAGS (similar to WinBUGS) model from within R, returning an object of class runjags-class. Chain convergence over the first run of the simulation is assessed using Gelman and Rubin's convergence diagnostic. If necessary, the simulation is extended to improve chain convergence (up to a user-specified maximum time limit), before the required sample size of the Markov chain is calculated using Raftery and Lewis's diagnostic. The simulation is then extended to the required sample size, which depends on autocorrelation and the number of chains.

This function is provided primarily for automated running of large simulated data studies, and is not a replacement for manually assessing convergence and Monte Carlo error when parameter estimates are being made from real data. For more complex models, the use of run.jags directly with manual assessment of necessary run length may be preferable.

Requires Just Another Gibbs Sampler (JAGS), see http://mcmc-jags.sourceforge.net.

Usage

autorun.jags(model, monitor = NA, data=NA, n.chains=NA, 
	inits = NA, startburnin = 4000, startsample = 10000,
	datalist=NA, initlist=NA, psrf.target = 1.05, normalise.mcmc = TRUE,
	check.stochastic = TRUE, modules=runjags.getOption('modules'), 
	factories=runjags.getOption('factories'), 
	raftery.options = list(), crash.retry=1, summarise = TRUE,
	confidence=0.95, plots = runjags.getOption('predraw.plots') && summarise, 
	thin.sample = FALSE, jags = runjags.getOption('jagspath'), 
	silent.jags = runjags.getOption("silent.jags"), interactive=FALSE, 
	max.time=Inf, adaptive=1000, thin = 1, monitor.deviance = FALSE, 
	monitor.pd = FALSE, tempdir=runjags.getOption('tempdir'), 
	jags.refresh=0.1, batch.jags=silent.jags, 
	method=runjags.getOption('method'), method.options=list())
autoextend.jags(runjags.object, add.monitor=character(0), 
	drop.monitor=character(0), drop.chain=numeric(0),
	combine=length(c(add.monitor,drop.monitor,drop.chain))==0,
	startburnin = 0, startsample = 10000, psrf.target = 1.05,
	normalise.mcmc = TRUE, check.stochastic = TRUE, 
	raftery.options = list(), crash.retry=1, summarise = TRUE, 
	confidence=0.95, plots = runjags.getOption('predraw.plots') && summarise, 
	thin.sample = FALSE, jags = runjags.getOption('jagspath'), 
	silent.jags = runjags.getOption('silent.jags'),	interactive=FALSE, 
	max.time=Inf, adaptive=1000, thin = runjags.object$thin, 
	tempdir=runjags.getOption('tempdir'), jags.refresh=0.1, 
	batch.jags=silent.jags, method=NA, method.options=NA)

Arguments

model

either a relative or absolute path to a textfile (including the file extension) containing a model in the JAGS language and possibly monitored variable names, data and/or initial values, or a character string of the same. No default. The model must be s

monitor

a character vector of the names of variables to monitor. The special node names 'deviance', 'pd', 'pd.i', 'popt' and 'dic' are used to monitor these model fit diagnostics (see the JAGS user manual for more information), but with the exception of 'devianc

data

either a named list or a character string in the R dump format containing the data. If left as NA, the model will be run without external data.
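
The two forms of data described above can be sketched in base R as follows. This is only an illustration: the variable values are made up, base R's dump() is used here as one assumed way of producing dump-format text, and whether JAGS accepts every construct that dump() can emit is not covered by this page.

```r
# Made-up data for illustration only
N <- 3
X <- c(1.5, 2.5, 4)
Y <- 2 * X + 10

# Form 1: a named list
data_list <- list(N = N, X = X, Y = Y)

# Form 2: a character string in the R dump format,
# built by writing dump() output to a text connection
dump_lines <- character(0)
con <- textConnection("dump_lines", open = "w", local = TRUE)
dump(c("N", "X", "Y"), file = con)
close(con)
data_string <- paste(dump_lines, collapse = "\n")

# Round-trip check: evaluating the string recreates the variables
check_env <- new.env()
eval(parse(text = data_string), envir = check_env)
cat(data_string, "\n")
```

Either data_list or data_string could then be supplied as the data argument.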

n.chains

the number of chains to use with the simulation. More chains will improve the sensitivity of the convergence diagnostic, but will cause the simulation to run more slowly (although this may be improved by using a method such as 'parallel' or 'snow'). The

inits

either a character vector with length equal to the number of chains the model will be run using, or a list of named lists representing names and corresponding values of inits for each chain, or a function with either 1 argument representing the chain or

runjags.object

the model to be extended - the output of a run.jags (or autorun.jags or extend.jags etc) function, with class 'runjags'. No default.

add.monitor

a character vector of variables to add to the monitored variable list. All previously monitored variables are automatically included - although see the 'drop.monitor' argument. Default no additional monitors.

drop.monitor

a character vector of previously monitored variables to remove from the monitored variable list for the extended model. Default none.

drop.chain

a numeric vector of chains to remove from the extended model. Default none.

combine

a logical flag indicating if results from the new JAGS run should be combined with the previous chains. Default TRUE if not adding or removing variables or chains, and FALSE otherwise.

startburnin

the number of burnin iterations, NOT including the adaptive iterations to use for the initial pilot run of the chains.

startsample

the total number of samples (including the chains supplied in runjags.object for autoextend.jags) on which to assess convergence. If the runjags.object already contains this number of samples then convergence will be assessed on this object, otherwise th

datalist

an optional named list containing variables used as data, or alternatively a function (with no arguments) that returns a named list. If any variables are specified in the model block using '#data# ', the value for the corresponding named variab

initlist

an optional named list containing variables used as initial values, or alternatively a function (with a single argument representing the chain number) that returns a named list. If any variables are specified in the model block using '#inits# '

psrf.target

the value of the point estimate for the potential scale reduction factor of the Gelman Rubin statistic below which the chains are deemed to have converged (must be greater than 1). Default 1.05.
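
To make the role of psrf.target concrete, the following base R sketch computes a simplified point estimate of the potential scale reduction factor for a single parameter and compares it with the default target of 1.05. The helper psrf_point is hypothetical and is not part of runjags; it omits the degrees-of-freedom correction that gelman.diag in the coda package can apply, and the chains are simulated rather than produced by JAGS.

```r
# Hypothetical helper: simplified Gelman-Rubin point estimate.
# `chains` is a numeric matrix with one column per chain.
psrf_point <- function(chains) {
  n <- nrow(chains)                     # iterations per chain
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(4000), ncol = 2)           # two chains sampling the same target
stuck <- cbind(rnorm(2000, 0), rnorm(2000, 3))   # two chains centred 3 units apart

psrf.target <- 1.05
psrf_point(mixed) < psrf.target   # TRUE here: chains would be deemed converged
psrf_point(stuck) < psrf.target   # FALSE here: chains would not be deemed converged
```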

normalise.mcmc

the Gelman Rubin statistic is based on the assumption that the posterior distribution of monitored variables is roughly normal. For very skewed posterior distributions, it may help to log/logit transform the posterior before calculating the Gelman Rubin

check.stochastic

non-stochastic monitored variables will cause errors when calculating the Gelman-Rubin statistic; if check.stochastic==TRUE, all monitored variables are checked beforehand to ensure they are stochastic. This has a small computational cost, which

modules

a character vector of external modules to be loaded into JAGS. More than 1 module can be used. Default none.

factories

a character vector of factory modules to be loaded into JAGS. More than 1 factory can be used. Factories should be provided in the format '<facname>(<factype>)', for example: factories='mix::TemperedMix(sampler)'. Also ensure that any required modules

raftery.options

a named list which is passed as additional arguments to raftery.diag. Default none (default arguments to raftery.diag are used).

crash.retry

the number of times to re-attempt a simulation if the model returns an error. Default 1 retry (simulation will be aborted after the second crash).

summarise

should summary statistics be automatically calculated for the output chains? Default TRUE.

confidence

the prob argument to be passed to HPDinterval for calculation of confidence intervals. Default 0.95 (95% confidence intervals).
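
As a rough illustration of what an interval at prob = 0.95 represents, the base R sketch below finds the shortest interval spanning that fraction of the sorted samples. The helper hpd_sketch is hypothetical and is not the HPDinterval implementation itself; the samples are simulated from an exponential distribution as a stand-in for a skewed posterior.

```r
# Hypothetical helper: shortest interval containing a fraction `prob` of the samples
hpd_sketch <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  gap <- max(1, min(n - 1, round(n * prob)))   # number of index steps the interval spans
  widths <- x[(gap + 1):n] - x[1:(n - gap)]    # width of every candidate interval
  i <- which.min(widths)                       # position of the shortest candidate
  c(lower = x[i], upper = x[i + gap])
}

set.seed(42)
samples <- rexp(5000, rate = 1)        # simulated stand-in for a skewed posterior

hpd_sketch(samples, prob = 0.95)       # highest-density style interval
quantile(samples, c(0.025, 0.975))     # equal-tailed interval, for comparison
```

For a skewed sample such as this one, the shortest interval sits closer to the mode than the equal-tailed interval does.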

plots

should traceplots and density plots be pre-drawn by runjags to facilitate more convenient assessment of convergence after the model has finished running? If TRUE, the returned list will include elements 'trace' and 'density' which consist of a list of la

thin.sample

option to thin the final MCMC chain(s) before calculating summary statistics and returning the chains. Thinning very long chains allows summary statistics to be calculated more quickly. If TRUE, the chain is thinned to as close to a minimum of startsamp

jags

the system call or path for activating JAGS. Default calls findjags() to attempt to locate JAGS on your system.

silent.jags

should the JAGS output be suppressed? (logical) If TRUE, no indication of the progress of individual models is supplied. Note that output will still be produced by runjags even if silent.jags is set to TRUE - to suppress all output set silent.jags and si

interactive

option to allow the simulation to be interactive, in which case the user is asked if the simulation should be extended when run length and convergence calculations are performed and the extended simulation will take more than 1 minute. The function will

max.time

the maximum time for which the function is allowed to extend the chains to improve convergence, as a character string including units or as an integer in which case units are taken as seconds. Ignored if interactive==TRUE. If the function thinks that th

adaptive

the length of the adaptive phase to use when re-compiling models. This will be run for every new simulation except for the rjags method where it is only required to be run during the model compilation phase (this will only be performed once). Note that

thin

the thinning interval to be used in JAGS. Increasing the thinning interval may reduce autocorrelation, and therefore reduce the number of samples required, but will increase the time required to run the simulation. Using this option thinning is performe

monitor.deviance

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

monitor.pd

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

tempdir

option to use the temporary directory as specified by the system rather than creating files in the working directory. Any files created in the temporary directory are removed when the function exits for any reason. Default TRUE.

jags.refresh

the refresh interval (in seconds) for monitoring JAGS output using the 'interactive' and 'parallel' methods (see the 'method' argument). Longer refresh intervals will use less processor time. Default 0.1 seconds.

batch.jags

option to call JAGS in batch mode, rather than using input redirection. On JAGS >= 3.0.0, this suppresses output of the status which may be useful in some situations. Default TRUE if silent.jags is TRUE, or FALSE otherwise.

method

the method with which to call JAGS; probably a character vector specifying one of 'rjags', 'simple', 'interruptible', 'parallel', 'rjparallel' or 'snow' (and see also xgrid.autoextend.jags).

method.options

an optional named list of arguments to be passed to the method function (including a user specified method function). Of the default arguments, only 'nsims' indicating the number of separate simulations (for parallel, snow and bgparallel methods) and 'cl'

Value

an object of class 'runjags' (see runjags-class).

Examples

# run a model to calculate the intercept and slope of the expression 
# y = m x + c, assuming normal observation errors for y:

# Simulate the data
N <- 100
X <- 1:N
Y <- rnorm(N, 2*X + 10, 1)

# Model in the JAGS format
model <- "model {
for(i in 1 : N){
	Y[i] ~ dnorm(true.y[i], precision)
	true.y[i] <- (m * X[i]) + c
}
m ~ dunif(-1000,1000)
c ~ dunif(-1000,1000)
precision ~ dexp(1)

#data# N, X, Y
}"


# Run the model using rjags with a 5 minute timeout:
results <- autorun.jags(model=model, max.time="5m",
monitor=c("m", "c", "precision"), method="rjags")

# Analyse traceplots of the results to assess convergence:
plot(results, type="trace", layout=c(3,1))

# Summary of monitored variables:
results
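
The inits argument accepts, among other forms, a list of named lists or a function of the chain number. The sketch below shows one way such a function might look for the model in this example. The starting values are arbitrary, inits_fun is a hypothetical name, and the .RNG.name and .RNG.seed entries follow the convention in the JAGS user manual for fixing each chain's random number generator; they are an assumption here rather than something this page documents.

```r
n.chains <- 3

# Hypothetical initial-value function: one argument giving the chain number.
# m, c and precision are the parameters of the example model.
inits_fun <- function(chain) {
  list(
    m = c(-10, 0, 10)[chain],
    c = c(-10, 0, 10)[chain],
    precision = c(0.1, 1, 10)[chain],      # must be positive under the dexp(1) prior
    .RNG.name = "base::Mersenne-Twister",  # assumed JAGS RNG setting
    .RNG.seed = chain                      # a different seed for each chain
  )
}

# The equivalent list of named lists, one element per chain
inits_list <- lapply(seq_len(n.chains), inits_fun)
str(inits_list[[1]])
```

Either inits_fun or inits_list could then be passed as inits, together with a matching n.chains.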
