run.jags: Run or Extend a User Specified Bayesian MCMC Model in JAGS from Within R

Description

Runs or extends a user specified JAGS model from within R, returning an object of class runjags-class. Running a JAGS model using these functions has two main advantages:

The method used to call the simulation can be changed with a single argument to the run.jags function. Options are to use a separate instance of a JAGS executable or a model compiled using the jags.model function, and to use a single model for all chains or multiple parallel threads to run chains simultaneously on parallel processors (either locally or via a distributed computing cluster). The methods most likely to be used are 'interruptible' (single separate JAGS instance), ('rjags' (single rjags model), 'parallel' (parallel JAGS instances), 'bgparallel' (parallel JAGS instances run in the background) and 'rjparallel' (parallel rjags models). For more details see below under the 'method' argument.

The second advantage is that several summary statistics and plots are automatically calculated, with associated S3 methods intended to facilitate checking model convergence and run length. See the help file for code{runjags-class} for more details.

Requires Just Another Gibbs Sampler (JAGS), see http://mcmc-jags.sourceforge.net.

Usage

run.jags(model, monitor = NA, data=NA, n.chains=NA, inits = NA, 
	burnin = 4000, sample = 10000, adapt=1000, 
	datalist=NA, initlist=NA, jags = runjags.getOption('jagspath'), 
	silent.jags = runjags.getOption('silent.jags'), summarise = TRUE, 
	confidence=0.95, plots = runjags.getOption('predraw.plots') && summarise, 
	psrf.target = 1.05, normalise.mcmc = TRUE, check.stochastic = TRUE, 
	modules=runjags.getOption('modules'), 
	factories=runjags.getOption('factories'), thin = 1,
	monitor.deviance = FALSE, monitor.pd = FALSE, monitor.pd.i = FALSE,
	monitor.popt = FALSE, check.conv = summarise, keep.jags.files =
	FALSE, tempdir=runjags.getOption('tempdir'), jags.refresh=0.1, 
	batch.jags=silent.jags, method=runjags.getOption('method'), 
	method.options=list())

extend.jags(runjags.object, add.monitor=character(0), 
	drop.monitor=character(0), drop.chain=numeric(0),
	combine=length(c(add.monitor,drop.monitor,drop.chain))==0, 
	burnin = 0, sample = 10000, adapt=1000, 
	jags = runjags.getOption('jagspath'), 
	silent.jags = runjags.getOption('silent.jags'), 
	summarise = TRUE, confidence=0.95, 
	plots = runjags.getOption('predraw.plots') && summarise, 
	psrf.target = 1.05, normalise.mcmc = TRUE, 
	check.stochastic = TRUE, thin = runjags.object$thin, 
	keep.jags.files	= FALSE, tempdir=runjags.getOption('tempdir'), 
	jags.refresh=0.1, batch.jags=silent.jags, method=NA, 
	method.options=NA)
results.jags(background.runjags.object)

Arguments

model

either a relative or absolute path to a textfile (including the file extension) containing a model in the JAGS language and possibly monitored variable names, data and/or initial values, or a character string of the same. No default. The model must be s

monitor

a character vector of the names of variables to monitor. The special node names 'deviance', 'pd', 'pd.i', 'popt' and 'dic' are used to monitor these model fit diagnostics (see the JAGS user manual for more information), but with the exception of 'devianc

data

a named list (or character string in the R dump format) containing the data. If left as NA, no external data is used in the model. A function (taking exactly 0 arguments) can also be provided, which will be evaluated by runjags to generate the data. De

n.chains

the number of chains to use with the simulation. More chains will improve the sensitivity of the convergence diagnostic, but will cause the simulation to run more slowly (although this may be improved by using a method such as 'parallel' or 'snow'). The

inits

either a character vector with length equal to the number of chains the model will be run using, or a list of named lists representing names and corresponding values of inits for each chain, or a function with either 1 argument representing the chain or

runjags.object

the model to be extended - the output of a run.jags (or autorun.jags or extend.jags etc) function, with class 'runjags'. No default.

background.runjags.object

the output of a run.jags (or extend.jags) function call using a background JAGS method, with class 'runjags.bginfo'. No default.

add.monitor

a character vector of variables to add to the monitored variable list. All previously monitored variables are automatically included - although see the 'drop.monitor' argument. Default no additional monitors.

drop.monitor

a character vector of previously monitored variables to remove from the monitored variable list for the extended model. Default none.

drop.chain

a numeric vector of chains to remove from the extended model. Default none.

combine

a logical flag indicating if results from the new JAGS run should be combined with the previous chains. Default TRUE if not adding or removing variables or chains, and FALSE otherwise.

burnin

the number of burnin iterations (not sampled) to use (numeric). Note that burnin occurs after adaptation, so the number of adapt iterations specified must also be considered. Default 4000 iterations (after adaptive phase).

sample

the number of sampling iterations to use (numeric). This must be an integer >=2 (although for extend.jags a value of 0 can be used if combine=TRUE - this allows summaries and plots to be calculated for stored runjags objects). Default 10000.

adapt

the length of the adaptive phase to use when compiling models. This will be run for every new simulation, although extending a model run with the rjags model does not require re-compilation (except when dropping chains). Note that the number of adaptive

datalist

an optional named list containing variables used as data, or alternatively a function (with no arguments) that returns a named list. If any variables are specified in the model block using '#data# variable', the value for the corresponding named variable

initlist

an optional named list containing variables used as initial values, or alternatively a function (with a single argument representing the chain number) that returns a named list. If any variables are specified in the model block using '#inits# variable',

jags

the system call or path for activating JAGS. Default calls findjags() to attempt to locate JAGS on your system.

silent.jags

should the JAGS output be suppressed? (logical) If TRUE, no indication of the progress of individual models is supplied. Note that output will still be produced by runjags even if silent.jags is set to TRUE - to suppress all output set silent.jags and si

summarise

should summary statistics be assessed after the model has completed? Default TRUE.

confidence

the prob argument to be passed to HPDinterval for calculation of confidence intervals. Default 0.95 (95% confidence intervals).

plots

should traceplots and density plots be pre-drawn by runjags to facilitate more convinient assessment of convergence after the model has finished running? If TRUE, the returned list will include elements 'trace' and 'density' which consist of a list of la

psrf.target

the value of the point estimate for the potential scale reduction factor of the Gelman Rubin statistic below which the chains are deemed to have converged (must be greater than 1). Ignored if check.conv==FALSE. Default 1.05.

normalise.mcmc

the Gelman Rubin statistic is based on the assumption that the posterior distribution of monitored variables is roughly normal. For very skewed posterior distributions, it may help to log/logit transform the posterior before calculating the Gelman Rubin

check.stochastic

non-stochastic monitored variables will cause errors when calculating the Gelman-Rubin statistic, if check.stochastic==TRUE then all monitored variables will be checked to ensure they are stochastic beforehand. This has a computational cost, and can be b

modules

external modules to be loaded into JAGS. More than 1 module can be used. Default none.

factories

factory modules to be loaded into JAGS. More than 1 factory can be used. Factories should be in the format '()', for example: factories='mix::TemperedMix(sampler)'. Default none.

thin

the thinning interval to be used in JAGS. Increasing the thinning interval may reduce autocorrelation, and therefore reduce the number of samples required, but will increase the time required to run the simulation. Default 1.

monitor.deviance

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

monitor.pd

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

monitor.pd.i

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

monitor.popt

this argument is deprecated and remains for backwards compatibility only. See the 'monitor' variable.

check.conv

this argument is deprecated and remains for backwards compatibility only. See the 'summarise' variable.

keep.jags.files

option to keep the folder with files needed to call JAGS, rather than deleting it. May be useful for attempting to bug fix models. A character string can also provided, in which case this folder name will be used instead of the default (existing folders

tempdir

option to use the temporary directory as specified by the system rather than creating files in the working directory. If keep.jags.files==TRUE then the folder is copied to the working directory after the job has finished (with a unique folder name based

jags.refresh

the refresh interval (in seconds) for monitoring JAGS output using the 'interactive' and 'parallel' methods (see the 'method' argument). Longer refresh intervals will use less processor time. Default 0.1 seconds.

batch.jags

option to call JAGS in batch mode, rather than using input redirection. On JAGS >= 3.0.0, this suppresses output of the status which may be useful in some situations. Default TRUE if silent.jags is TRUE, or FALSE otherwise.

method

the method with which to call JAGS; probably a character vector specifying one of 'rjags', 'simple', 'interruptible', 'parallel', 'rjparallel', 'background', 'bgparallel' or 'snow' (and see also xgrid.extend

method.options

an optional named list of argument to be passed to the method function (including a user specified method function). Of the default arguments, only 'nsims' indicating the number of separate simulations (for parallel, snow and bgparallel methods) and 'cl'

Value

Usually an object of class 'runjags', or an object of class 'runjags.bginfo' for background methods (see runjags-class).

Examples

Run this code

# run a model to calculate the intercept and slope of the expression 
# y = m x + c, assuming normal observation errors for y:

# Simulate the data
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)

# Model in the JAGS format
model <- "model { 
for(i in 1 : N){ 
	Y[i] ~ dnorm(true.y[i], precision);
	true.y[i] <- (m * X[i]) + c
} 
m ~ dunif(-1000,1000)
c ~ dunif(-1000,1000) 
precision ~ dexp(1)
}"

# Data and initial values in a named list format, 
# with explicit control over the random number
# generator used for each chain (optional): 
data <- list(X=X, Y=Y, N=length(X))
inits1 <- list(m=1, c=1, precision=1,
.RNG.name="base::Super-Duper", .RNG.seed=1)
inits2 <- list(m=0.1, c=10, precision=1,
.RNG.name="base::Wichmann-Hill", .RNG.seed=2)

# Run the model and produce plots 
results <- run.jags(model=model, monitor=c("m", "c", "precision"), 
data=data, n.chains=2, method="rjags", inits=list(inits1,inits2), 
plots = TRUE)

# Plot the monitored variables:
plot(results)

# Look at the summary statistics:
print(results)

# Extract only the coefficient as an mcmc.list object:
coeff <- as.mcmc.list(results,vars="m")


# The same model but using embedded shortcuts to specify data, inits and monitors,
# and using parallel chains:

# Model in the JAGS format

model <- "model {
for(i in 1 : N){ #data# N
	Y[i] ~ dnorm(true.y[i], precision) #data# Y
	true.y[i] <- (m * X[i]) + c #data# X
}
m ~ dunif(-1000,1000) #inits# m
c ~ dunif(-1000,1000)
precision ~ dexp(1)
#monitor# m, c, precision
}"

# Simulate the data
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)

initfunction <- function(chain) return(switch(chain, 
	"1"=list(m=-10), "2"=list(m=10)))

# Run the 2 chains in parallel (allowing the run.jags function
# to control the set up of the distributed computing cluster):
results <- run.jags(model, n.chains=2, initlist=initfunction,
method="rjparallel")

# Look at a traceplot of the intercept and slope on a 2x1 grid:
plot(results,type="trace",vars=c("m","^c"),layout=c(2,1))

# Run the same model using 8 chains and a user-specified
# distributed computing cluster:

# A list of 8 over-dispersed starting values for m:
m <- as.list(runif(8,-20,20))

# Set up a distributed computing cluster with 4 nodes:
library(parallel)
cl <- makeCluster(4)

# Run the chains in parallel (run.jags will use 4 models 
# in parallel with 2 chains each):
results <- run.jags(model, n.chains=8, initlist=initfunction,
method="rjparallel", method.options=list(cl=cl))

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples