InitMcmc: Initialize a Markov chain Monte Carlo run

Description

Eliminates much of the "boilerplate" code needed for MCMC implementations by looping through the samplers and saving the resulting draws automatically.

Usage

InitMcmc(n.save, backing.path = NA, thin = 1, exclude = NULL,
  overwrite = FALSE)

Arguments

n.save

number of samples to take. If thin=1, the number of iterations to run the MCMC chain

backing.path

NA to save the samples in-memory, otherwise directory path where MCMC samples will be saved

thin

thinning interval

exclude

character vector specifying variables that should not be saved

overwrite

TRUE/FALSE indicating whether previous MCMC results should be overwritten

Value

A function that returns a list of either matrix or big.matrix with the MCMC samples. Each row in the matrices corresponds to one sample from the MCMC chain.

Details

InitMcmc returns a function that takes an R expression. The returned function automatically loops through the R expression and saves any numeric assignments, typically MCMC samples, that are made within it. exclude specifies assignments that should not be saved. When exclude is NULL, all the numeric assignments (scalar, vector, matrix, or array) are saved. The dimensions of matrix and array assignments are not preserved; they are flattened into vectors before saving. Non-numeric assignments are not saved.

The number of iterations for the MCMC chain is determined by n.save and thin. The desired number of samples to be saved from the target distribution is set by n.save, and the chain is thinned according to the interval set by thin. The MCMC chain will run for n.save \(x\) thin iterations.

The MCMC samples can be saved either in-memory or on-disk. Unlike saving in-memory, saving on-disk is not constrained by available RAM. Saving on-disk can be used in high-dimensional settings where running multiple MCMC chains in parallel and saving the results in-memory would use up all available RAM. File-backed saving uses big.matrix, and the behaviors of that implementation apply when saving on-disk. In particular, big.matrix has call-by-reference rather than call-by-value behavior, so care must be taken not to introduce unintended side-effects when modifying these objects. In-memory saving is implemented via matrix and has standard R behavior.

When backing.path is NA, samples will be saved in-memory. To save samples on-disk, backing.path should specify the path to the directory where the MCMC samples should be saved. The big.matrix backingfiles will be saved in that directory, with filenames corresponding to the variable assignment names made in the R expression. Consequently, the assignment names in the R expression must be chosen in such a way that they are compatible as filenames on the operating system. The big.matrix descriptorfiles are also named according to the variable assignment names made in the R expression, but with a ".desc" suffix.

By default, InitMcmc will not overwrite the results from a previous file-backed MCMC. This behavior can be overridden by specifying overwrite=TRUE in InitMcmc, or as the second argument to the function returned by InitMcmc. See the examples for an illustration. overwrite is ignored for in-memory MCMC.

Examples

Run this code

# NOT RUN {
# Beta-binomial -----------------------------------------------------------
## Likelihood:
## x|theta ~ Binomial(n, theta)
## Prior:
## theta ~ Unif(0, 1)

theta.truth <- 0.75
n.obs <- 100
x <- rbinom(1, n.obs, prob=theta.truth)

# Sampling function
SampleTheta <- function() {
    rbeta(1, 1 + x, 1 + n.obs - x)
}

# MCMC
Mcmc <- InitMcmc(1000)
samples <- Mcmc({
    theta <- SampleTheta()
})

# Plot posterior distribution
hist(samples$theta, freq=FALSE, main="Posterior", xlab=expression(theta))
theta.grid <- seq(min(samples$theta), max(samples$theta), length.out=500)
lines(theta.grid, dbeta(theta.grid, 1 + x, 1 + n.obs - x), col="blue")
abline(v=theta.truth, col="red")
legend("topleft", legend=c("Analytic posterior", "Simulation truth"),
       lty=1, col=c("blue", "red"), cex=0.75)

# Estimating mean with unknown variance -----------------------------------
## Likelihood:
## x|mu, sigma^2 ~ N(mu, sigma^2)
## Prior:
## p(mu) \propto 1
## p(sigma^2) \propto 1/sigma^2

# Simulated data
mu.truth <- 10
sigma.2.truth <- 2
n.obs <- 100
x <- rnorm(n.obs, mu.truth, sqrt(sigma.2.truth))
x.bar <- mean(x)

# Sampling functions
SampleMu <- function(sigma.2) {
    rnorm(1, x.bar, sqrt(sigma.2/n.obs))
}

SampleSigma2 <- function(mu) {
    1/rgamma(1, n.obs/2, (1/2)*sum((x-mu)^2))
}

# MCMC
Mcmc <- InitMcmc(1000, thin=10, exclude="sigma.2")
sigma.2 <- 1 # Initialize parameter
samples <- Mcmc({
    mu <- SampleMu(sigma.2)
    sigma.2 <- SampleSigma2(mu)
})

# Plot posterior distribution
hist(samples$mu, xlab=expression(mu), main="Posterior")
abline(v=mu.truth, col="red")
legend("topleft", legend="Simulation truth", lty=1, col="red", cex=0.75)

# sigma.2 is excluded from saved samples
is.null(samples$sigma.2)

# Linear regression -------------------------------------------------------
## Likelihood:
## y|beta, sigma^2, x ~ N(x %*% beta, sigma^2 * I)
## Prior:
## p(beta, sigma^2|x) \propto 1/sigma^2

# Simulated data
n.obs <- 100
x <- matrix(NA, nrow=n.obs, ncol=3)
x[,1] <- 1
x[,2] <- rnorm(n.obs)
x[,3] <- x[,2] + rnorm(n.obs)
beta.truth <- c(1, 2, 3)
sigma.2.truth <- 5
y <- rnorm(n.obs, x %*% beta.truth, sqrt(sigma.2.truth))

# Calculations for drawing beta
l.mod <- lm(y ~ x - 1)
beta.hat <- l.mod$coefficients
xtx.inv <- summary(l.mod)$cov.unscaled
xtx.inv.chol <- chol(xtx.inv)

# Calculations for drawing sigma.2
a.sigma.2 <- (n.obs - length(beta.hat))/2
b.sigma.2 <- (1/2) * t(y - x %*% beta.hat) %*% (y - x %*% beta.hat)

# Draw from multivariate normal
Rmvn <- function(mu, sigma.chol) {
    d <- length(mu)
    c(mu + t(sigma.chol) %*% rnorm(d))
}

SampleBeta <- function(sigma.2) {
    Rmvn(beta.hat, xtx.inv.chol * sqrt(sigma.2))
}

SampleSigma2 <- function() {
    1/rgamma(1, a.sigma.2, b.sigma.2)
}

# MCMC, samples saved on-disk
backing.path <- tempfile()
dir.create(backing.path)
Mcmc <- InitMcmc(1000, backing.path=backing.path)
samples <- Mcmc({
    sigma.2 <- SampleSigma2()
    beta <- SampleBeta(sigma.2)
})

# Plot residuals using predictions made from the posterior mean of beta
y.hat <- x %*% colMeans(samples$beta[,])
plot(y.hat, y-y.hat, xlab="Predicted", ylab="Residual")
abline(h=0, col="red")

# Overwrite previous results ----------------------------------------------
### Overwrite specified in InitMcmc
backing.path <- tempfile()
dir.create(backing.path)
Mcmc <- InitMcmc(5, backing.path=backing.path, overwrite=TRUE)
samples <- Mcmc({
    x <- 1
})
samples <- Mcmc({
    x <- 2
})
samples$x[,]

### Overwrite specified in the function returned by InitMcmc
backing.path <- tempfile()
dir.create(backing.path)
Mcmc <- InitMcmc(5, backing.path=backing.path, overwrite=FALSE)
samples <- Mcmc({
    x <- 3
})
samples <- Mcmc({
    x <- 4
}, overwrite=TRUE)
samples$x[,]
# }