Amwg: Turn a non-adaptive Metropolis sampler into an adaptive Metropolis sampler

Description

Given a non-adaptive sampler of the form f(..., s), Amwg will return a function g(...) that automatically adapts the Metropolis proposal standard deviation s to try and achieve a target acceptance rate.

Usage

Amwg(f, s, batch.size = 50, target = 0.44, DeltaN, stop.after = NA)

Arguments

non-adaptive Metropolis sampler of the form f(..., s)

initial value for the Metropolis proposal SD

batch.size

number of iterations before proposal SD is adapted

target

target acceptance rate

DeltaN

function of the form f(n) which returns the adaptation amount based on the number of elapsed iterations, n. Defaults to \(\delta(n) = min(0.01, n^{-1/2})\)

stop.after

stop adapting proposal SD after this many iterations

Value

Adaptive Metropolis sampler function of the form g(...).

Details

Amwg uses the Adaptive Metropolis-Within-Gibbs algorithm from Roberts & Rosenthal (2009), which re-scales the proposal standard deviation after a fixed number of MCMC iterations have elapsed. The goal of the algorithm is to achieve a target acceptance rate for the Metropolis step. After the nth batch of MCMC iterations the log of the proposal standard deviation, \(log(s)\), is increased/decreased by \(\delta(n)\). \(log(s)\) is increased by \(\delta(n)\) if the observed acceptance rate is more than the target acceptance rate, or decreased by \(\delta(n)\) if the observed acceptance rate is less than the target acceptance rate. Amwg keeps track of the the acceptance rate by comparing the previously sampled value from f to the next value. If the two values are equal, the proposal is considered to be rejected, whereas if the two values are different the proposal is considered accepted. Amwg will optionally stop adapting the proposal standard deviation after stop.after iterations. Setting stop.after can be used, for example, to stop adapting the proposal standard deviation after some burn-in period. If stop.after=NA (the default), Amwg will continue to modify the proposal standard deviation throughout the entire MCMC.

DeltaN is set to \(\delta(n) = min(0.01, n^{-1/2})\) unless re-specified in the function call. Some care should be taken if re-specifying DeltaN, as the ergodicity of the chain may not be preserved if certain conditions aren't met. See Roberts & Rosenthal (2009) in the references for details.

The proposal standard deviation s can be either a vector or a scalar. If the initial value of s is a scalar, \(f\) will be treated as a sampler for a scalar, a random vector, or a joint parameter update. Alternatively, if the dimension of \(s\) is equal to the dimension of the parameters returned by \(f\), the individual elements \(s\) will be treated as individual proposal standard deviations for the elements returned by \(f\). This functionality can be used, for example, if \(f\) samples each of its returned elements individually, updating each element using a Metropolis step. See the examples for an illustration of this use case. In such settings, \(f\) should be constructed to receive \(s\) as a vector argument.

References

Gareth O. Roberts & Jeffrey S. Rosenthal (2009) Examples of Adaptive MCMC, Journal of Computational and Graphical Statistics, 18:2, 349-367, 10.1198/jcgs.2009.06134

Examples

Run this code

# NOT RUN {
# Sample from N(1, 2^2) ---------------------------------------------------
LogP <- function(x) dnorm(x, 1, 2, log=TRUE) # Target distribution

f <- function(x, s) { # Non-adaptive Metropolis sampler
    x.prop <- x + rnorm(1, 0, s)
    if(AcceptProp(LogP(x), LogP(x.prop))) {
        x <- x.prop
    }

    return(x)
}

s.start <- 0.1
g <- Amwg(f, s.start, batch.size=25)

n.save <- 10000
Mcmc <- InitMcmc(n.save)
y <- 0
x <- 0
samples <- Mcmc({
    y <- f(y, s.start) # Non-adaptive
    x <- g(x) # Adaptive
})

plot(1:n.save, samples$x, ylim=c(-10, 10), main="Traceplots", xlab="Iteration",
     ylab="Value", type='l')
lines(1:n.save, samples$y, col="red")
legend("bottomleft", legend=c("Adaptive", "Non-adaptive"),
       col=c("black", "red"), lty=1, cex=0.8)

# Sample from Gamma(10, 5) ------------------------------------------------
LogP <- function(x) dgamma(x, 10, 5, log=TRUE) # Target distribution

f <- function(x, s) { # Non-adaptive Metropolis sampler
    x.prop <- x + rnorm(1, 0, s)
    if(AcceptProp(LogP(x), LogP(x.prop))) {
        x <- x.prop
    }

    return(x)
}

s.start <- 10
stop.after <- 5000 # Stop changing the proposal SD after 5000 iterations
g <- Amwg(f, s.start, batch.size=25, stop.after=stop.after)

n.save <- 10000
Mcmc <- InitMcmc(n.save)
x <- 1
samples <- Mcmc({
    x <- g(x)
})
hist(samples$x[stop.after:n.save,], xlab="x", main="Gamma(10, 5)", freq=FALSE)
curve(dgamma(x, 10, 5), from=0, to=max(samples$x), add=TRUE, col="blue")

# Overdispersed Poisson ---------------------------------------------------
## Likelihood:
## y_i|theta_i ~ Pois(theta_i), i=1,...,n
## Prior:
## theta_i ~ Log-Normal(mu, sigma^2)
## mu ~ Normal(m, v^2), m and v^2 fixed
## sigma^2 ~ InverseGamma(a, b), a and b fixed

# }
# NOT RUN {
SampleSigma2 <- function(theta.vec, mu, a, b, n.obs) {
    1/rgamma(1, a + n.obs/2, b + (1/2)*sum((log(theta.vec) - mu)^2))
}

SampleMu <- function(theta.vec, sigma.2, m, v.2, n.obs) {
    mu.var <- (1/v.2 + n.obs/sigma.2)^(-1)
    mu.mean <- (m/v.2 + sum(log(theta.vec))/sigma.2) * mu.var

    return(rnorm(1, mu.mean, sqrt(mu.var)))
}

LogDTheta <- function(theta, mu, sigma.2, y) {
    dlnorm(theta, mu, sqrt(sigma.2), log=TRUE) + dpois(y, theta, log=TRUE)
}

# Non-adaptive Metropolis sampler
SampleTheta <- function(theta.vec, mu, sigma.2, y.vec, n.obs, s) {
    theta.prop <- exp(log(theta.vec) + rnorm(n.obs, 0, s))

    # Jacobians, because proposals are made on the log scale
    j.curr <- log(theta.vec)
    j.prop <- log(theta.prop)

    log.curr <- LogDTheta(theta.vec, mu, sigma.2, y.vec) + j.curr
    log.prop <- LogDTheta(theta.prop, mu, sigma.2, y.vec) + j.prop
    theta.vec <- ifelse(AcceptProp(log.curr, log.prop), theta.prop, theta.vec)

    return(theta.vec)
}

## Data
y.vec <- warpbreaks$breaks
n.obs <- length(y.vec)

## Setup adaptive Metropolis sampler
s <- rep(1, n.obs)
# s is a vector, so the acceptance rate of each component will be tracked
# individually in the adaptive Metropolis sampler
SampleThetaAdapative <- Amwg(SampleTheta, s)

## Set prior
v.2 <- 0.05
m <- log(30) - v.2/2
a <- 1
b <- 2

## Initialize parameters
theta.vec <- y.vec
mu <- m

## MCMC
Mcmc <- InitMcmc(10000)
samples <- Mcmc({
    sigma.2 <- SampleSigma2(theta.vec, mu, a, b, n.obs)
    mu <- SampleMu(theta.vec, sigma.2, m, v.2, n.obs)
    theta.vec <- SampleThetaAdapative(theta.vec, mu, sigma.2, y.vec, n.obs)
})
# }

Run the code above in your browser using DataLab