Adapts the inner kernel's step_size based on log_accept_prob.
The dual averaging policy uses a noisy step size for exploration, while
averaging over tuning steps to provide a smoothed estimate of an optimal
value. It is based on section 3.2 of Hoffman and Gelman (2013), which
modifies the stochastic convex optimization scheme of Nesterov (2009).
The modified algorithm applies extra weight to recent iterations while
keeping the convergence guarantees of Robbins-Monro. When the trajectory
length is held constant, it also avoids shrinking the step size too much
too quickly, because small step sizes make early iterations expensive.
A good target acceptance probability depends on the inner kernel. If the
inner kernel is HamiltonianMonteCarlo, 0.6-0.9 is a good range to aim for;
for RandomWalkMetropolis the target should be closer to 0.25. See the
individual kernels' documentation for guidance.
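The update rule can be sketched in base R as follows. This is a minimal sketch of the dual averaging equations from section 3.2 of Hoffman and Gelman (2013), not the tfprobability implementation. The argument names mirror this function's parameters; the shrinkage target log(10 * initial step size) is the paper's suggestion and is an assumption here, as is the toy acceptance model exp(-step_size) used to exercise the update.

```r
# Minimal sketch of dual averaging step size adaptation
# (Hoffman and Gelman 2013, section 3.2). Not the tfprobability code.
dual_averaging_init <- function(initial_step_size) {
  list(
    m = 0,                              # number of updates so far
    h_bar = 0,                          # running average of (target - accept)
    mu = log(10 * initial_step_size),   # shrinkage target (assumed, as in the paper)
    log_step = log(initial_step_size),  # noisy, exploratory log step size
    log_step_bar = 0                    # smoothed (averaged) log step size
  )
}

dual_averaging_update <- function(state, accept_prob,
                                  target_accept_prob = 0.75,
                                  exploration_shrinkage = 0.05,
                                  step_count_smoothing = 10,
                                  decay_rate = 0.75) {
  m <- state$m + 1
  # step_count_smoothing adds pseudo-steps so early errors are damped
  w <- 1 / (m + step_count_smoothing)
  h_bar <- (1 - w) * state$h_bar + w * (target_accept_prob - accept_prob)
  # Exploratory iterate; a larger exploration_shrinkage keeps it closer to mu
  log_step <- state$mu - sqrt(m) / exploration_shrinkage * h_bar
  # decay_rate < 1 gives recent iterates more weight; 1 gives equal weights
  eta <- m^(-decay_rate)
  log_step_bar <- eta * log_step + (1 - eta) * state$log_step_bar
  list(m = m, h_bar = h_bar, mu = state$mu,
       log_step = log_step, log_step_bar = log_step_bar)
}

# Toy usage: pretend the acceptance probability is exp(-step_size), so the
# step size that hits the 0.75 target is -log(0.75).
state <- dual_averaging_init(initial_step_size = 1)
for (i in 1:2000) {
  accept_prob <- exp(-exp(state$log_step))
  state <- dual_averaging_update(state, accept_prob)
}
adapted_step_size <- exp(state$log_step_bar)
print(adapted_step_size)
```

In this toy setting the smoothed step size should end up close to -log(0.75), roughly 0.29, while the exploratory log_step keeps making small corrections around it.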
mcmc_dual_averaging_step_size_adaptation(
inner_kernel,
num_adaptation_steps,
target_accept_prob = 0.75,
exploration_shrinkage = 0.05,
step_count_smoothing = 10,
decay_rate = 0.75,
step_size_setter_fn = NULL,
step_size_getter_fn = NULL,
log_accept_prob_getter_fn = NULL,
validate_args = FALSE,
name = NULL
)
Returns a Monte Carlo sampling kernel.
inner_kernel: TransitionKernel-like object.
num_adaptation_steps: Scalar integer Tensor. Number of initial steps
during which to adjust the step size. This may be greater than, less than,
or equal to the number of burn-in steps.
target_accept_prob: A floating point Tensor representing the desired
acceptance probability. Must be a positive number less than 1. This can
either be a scalar, or have shape [num_chains]. Default value: 0.75
(the center of the asymptotically optimal acceptance range for HMC).
exploration_shrinkage: Floating point scalar Tensor. How strongly the
exploration rate is biased towards the shrinkage target.
step_count_smoothing: Int32 scalar Tensor. Number of "pseudo-steps"
added to the number of steps taken, to prevent noisy exploration during
the early samples.
decay_rate: Floating point scalar Tensor. How much to favor recent
iterations over earlier ones. A value of 1 gives equal weight to all
history.
step_size_setter_fn: A function with the signature
(kernel_results, new_step_size) -> new_kernel_results, where kernel_results
are the results of the inner_kernel, new_step_size is a Tensor or a nested
collection of Tensors with the same structure as returned by
step_size_getter_fn, and new_kernel_results is a copy of kernel_results
with the step size(s) set.
step_size_getter_fn: A callable with the signature
(kernel_results) -> step_size, where kernel_results are the results of the
inner_kernel, and step_size is a floating point Tensor or a nested
collection of such Tensors.
log_accept_prob_getter_fn: A callable with the signature
(kernel_results) -> log_accept_prob, where kernel_results are the results
of the inner_kernel, and log_accept_prob is a floating point Tensor.
log_accept_prob can either be a scalar, or have shape [num_chains]. If it
has shape [num_chains], step_size should have the same leading dimension.
validate_args: Logical. When TRUE, kernel parameters are checked for
validity. When FALSE, invalid inputs may silently render incorrect
outputs.
name: Name prefixed to Ops created by this function.
Default value: NULL (i.e., 'dual_averaging_step_size_adaptation').
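To illustrate the three callback contracts, the sketch below uses a plain R list as a stand-in for kernel results. In practice these callbacks receive the inner kernel's results object and work with TensorFlow tensors; the list, the numeric vectors, the field names step_size and log_accept_ratio, and the my_* function names are all hypothetical and only show the shape of each signature.

```r
# Hypothetical stand-in for the inner kernel's results: a plain list.
# Real kernel results are kernel-specific objects; these field names are assumptions.
kernel_results <- list(step_size = 0.1, log_accept_ratio = c(-0.3, 0.2, -1.5))

# (kernel_results) -> step_size
my_step_size_getter_fn <- function(kernel_results) {
  kernel_results$step_size
}

# (kernel_results, new_step_size) -> new_kernel_results
# R copies on modify, so the caller's list is left unchanged.
my_step_size_setter_fn <- function(kernel_results, new_step_size) {
  kernel_results$step_size <- new_step_size
  kernel_results
}

# (kernel_results) -> log_accept_prob, one value per chain.
# An acceptance ratio above 1 means certain acceptance, so clip the log at 0.
my_log_accept_prob_getter_fn <- function(kernel_results) {
  pmin(kernel_results$log_accept_ratio, 0)
}

updated <- my_step_size_setter_fn(kernel_results, 0.05)
print(my_step_size_getter_fn(updated))         # the new step size
print(my_step_size_getter_fn(kernel_results))  # the original is unchanged
print(my_log_accept_prob_getter_fn(kernel_results))
```

Returning a modified copy rather than mutating in place matches the documented contract that new_kernel_results is a copy of kernel_results with the step size set.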
In general, adaptation prevents the chain from reaching a stationary
distribution, so obtaining consistent samples requires that
num_adaptation_steps be set to a value somewhat smaller than the number of
burn-in steps. However, it may sometimes be helpful to set
num_adaptation_steps to a larger value during development in order to
inspect the behavior of the chain during adaptation.
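One simple way to follow this advice is to derive num_adaptation_steps from the burn-in length. The 80% fraction below is an assumption (a common rule of thumb, not a requirement of this function), and burnin_schedule() is a hypothetical helper that only shows which burn-in iterations would adapt the step size and which would run with it fixed.

```r
# Hypothetical helper: split burn-in into an adaptation phase followed by
# a phase where the step size is held fixed. The 0.8 fraction is an assumption.
burnin_schedule <- function(num_burnin_steps, adaptation_fraction = 0.8) {
  num_adaptation_steps <- floor(adaptation_fraction * num_burnin_steps)
  list(
    num_adaptation_steps = num_adaptation_steps,
    # TRUE for iterations where the step size is still being adapted
    adapting = seq_len(num_burnin_steps) <= num_adaptation_steps
  )
}

schedule <- burnin_schedule(num_burnin_steps = 500)
print(schedule$num_adaptation_steps)
print(sum(!schedule$adapting))  # burn-in steps run with a fixed step size
```

The non-adapting tail of burn-in gives the chain time to settle with a constant step size before any samples are kept.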
The step size is assumed to broadcast with the chain state, potentially having
leading dimensions corresponding to multiple chains. When there are fewer of
those leading dimensions than there are chain dimensions, the corresponding
dimensions in log_accept_prob are averaged (in the direct space, rather
than the log space) before being used to adjust the step size. This means that
this kernel can perform either cross-chain or per-chain step size adaptation,
depending on the shape of the step size.
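Averaging "in the direct space" means exponentiating the log acceptance probabilities, taking the mean, and then returning to the log scale, rather than averaging the logs directly. A small base R illustration with made-up values for four chains:

```r
# Log acceptance probabilities for four chains (illustrative values).
log_accept_prob <- log(c(0.9, 0.8, 0.1, 0.6))

# Direct-space average: mean of the probabilities, computed on the log scale.
# Subtracting the max before exponentiating keeps this numerically stable.
log_mean_exp <- function(x) {
  x_max <- max(x)
  x_max + log(mean(exp(x - x_max)))
}

direct_space <- exp(log_mean_exp(log_accept_prob))  # arithmetic mean
log_space <- exp(mean(log_accept_prob))             # geometric mean instead
print(c(direct_space = direct_space, log_space = log_space))
```

By the AM-GM inequality the log-space (geometric) mean never exceeds the direct-space mean, so averaging logs would let a single low-acceptance chain pull the estimate down much more strongly.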
For example, if your problem has a state with shape [S], your chain state
has shape [C0, C1, S] (meaning that there are C0 * C1 total chains) and
log_accept_prob has shape [C0, C1] (one acceptance probability per chain),
then depending on the shape of the step size, the following will happen:
- If the step size has shape [], [S] or [1], log_accept_prob will be averaged
  across its C0 and C1 dimensions. This means that you will learn a shared
  step size based on the mean acceptance probability across all chains. This
  can be useful if you don't have many steps to adapt and want to average
  away the noise.
- If the step size has shape [C1, 1] or [C1, S], log_accept_prob will be
  averaged across its C0 dimension. This means that you will learn a shared
  step size based on the mean acceptance probability across chains that share
  the same coordinate along the C1 dimension. This can be useful when the C1
  dimension indexes different distributions, while C0 indexes replicas of a
  single distribution, all sampled in parallel.
- If the step size has shape [C0, C1, 1] or [C0, C1, S], no averaging will
  happen. This means that each chain will learn its own step size. This can be
  useful when all chains are sampling from different distributions. Even when
  all chains target the same distribution, this can help during the initial
  warmup period.
- If the step size has shape [C0, 1, 1] or [C0, 1, S], log_accept_prob will be
  averaged across its C1 dimension. This means that you will learn a shared
  step size based on the mean acceptance probability across chains that share
  the same coordinate along the C0 dimension. This can be useful when the C0
  dimension indexes different distributions, while C1 indexes replicas of a
  single distribution, all sampled in parallel.
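The four cases can be mimicked in base R on a C0 x C1 matrix of log acceptance probabilities. reduce_log_accept_prob() is a hypothetical helper, not how tfprobability implements the reduction: the keep argument flags which of the (C0, C1) chain dimensions the step size has, the probabilities are averaged in the direct space over the other chain dimensions, and the reduced dimensions are kept with size 1 so the result lines up with the step size.

```r
# Hypothetical helper: average acceptance probabilities (not their logs) over
# the chain dimensions the step size lacks. `keep` flags which of (C0, C1) it has.
reduce_log_accept_prob <- function(log_accept_prob, keep) {
  p <- exp(log_accept_prob)
  kept <- which(keep)
  if (length(kept) == 0) {
    avg <- array(mean(p), dim = c(1, 1))
  } else {
    # Average over the dropped dimension(s), then restore them with size 1
    avg <- array(apply(p, kept, mean), dim = ifelse(keep, dim(p), 1))
  }
  log(avg)
}

C0 <- 2; C1 <- 3
# Column-major fill: column j holds the C0 replicas for C1 index j
accept <- matrix(c(0.9, 0.7, 0.5, 0.3, 0.8, 0.6), nrow = C0, ncol = C1)
log_accept_prob <- log(accept)

shared  <- exp(reduce_log_accept_prob(log_accept_prob, c(FALSE, FALSE)))  # [] / [S] / [1]
per_c1  <- exp(reduce_log_accept_prob(log_accept_prob, c(FALSE, TRUE)))   # [C1, 1] / [C1, S]
per_all <- exp(reduce_log_accept_prob(log_accept_prob, c(TRUE, TRUE)))    # [C0, C1, 1] / [C0, C1, S]
per_c0  <- exp(reduce_log_accept_prob(log_accept_prob, c(TRUE, FALSE)))   # [C0, 1, 1] / [C0, 1, S]
print(shared); print(per_c1); print(per_all); print(per_c0)
```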
For an example of how to use this kernel, see mcmc_no_u_turn_sampler().
Other mcmc_kernels:
mcmc_hamiltonian_monte_carlo(),
mcmc_metropolis_adjusted_langevin_algorithm(),
mcmc_metropolis_hastings(),
mcmc_no_u_turn_sampler(),
mcmc_random_walk_metropolis(),
mcmc_replica_exchange_mc(),
mcmc_simple_step_size_adaptation(),
mcmc_slice_sampler(),
mcmc_transformed_transition_kernel(),
mcmc_uncalibrated_hamiltonian_monte_carlo(),
mcmc_uncalibrated_langevin(),
mcmc_uncalibrated_random_walk()