Build a Laplace approximation algorithm for a given NIMBLE model.
buildLaplace(
model,
paramNodes,
randomEffectsNodes,
calcNodes,
calcNodesOther,
control = list()
)
model: a NIMBLE model object, such as one returned by nimbleModel. The model must have automatic derivatives (AD) turned on, e.g. by using buildDerivs=TRUE in nimbleModel.
paramNodes: a character vector of names of parameter nodes in the model; defaults are provided by setupMargNodes. Alternatively, paramNodes can be a list in the format returned by setupMargNodes, in which case randomEffectsNodes, calcNodes, and calcNodesOther are not needed (and will be ignored).
randomEffectsNodes: a character vector of names of continuous unobserved (latent) nodes to marginalize (integrate) over using Laplace approximation; defaults are provided by setupMargNodes.
calcNodes: a character vector of names of nodes for calculating the integrand for Laplace approximation; defaults are provided by setupMargNodes. There may be deterministic nodes between paramNodes and calcNodes. These will be included in calculations automatically and thus do not need to be included in calcNodes (but there is no problem if they are).
calcNodesOther: a character vector of names of nodes for calculating terms in the log-likelihood that do not depend on any randomEffectsNodes, and thus are not part of the marginalization, but should be included for purposes of finding the MLE. This defaults to stochastic nodes that depend on paramNodes but are not part of and do not depend on randomEffectsNodes. There may be deterministic nodes between paramNodes and calcNodesOther. These will be included in calculations automatically and thus do not need to be included in calcNodesOther (but there is no problem if they are).
control: a named list for providing additional settings used in Laplace approximation. See the control section below.
buildLaplace
is the main function for constructing the Laplace
approximation for a given model or part of a model.
See method summary
below and the separate function
summaryLaplace
for processing maximum likelihood estimates
obtained by method findMLE
below.
Any of the input node vectors, when provided, will be processed using
nodes <- model$expandNodeNames(nodes)
, where nodes
may be
paramNodes
, randomEffectsNodes
, and so on. This step allows
any of the inputs to include node-name-like syntax that might contain
multiple nodes. For example, paramNodes = 'beta[1:10]'
can be
provided if there are actually 10 scalar parameters, 'beta[1]' through
'beta[10]'. The actual node names in the model will be determined by the
expandNodeNames
step.
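For example, the following sketch (a hypothetical model with ten scalar parameters beta[1] through beta[10]; the model code and data are illustrative only) shows the expansion step directly:

library(nimble)
betaCode <- nimbleCode({
  for (i in 1:10) {
    beta[i] ~ dnorm(0, sd = 10)
    y[i] ~ dnorm(beta[i], sd = 1)
  }
})
betaModel <- nimbleModel(betaCode, data = list(y = rnorm(10)), buildDerivs = TRUE)
betaModel$expandNodeNames('beta[1:10]')   # returns "beta[1]", "beta[2]", ..., "beta[10]"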
In many (but not all) cases, one only needs to provide a NIMBLE model object
and then the function will construct reasonable defaults necessary for
Laplace approximation to marginalize over all continuous latent states
(aka random effects) in a model. The default values for the four groups of
nodes are obtained by calling setupMargNodes
, whose arguments
match those here (except for a few arguments which are taken from control
list elements here).
setupMargNodes
tries to give sensible defaults from
any combination of paramNodes
, randomEffectsNodes
,
calcNodes
, and calcNodesOther
that are provided. For example,
if you provide only randomEffectsNodes
(perhaps you want to
marginalize over only some of the random effects in your model),
setupMargNodes
will try to determine appropriate choices for the
others.
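As a sketch (using the pump model defined in the example at the end of this page, and an illustrative subset of its random effects), one can provide only randomEffectsNodes and inspect the defaults chosen for the other node groups:

margNodes <- setupMargNodes(pump, randomEffectsNodes = 'theta[1:5]')
margNodes$paramNodes       # parameter nodes determined automatically
margNodes$calcNodesOther   # log-likelihood terms kept outside the marginalization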
These defaults make general assumptions such as that
randomEffectsNodes
have paramNodes
as parents. However, the
steps for determining defaults are not simple, and it is possible that they
will be refined in the future. It is also possible that they simply don't
give what you want for a particular model. One example where they will not
give desired results can occur when random effects have no prior
parameters, such as N(0, 1) nodes that will be multiplied by a scale
factor (e.g. sigma) and added to other explanatory terms in a model. Such
nodes look like top-level parameters in terms of model structure, so
you must provide a randomEffectsNodes
argument to indicate which
they are.
It can be helpful to use setupMargNodes
directly to see exactly how
nodes will be arranged for Laplace approximation. For example, you may want
to verify the choice of randomEffectsNodes
or get the order of parameters it has established, which is needed to make sense of the MLE and results from the summary method. One can also call
setupMargNodes
, customize the returned list, and then provide that
to buildLaplace
as paramNodes
. In that case,
setupMargNodes
will not be called (again) by buildLaplace
.
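A sketch of this workflow (again using the pump model from the example at the end of this page):

margNodes <- setupMargNodes(pump)
margNodes$randomEffectsNodes   # check which nodes will be marginalized over
margNodes$paramNodes           # parameter order used for the MLE and summary output
# Optionally modify the list, then pass it back so setupMargNodes is not called again:
pumpLaplace <- buildLaplace(pump, paramNodes = margNodes)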
If setupMargNodes
is emitting an unnecessary warning, simply use
control=list(check=FALSE)
.
If any paramNodes
(parameters) or randomEffectsNodes
(random
effects / latent states) have constraints on the range of valid values
(because of the distribution they follow), they will be used on a
transformed scale determined by parameterTransform
. This means the
Laplace approximation itself will be done on the transformed scale for
random effects and finding the MLE will be done on the transformed scale
for parameters. For parameters, prior distributions are not included in
calculations, but they are used to determine valid parameter ranges. For
example, if sigma
is a standard deviation, you can declare it with a
prior such as sigma ~ dhalfflat()
to indicate that it must be
greater than 0.
For default determination of parameters, all parameters must have a prior
distribution simply to indicate the range of valid values. For a parameter
p
that has no constraint, a simple choice is p ~ dflat()
.
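As a sketch (hypothetical model code; the node names are illustrative, and the prior densities of mu and sigma are used only to determine valid ranges):

library(nimble)
rangeCode <- nimbleCode({
  mu ~ dflat()          # unconstrained parameter
  sigma ~ dhalfflat()   # constrained to be positive, so handled on a transformed (log) scale
  for (i in 1:n) {
    u[i] ~ dnorm(0, sd = sigma)       # random effect to be marginalized over
    y[i] ~ dnorm(mu + u[i], sd = 1)
  }
})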
The object returned by buildLaplace
is a nimbleFunction object with
numerous methods (functions). The most useful ones are:
calcLogLik(p, trans)
. Calculate the Laplace approximation to
the marginal log-likelihood function at parameter value p
, which
(if trans
is FALSE, which is the default) should match the order
of paramNodes
. For any non-scalar nodes in paramNodes
,
the order within the node is column-major (which can be seen for R
objects using as.numeric
). Return value is the scalar
(approximate, marginal) log likelihood.
If trans
is TRUE, then p
is the vector of parameters on
the transformed scale, if any, described above. In this case, the
parameters on the original scale (as the model was written) will be
determined by calling the method pInverseTransform(p)
. Note that
the length of the parameter vector on the transformed scale might not
be the same as on the original scale (because some constraints of
non-scalar parameters result in fewer free transformed parameters than
original parameters).
calcLaplace(p, trans)
. This is the same as calcLogLik
.
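For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page; the parameter values are illustrative, given in the order returned by getNodeNamesVec(returnParams = TRUE), and both parameters are positive so the transformed scale here is a log scale):

CpumpLaplace$calcLogLik(c(1.2, 1.3))                      # parameters on the original scale
CpumpLaplace$calcLogLik(log(c(1.2, 1.3)), trans = TRUE)   # the same point on the transformed scale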
findMLE(pStart, method, hessian)
. Find the maximum likelihood
estimates of parameters using the Laplace-approximated marginal
likelihood. Arguments include pStart
: initial parameter values
(defaults to parameter values currently in the model);
method
: (outer) optimization method to use in optim
(defaults to "BFGS"); and
hessian
: whether to calculate and return the Hessian matrix
(defaults to TRUE
). Second derivatives in the Hessian are
determined by finite differences of the gradients obtained by
automatic differentiation (AD). Return value is a nimbleList of type
optimResultNimbleList
, similar to what is returned by R's
optim. See help(nimOptim)
.
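For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page; the starting values are illustrative):

MLEres <- CpumpLaplace$findMLE()                            # start from values currently in the model
MLEres2 <- CpumpLaplace$findMLE(pStart = c(1, 1), hessian = FALSE)
MLEres$par                                                  # MLEs, as in an optim() result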
summary(MLEoutput, originalScale, randomEffectsStdError,
jointCovariance)
. Summarize the maximum likelihood estimation
results, given object MLEoutput
that was returned by
findMLE
. The summary can include a covariance matrix for the
parameters, the random effects, or both,
and these can be returned on the original parameter scale or on the
(potentially) transformed scale(s) used in estimation.
In more detail, summary
accepts the following optional arguments:
originalScale
. Logical. If TRUE, the function returns
results on the original scale(s) of parameters and random effects;
otherwise, it returns results on the transformed scale(s). If there
are no constraints, the two scales are identical. Defaults to TRUE.
randomEffectsStdError
. Logical. If TRUE, standard
errors of random effects will be calculated.
Defaults to FALSE.
jointCovariance
. Logical. If TRUE, the joint
variance-covariance matrix of the parameters and the random effects
will be returned. If FALSE, the variance-covariance matrix of the
parameters will be returned. Defaults to FALSE.
The object returned by summary
is a nimbleList with elements:
params
. A list that contains estimates and standard
errors of parameters (on the original or transformed scale, as
chosen by originalScale
).
randomEffects
. A list that contains estimates of random
effects and, if requested (randomEffectsStdError=TRUE
), their standard errors, on original or transformed scale. Standard
errors are calculated following the generalized delta method of
Kass and Steffey (1989).
vcov
. If requested (i.e.
jointCovariance=TRUE
), the joint variance-covariance
matrix of the parameters and random effects, on original or
transformed scale. If jointCovariance=FALSE
, the
covariance matrix of the parameters, on original or transformed
scale.
scale
. "original"
or "transformed"
, the
scale on which results were requested.
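For example (a sketch assuming CpumpLaplace and MLEres as in the example at the end of this page):

res <- CpumpLaplace$summary(MLEres, originalScale = TRUE, randomEffectsStdError = TRUE)
res$params          # parameter estimates and standard errors
res$randomEffects   # random-effects estimates and standard errors
res$vcov            # covariance matrix of the parameters (jointCovariance defaults to FALSE)
res$scale           # "original" here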
Additional methods to access or control more details of the Laplace approximation include:
getNodeNamesVec(returnParams)
. Return a vector (>1) of names
of parameters/random effects nodes, according to returnParams =
TRUE/FALSE
. Use this if there is more than one node.
getNodeNameSingle(returnParams)
. Return the name of a
single parameter/random effect node, according to returnParams =
TRUE/FALSE
. Use this if there is only one node.
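For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page):

CpumpLaplace$getNodeNamesVec(returnParams = TRUE)    # parameter node names, in the order used for p
CpumpLaplace$getNodeNamesVec(returnParams = FALSE)   # random-effects node names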
setMethod(method)
. Set method ID for calculating the Laplace
approximation and gradient: 1 (Laplace1
), 2 (Laplace2
,
default method), or 3 (Laplace3
). See below for more details. Users
wanting to explore efficiency can try switching from method 2 (default) to
methods 1 or 3 and comparing performance. The first Laplace approximation
with each method will be (much) slower than subsequent Laplace
approximations.
getMethod()
. Return the current method ID for Laplace.
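For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page):

CpumpLaplace$getMethod()    # 2 by default
CpumpLaplace$setMethod(3)   # switch to method 3; the first call with the new method will be slower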
gr_logLik(p, trans)
. Gradient of the Laplace-approximated
marginal log-likelihood at parameter value p
. Argument trans
is similar to that in calcLaplace
. If there are multiple parameters,
the vector p
is given in the order of parameter names returned by
getNodeNamesVec(returnParams=TRUE)
.
gr_Laplace(p, trans)
. This is the same as gr_logLik
.
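For example (a sketch assuming CpumpLaplace from the example at the end of this page; the parameter values are illustrative, ordered as in getNodeNamesVec(returnParams = TRUE)):

CpumpLaplace$gr_logLik(c(1.2, 1.3))   # gradient of the approximate marginal log-likelihood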
otherLogLik(p)
. Calculate the log-likelihood of the calcNodesOther
nodes, i.e. the parts of the model that are not included in the Laplace
approximation.
gr_otherLogLik(p)
. Gradient (vector of derivatives with
respect to each parameter) of otherLogLik(p)
. Results should
match gr_otherLogLik_internal(p)
but may be more efficient after
the first call.
Finally, methods that are primarily for internal use by other methods include:
gr_logLik_pTransformed
. Gradient of the Laplace
approximation (calcLogLik_pTransformed(pTransform)
) at transformed
(unconstrained) parameter value pTransform
.
pInverseTransform(pTransform)
. Back-transform the transformed
parameter value pTransform
to original scale.
derivs_pInverseTransform(pTransform, order)
. Derivatives of
the back-transformation (i.e. inverse of parameter transformation) with
respect to transformed parameters at pTransform
. Derivative order
is given by order
(any of 0, 1, and/or 2).
reInverseTransform(reTrans)
. Back-transform the transformed
random effects value reTrans
to original scale.
derivs_reInverseTransform(reTrans, order)
. Derivatives of the
back-transformation (i.e. inverse of random effects transformation) with
respect to transformed random effects at reTrans
. Derivative order
is given by order
(any of 0, 1, and/or 2).
optimRandomEffects(pTransform)
. Calculate the optimized
random effects given transformed parameter value pTransform
. The
optimized random effects are the mode of the conditional distribution of
random effects given data at parameters pTransform
, i.e. the
calculation of calcNodes
.
inverse_negHess(p, reTransform)
. Calculate the inverse of the
negative Hessian matrix of the joint (parameters and random effects)
log-likelihood with respect to transformed random effects, evaluated at
parameter value p
and transformed random effects
reTransform
.
hess_logLik_wrt_p_wrt_re(p, reTransform)
. Calculate the
Hessian matrix of the joint log-likelihood with respect to parameters and
transformed random effects, evaluated at parameter value p
and
transformed random effects reTransform
.
one_time_fixes()
. Users never need to run this. It is called
. Users never need to run this. Is is called
when necessary internally to fix dimensionality issues if there is only
one parameter in the model.
calcLogLik_pTransformed(pTransform)
. Laplace approximation at
transformed (unconstrained) parameter value pTransform
. To
make maximizing the Laplace likelihood unconstrained, an automated
transformation via parameterTransform
is performed on
any parameters with constraints indicated by their priors (even
though the prior probabilities are not used).
gr_otherLogLik_internal(p)
. Gradient (vector of
derivatives with respect to each parameter) of otherLogLik(p)
.
This is obtained using automatic differentiation (AD) with single-taping.
First call will always be slower than later calls.
buildLaplace
accepts the following control list elements:
split
. If TRUE (default), randomEffectsNodes
will be
split into conditionally independent sets if possible. This
facilitates more efficient Laplace approximation because each
conditionally independent set can be marginalized independently. If
FALSE, randomEffectsNodes
will be handled as one multivariate
block, with one multivariate Laplace approximation. If split
is a numeric vector, randomEffectsNodes
will be split by
split
(randomEffectsNodes
, control$split
). The
last option allows arbitrary control over how
randomEffectsNodes
are blocked.
check
. If TRUE (default), a warning is issued if
paramNodes
, randomEffectsNodes
and/or calcNodes
are provided but appear to have missing or unnecessary
elements based on some default inspection of the model. If
unnecessary warnings are emitted, simply set check=FALSE
.
innerOptimControl
. A list of control parameters for the inner
optimization of Laplace approximation using optim
. See
'Details' of optim
for further information.
innerOptimMethod
. Optimization method to be used in
optim
for the inner optimization. See 'Details' of
optim
. Currently optim
in NIMBLE supports:
"Nelder-Mead
", "BFGS
", "CG
", and
"L-BFGS-B
". By default, method "CG
" is used when
marginalizing over a single (scalar) random effect, and "BFGS
"
is used for multiple random effects being jointly marginalized over.
innerOptimStart
. Choice of starting values for the inner
optimization. This could be "last"
, "last.best"
, or a
vector of user-provided values. "last"
means the most recent
means the most recent
random effects values left in the model will be used. When finding
the MLE, the most recent values will be the result of the most recent
inner optimization for Laplace. "last.best"
means the random
effects values corresponding to the largest Laplace likelihood (from
any call to the calcLaplace
or calcLogLik
method,
including during an MLE search) will be used (even if it was not the
most recent Laplace likelihood). By default, the initial random
effects values will be used for inner optimization.
outOptimControl
. A list of control parameters for maximizing
the Laplace log-likelihood using optim
. See 'Details' of
optim
for further information.
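For example (a sketch assuming the uncompiled pump model from the example at the end of this page; the specific settings are illustrative):

pumpLaplaceCustom <- buildLaplace(pump,
  control = list(split = FALSE,                  # one joint Laplace block for all random effects
                 check = FALSE,                  # suppress setup warnings
                 innerOptimMethod = "BFGS",      # inner optimizer for the random-effects mode
                 outOptimControl = list(maxit = 500)))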
Note that there are two numerical optimizations in the Laplace approximation
algorithm: (1) maximizing the joint log-likelihood of random effects and data
given a parameter value to construct the Laplace approximation to the marginal
log-likelihood at the given parameter value; (2) maximizing the Laplace
approximation to the marginal log-likelihood (i.e. calcLaplace
) to find
the MLEs of model parameters. In the control
list above, the prefix
'inner' refers to optimization (1) and 'out' refers to optimization (2).
Currently both optimizations use the optimizer optim
. However, one
can easily turn to other optimizers (say nlminb
) in R for
optimization (2); see the example below.
Wei Zhang, Perry de Valpine
Kass, R. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association, 84(407), 717-726.
Skaug, H. and Fournier, D. (2006). Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis, 56, 699-709.
pumpCode <- nimbleCode({
for (i in 1:N){
theta[i] ~ dgamma(alpha, beta)
lambda[i] <- theta[i] * t[i]
x[i] ~ dpois(lambda[i])
}
alpha ~ dexp(1.0)
beta ~ dgamma(0.1, 1.0)
})
pumpConsts <- list(N = 10, t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5))
pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpInits <- list(alpha = 0.1, beta = 0.1, theta = rep(0.1, pumpConsts$N))
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits, buildDerivs = TRUE)
# Build Laplace approximation
pumpLaplace <- buildLaplace(pump)
if (FALSE) {
# Compile the model
Cpump <- compileNimble(pump)
CpumpLaplace <- compileNimble(pumpLaplace, project = pump)
# Calculate MLEs of parameters
MLEres <- CpumpLaplace$findMLE()
# Calculate estimates and standard errors for parameters and random effects on original scale
allres <- CpumpLaplace$summary(MLEres, randomEffectsStdError = TRUE)
# Use nlminb to find MLEs
MLEres.nlminb <- nlminb(c(0.1, 0.1),
function(x) -CpumpLaplace$calcLaplace(x),
function(x) -CpumpLaplace$gr_Laplace(x))
}