Build a Laplace approximation algorithm for a given NIMBLE model.
buildLaplace(
model,
paramNodes,
randomEffectsNodes,
calcNodes,
calcNodesOther,
control = list()
)
model: a NIMBLE model object, such as one returned by nimbleModel. The model must have automatic derivatives (AD) turned on, e.g. by using buildDerivs=TRUE in nimbleModel.
paramNodes: a character vector of names of parameter nodes in the model; defaults are provided by setupMargNodes. Alternatively, paramNodes can be a list in the format returned by setupMargNodes, in which case randomEffectsNodes, calcNodes, and calcNodesOther are not needed (and will be ignored).
randomEffectsNodes: a character vector of names of continuous unobserved (latent) nodes to marginalize (integrate) over using Laplace approximation; defaults are provided by setupMargNodes.
calcNodes: a character vector of names of nodes for calculating the integrand for Laplace approximation; defaults are provided by setupMargNodes. There may be deterministic nodes between paramNodes and calcNodes. These will be included in calculations automatically and thus do not need to be included in calcNodes (but there is no problem if they are).
calcNodesOther: a character vector of names of nodes for calculating terms in the log-likelihood that do not depend on any randomEffectsNodes, and thus are not part of the marginalization, but should be included for purposes of finding the MLE. This defaults to stochastic nodes that depend on paramNodes but are not part of and do not depend on randomEffectsNodes. There may be deterministic nodes between paramNodes and calcNodesOther. These will be included in calculations automatically and thus do not need to be included in calcNodesOther (but there is no problem if they are).
control: a named list for providing additional settings used in Laplace approximation. See the control section below.
buildLaplace
is the main function for constructing the Laplace
approximation for a given model or part of a model.
See method summary
below and the separate function
summaryLaplace
for processing maximum likelihood estimates
obtained by method findMLE
below.
Any of the input node vectors, when provided, will be processed using
nodes <- model$expandNodeNames(nodes)
, where nodes
may be
paramNodes
, randomEffectsNodes
, and so on. This step allows
any of the inputs to include node-name-like syntax that might contain
multiple nodes. For example, paramNodes = 'beta[1:10]'
can be
provided if there are actually 10 scalar parameters, 'beta[1]' through
'beta[10]'. The actual node names in the model will be determined by the
expandNodeNames
step.
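For example, the following sketch (a hypothetical model with ten scalar parameters beta[1] through beta[10]; the model code and data are illustrative only) shows the expansion step directly:

library(nimble)
betaCode <- nimbleCode({
  for (i in 1:10) {
    beta[i] ~ dnorm(0, sd = 10)
    y[i] ~ dnorm(beta[i], sd = 1)
  }
})
betaModel <- nimbleModel(betaCode, data = list(y = rnorm(10)), buildDerivs = TRUE)
betaModel$expandNodeNames('beta[1:10]')   # returns "beta[1]", "beta[2]", ..., "beta[10]"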
In many (but not all) cases, one only needs to provide a NIMBLE model object
and then the function will construct reasonable defaults necessary for
Laplace approximation to marginalize over all continuous latent states
(aka random effects) in a model. The default values for the four groups of
nodes are obtained by calling setupMargNodes
, whose arguments
match those here (except for a few arguments which are taken from control
list elements here).
setupMargNodes
tries to give sensible defaults from
any combination of paramNodes
, randomEffectsNodes
,
calcNodes
, and calcNodesOther
that are provided. For example,
if you provide only randomEffectsNodes
(perhaps you want to
marginalize over only some of the random effects in your model),
setupMargNodes
will try to determine appropriate choices for the
others.
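As a sketch (using the pump model defined in the example at the end of this page, and an illustrative subset of its random effects), one can provide only randomEffectsNodes and inspect the defaults chosen for the other node groups:

margNodes <- setupMargNodes(pump, randomEffectsNodes = 'theta[1:5]')
margNodes$paramNodes       # parameter nodes determined automatically
margNodes$calcNodesOther   # log-likelihood terms kept outside the marginalization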
These defaults make general assumptions such as that
randomEffectsNodes
have paramNodes
as parents. However, the
steps for determining defaults are not simple, and it is possible that they
will be refined in the future. It is also possible that they simply don't
give what you want for a particular model. One example where they will not
give desired results can occur when random effects have no prior
parameters, such as N(0, 1) nodes that will be multiplied by a scale
factor (e.g. sigma) and added to other explanatory terms in a model. Such
nodes look like top-level parameters in terms of model structure, so
you must provide a randomEffectsNodes
argument to indicate which
they are.
It can be helpful to use setupMargNodes
directly to see exactly how
nodes will be arranged for Laplace approximation. For example, you may want
to verify the choice of randomEffectsNodes
or get the order of parameters it has established, which is needed to make sense of the MLE and results from the summary method. One can also call
setupMargNodes
, customize the returned list, and then provide that
to buildLaplace
as paramNodes
. In that case,
setupMargNodes
will not be called (again) by buildLaplace
.
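A sketch of this workflow (again using the pump model from the example at the end of this page):

margNodes <- setupMargNodes(pump)
margNodes$randomEffectsNodes   # check which nodes will be marginalized over
margNodes$paramNodes           # parameter order used for the MLE and summary output
# Optionally modify the list, then pass it back so setupMargNodes is not called again:
pumpLaplace <- buildLaplace(pump, paramNodes = margNodes)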
If setupMargNodes
is emitting an unnecessary warning, simply use
control=list(check=FALSE)
.
If any paramNodes
(parameters) or randomEffectsNodes
(random
effects / latent states) have constraints on the range of valid values
(because of the distribution they follow), they will be used on a
transformed scale determined by parameterTransform
. This means the
Laplace approximation itself will be done on the transformed scale for
random effects and finding the MLE will be done on the transformed scale
for parameters. For parameters, prior distributions are not included in
calculations, but they are used to determine valid parameter ranges. For
example, if sigma
is a standard deviation, you can declare it with a
prior such as sigma ~ dhalfflat()
to indicate that it must be
greater than 0.
For default determination of parameters, all parameters must have a prior
distribution simply to indicate the range of valid values. For a parameter
p
that has no constraint, a simple choice is p ~ dflat()
.
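As a sketch (hypothetical model code; the node names are illustrative, and the prior densities of mu and sigma are used only to determine valid ranges):

library(nimble)
rangeCode <- nimbleCode({
  mu ~ dflat()          # unconstrained parameter
  sigma ~ dhalfflat()   # constrained to be positive, so handled on a transformed (log) scale
  for (i in 1:n) {
    u[i] ~ dnorm(0, sd = sigma)       # random effect to be marginalized over
    y[i] ~ dnorm(mu + u[i], sd = 1)
  }
})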
The object returned by buildLaplace
is a nimbleFunction object with
numerous methods (functions). The most useful ones are:
calcLogLik(p, trans)
. Calculate the Laplace approximation to
the marginal log-likelihood function at parameter value p
, which
(if trans
is FALSE, which is the default) should match the order
of paramNodes
. For any non-scalar nodes in paramNodes
,
the order within the node is column-major (which can be seen for R
objects using as.numeric
). Return value is the scalar
(approximate, marginal) log likelihood.
If trans
is TRUE, then p
is the vector of parameters on
the transformed scale, if any, described above. In this case, the
parameters on the original scale (as the model was written) will be
determined by calling the method pInverseTransform(p)
. Note that
the length of the parameter vector on the transformed scale might not
be the same as on the original scale (because some constraints of
non-scalar parameters result in fewer free transformed parameters than
original parameters).
calcLaplace(p, trans)
. This is the same as calcLogLik
.
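For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page; the parameter values are illustrative, given in the order returned by getNodeNamesVec(returnParams = TRUE), and both parameters are positive so the transformed scale here is a log scale):

CpumpLaplace$calcLogLik(c(1.2, 1.3))                      # parameters on the original scale
CpumpLaplace$calcLogLik(log(c(1.2, 1.3)), trans = TRUE)   # the same point on the transformed scale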
findMLE(pStart, method, hessian)
. Find the maximum likelihood
estimates of parameters using the Laplace-approximated marginal
likelihood. Arguments include pStart
: initial parameter values
(defaults to parameter values currently in the model);
method
: (outer) optimization method to use in optim
(defaults to "BFGS"); and
hessian
: whether to calculate and return the Hessian matrix
(defaults to TRUE
). Second derivatives in the Hessian are
determined by finite differences of the gradients obtained by
automatic differentiation (AD). Return value is a nimbleList of type
optimResultNimbleList
, similar to what is returned by R's
optim. See help(nimOptim)
.
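For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page; the starting values are illustrative):

MLEres <- CpumpLaplace$findMLE()                            # start from values currently in the model
MLEres2 <- CpumpLaplace$findMLE(pStart = c(1, 1), hessian = FALSE)
MLEres$par                                                  # MLEs, as in an optim() result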
summary(MLEoutput, originalScale, randomEffectsStdError,
jointCovariance)
. Summarize the maximum likelihood estimation
results, given object MLEoutput
that was returned by
findMLE
. The summary can include a covariance matrix for the
parameters, the random effects, or both,
and these can be returned on the original parameter scale or on the
(potentially) transformed scale(s) used in estimation.
In more detail, summary
accepts the following optional arguments:
originalScale
. Logical. If TRUE, the function returns
results on the original scale(s) of parameters and random effects;
otherwise, it returns results on the transformed scale(s). If there
are no constraints, the two scales are identical. Defaults to TRUE.
randomEffectsStdError
. Logical. If TRUE, standard
errors of random effects will be calculated.
Defaults to FALSE.
jointCovariance
. Logical. If TRUE, the joint
variance-covariance matrix of the parameters and the random effects
will be returned. If FALSE, the variance-covariance matrix of the
parameters will be returned. Defaults to FALSE.
The object returned by summary
is a nimbleList with elements:
params
. A list that contains estimates and standard
errors of parameters (on the original or transformed scale, as
chosen by originalScale
).
randomEffects
. A list that contains estimates of random
effects and, if requested (randomEffectsStdError=TRUE
), their standard errors, on original or transformed scale. Standard
errors are calculated following the generalized delta method of
Kass and Steffey (1989).
vcov
. If requested (i.e.
jointCovariance=TRUE
), the joint variance-covariance
matrix of the parameters and random effects, on original or
transformed scale. If jointCovariance=FALSE
, the
covariance matrix of the parameters, on original or transformed
scale.
scale
. "original"
or "transformed"
, the
scale on which results were requested.
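For example (a sketch assuming CpumpLaplace and MLEres as in the example at the end of this page):

res <- CpumpLaplace$summary(MLEres, originalScale = TRUE, randomEffectsStdError = TRUE)
res$params          # parameter estimates and standard errors
res$randomEffects   # random-effects estimates and standard errors
res$vcov            # covariance matrix of the parameters (jointCovariance defaults to FALSE)
res$scale           # "original" here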
Additional methods to access or control more details of the Laplace approximation include:
getNodeNamesVec(returnParams)
. Return a vector (>1) of names
of parameters/random effects nodes, according to returnParams =
TRUE/FALSE
. Use this if there is more than one node.
getNodeNameSingle(returnParams)
. Return the name of a
single parameter/random effect node, according to returnParams =
TRUE/FALSE
. Use this if there is only one node.
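For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page):

CpumpLaplace$getNodeNamesVec(returnParams = TRUE)    # parameter node names, in the order used for p
CpumpLaplace$getNodeNamesVec(returnParams = FALSE)   # random-effects node names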
setMethod(method)
. Set method ID for calculating the Laplace
approximation and gradient: 1 (Laplace1
), 2 (Laplace2
,
default method), or 3 (Laplace3
). See below for more details. Users
wanting to explore efficiency can try switching from method 2 (default) to
methods 1 or 3 and comparing performance. The first Laplace approximation
with each method will be (much) slower than subsequent Laplace
approximations.
getMethod()
. Return the current method ID for Laplace.
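For example (a sketch assuming the compiled CpumpLaplace object from the example at the end of this page):

CpumpLaplace$getMethod()    # 2 by default
CpumpLaplace$setMethod(3)   # switch to method 3; the first call with the new method will be slower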
gr_logLik(p, trans)
. Gradient of the Laplace-approximated
marginal log-likelihood at parameter value p
. Argument trans
is similar to that in calcLaplace
. If there are multiple parameters,
the vector p
is given in the order of parameter names returned by
getNodeNamesVec(returnParams=TRUE)
.
gr_Laplace(p, trans)
. This is the same as gr_logLik
.
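For example (a sketch assuming CpumpLaplace from the example at the end of this page; the parameter values are illustrative, ordered as in getNodeNamesVec(returnParams = TRUE)):

CpumpLaplace$gr_logLik(c(1.2, 1.3))   # gradient of the approximate marginal log-likelihood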
otherLogLik(p)
. Calculate the log-likelihood of the calcNodesOther
nodes, i.e. the parts of the model that are not included in the Laplace
approximation.
gr_otherLogLik(p)
. Gradient (vector of derivatives with
respect to each parameter) of otherLogLik(p)
. Results should
match gr_otherLogLik_internal(p)
but may be more efficient after
the first call.
Finally, methods that are primarily for internal use by other methods include:
gr_logLik_pTransformed
. Gradient of the Laplace
approximation (calcLogLik_pTransformed(pTransform)
) at transformed
(unconstrained) parameter value pTransform
.
pInverseTransform(pTransform)
. Back-transform the transformed
parameter value pTransform
to original scale.
derivs_pInverseTransform(pTransform, order)
. Derivatives of
the back-transformation (i.e. inverse of parameter transformation) with
respect to transformed parameters at pTransform
. Derivative order
is given by order
(any of 0, 1, and/or 2).
reInverseTransform(reTrans)
. Back-transform the transformed
random effects value reTrans
to original scale.
derivs_reInverseTransform(reTrans, order)
. Derivatives of the
back-transformation (i.e. inverse of random effects transformation) with
respect to transformed random effects at reTrans
. Derivative order
is given by order
(any of 0, 1, and/or 2).
optimRandomEffects(pTransform)
. Calculate the optimized
random effects given transformed parameter value pTransform
. The
optimized random effects are the mode of the conditional distribution of
random effects given data at parameters pTransform
, i.e. the
calculation of calcNodes
.
inverse_negHess(p, reTransform)
. Calculate the inverse of the
negative Hessian matrix of the joint (parameters and random effects)
log-likelihood with respect to transformed random effects, evaluated at
parameter value p
and transformed random effects
reTransform
.
hess_logLik_wrt_p_wrt_re(p, reTransform)
. Calculate the
Hessian matrix of the joint log-likelihood with respect to parameters and
transformed random effects, evaluated at parameter value p
and
transformed random effects reTransform
.
one_time_fixes()
. Users never need to run this. It is called
. Users never need to run this. Is is called
when necessary internally to fix dimensionality issues if there is only
one parameter in the model.
calcLogLik_pTransformed(pTransform)
. Laplace approximation at
transformed (unconstrained) parameter value pTransform
. To
make maximizing the Laplace likelihood unconstrained, an automated
transformation via parameterTransform
is performed on
any parameters with constraints indicated by their priors (even
though the prior probabilities are not used).
gr_otherLogLik_internal(p)
. Gradient (vector of
derivatives with respect to each parameter) of otherLogLik(p)
.
This is obtained using automatic differentiation (AD) with single-taping.
First call will always be slower than later calls.
buildLaplace
accepts the following control list elements:
split
. If TRUE (default), randomEffectsNodes
will be
split into conditionally independent sets if possible. This
facilitates more efficient Laplace approximation because each
conditionally independent set can be marginalized independently. If
FALSE, randomEffectsNodes
will be handled as one multivariate
block, with one multivariate Laplace approximation. If split
is a numeric vector, randomEffectsNodes
will be split by
split
(randomEffectsNodes
, control$split
). The
last option allows arbitrary control over how
randomEffectsNodes
are blocked.
check
. If TRUE (default), a warning is issued if
paramNodes
, randomEffectsNodes
and/or calcNodes
are provided but appear to have missing or unnecessary
elements based on some default inspection of the model. If
unnecessary warnings are emitted, simply set check=FALSE
.
innerOptimControl
. A list of control parameters for the inner
optimization of Laplace approximation using optim
. See
'Details' of optim
for further information.
innerOptimMethod
. Optimization method to be used in
optim
for the inner optimization. See 'Details' of
optim
. Currently optim
in NIMBLE supports:
"Nelder-Mead
", "BFGS
", "CG
", and
"L-BFGS-B
". By default, method "CG
" is used when
marginalizing over a single (scalar) random effect, and "BFGS
"
is used for multiple random effects being jointly marginalized over.
innerOptimStart
. Choice of starting values for the inner
optimization. This could be "last"
, "last.best"
, or a
vector of user-provided values. "last"
means the most recent
means the most recent
random effects values left in the model will be used. When finding
the MLE, the most recent values will be the result of the most recent
inner optimization for Laplace. "last.best"
means the random
effects values corresponding to the largest Laplace likelihood (from
any call to the calcLaplace
or calcLogLik
method,
including during an MLE search) will be used (even if it was not the
most recent Laplace likelihood). By default, the initial random
effects values will be used for inner optimization.
outOptimControl
. A list of control parameters for maximizing
the Laplace log-likelihood using optim
. See 'Details' of
optim
for further information.
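For example (a sketch assuming the uncompiled pump model from the example at the end of this page; the specific settings are illustrative):

pumpLaplaceCustom <- buildLaplace(pump,
  control = list(split = FALSE,                  # one joint Laplace block for all random effects
                 check = FALSE,                  # suppress setup warnings
                 innerOptimMethod = "BFGS",      # inner optimizer for the random-effects mode
                 outOptimControl = list(maxit = 500)))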
Note that there are two numerical optimizations in the Laplace approximation
algorithm: (1) maximizing the joint log-likelihood of random effects and data
given a parameter value to construct the Laplace approximation to the marginal
log-likelihood at the given parameter value; (2) maximizing the Laplace
approximation to the marginal log-likelihood (i.e. calcLaplace
) to find
the MLEs of model parameters. In the control
list above, the prefix
'inner' refers to optimization (1) and 'out' refers to optimization (2).
Currently both optimizations use the optimizer optim
. However, one
can easily turn to other optimizers (say nlminb
) in R for
optimization (2); see the example below.
Wei Zhang, Perry de Valpine
Kass, R. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association, 84(407), 717-726.
Skaug, H. and Fournier, D. (2006). Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis, 56, 699-709.
pumpCode <- nimbleCode({
for (i in 1:N){
theta[i] ~ dgamma(alpha, beta)
lambda[i] <- theta[i] * t[i]
x[i] ~ dpois(lambda[i])
}
alpha ~ dexp(1.0)
beta ~ dgamma(0.1, 1.0)
})
pumpConsts <- list(N = 10, t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5))
pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpInits <- list(alpha = 0.1, beta = 0.1, theta = rep(0.1, pumpConsts$N))
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits, buildDerivs = TRUE)
# Build Laplace approximation
pumpLaplace <- buildLaplace(pump)
if (FALSE) {
# Compile the model
Cpump <- compileNimble(pump)
CpumpLaplace <- compileNimble(pumpLaplace, project = pump)
# Calculate MLEs of parameters
MLEres <- CpumpLaplace$findMLE()
# Calculate estimates and standard errors for parameters and random effects on original scale
allres <- CpumpLaplace$summary(MLEres, randomEffectsStdError = TRUE)
# Use nlminb to find MLEs
MLEres.nlminb <- nlminb(c(0.1, 0.1),
function(x) -CpumpLaplace$calcLaplace(x),
function(x) -CpumpLaplace$gr_Laplace(x))
}