The `IterativeQuadrature`

function iteratively approximates the
first two moments of marginal posterior distributions of a Bayesian
model with deterministic integration.

```
IterativeQuadrature(Model, parm, Data, Covar=NULL, Iterations=100,
Algorithm="CAGH", Specs=NULL, Samples=1000, sir=TRUE,
Stop.Tolerance=c(1e-5,1e-15), CPUs=1, Type="PSOCK")
```

Model

This required argument receives the model from a
user-defined function. The user-defined function is where the model
is specified. `IterativeQuadrature`

passes two arguments to
the model function, `parms`

and `Data`

. For more
information, see the `LaplacesDemon`

function and
``LaplacesDemon Tutorial'' vignette.

parm

This argument requires a vector of initial values equal in
length to the number of parameters. `IterativeQuadrature`

will
attempt to approximate these initial values for the parameters as
means (or posterior modes) of normal integrals. The
`GIV`

function may be used to randomly generate initial
values. Parameters must be continuous.

Data

This required argument accepts a list of data. The list of
data must include `mon.names`

which contains monitored variable
names, and `parm.names`

which contains parameter
names.

Covar

This argument accepts a \(J \times J\) covariance matrix for \(J\) initial values. When a covariance matrix is not supplied, a scaled identity matrix is used.

Iterations

This argument accepts an integer that determines the
number of iterations that `IterativeQuadrature`

will attempt
to approximate the posterior with normal
integrals. `Iterations`

defaults to 100.
`IterativeQuadrature`

will stop before this number of
iterations if the tolerance is less than or equal to the
`Stop.Tolerance`

criterion. The required amount of computer
memory increases with `Iterations`

. If computer memory is
exceeded, then all will be lost.

Algorithm

This optional argument accepts a quoted string that
specifies the iterative quadrature algorithm. The default
method is `Method="CAGH"`

. Options include `"AGHSG"`

for
Adaptive Gauss-Hermite Sparse Grid, and `"CAGH"`

for
Componentwise Adaptive Gaussian-Hermite.

Specs

This argument accepts a list of specifications for an algorithm.

Samples

This argument indicates the number of posterior samples
to be taken with sampling importance resampling via the
`SIR`

function, which occurs only when
`sir=TRUE`

. Note that the number of samples should increase
with the number and intercorrelations of the parameters.

sir

This logical argument indicates whether or not Sampling
Importance Resampling (SIR) is conducted via the `SIR`

function to draw independent posterior samples. This argument
defaults to `TRUE`

. Even when `TRUE`

, posterior samples
are drawn only when `IterativeQuadrature`

has
converged. Posterior samples are required for many other functions,
including `plot.iterquad`

and `predict.iterquad`

. Less
time can be spent on sampling by increasing `CPUs`

, if
available, which parallelizes the sampling.

Stop.Tolerance

This argument accepts a vector of two positive
numbers, and defaults to `1e-5,1e-15`

. Tolerance is calculated
each iteration, and the criteria varies by algorithm. The algorithm
is considered to have converged to the user-specified
`Stop.Tolerance`

when the tolerance is less than or equal to
the value of `Stop.Tolerance`

, and the algorithm terminates at
the end of the current iteration. Unless stated otherwise, the
first element is the stop tolerance for the change in \(\mu\),
the second element is the stop tolerance for the change in mean
integration error, and the first tolerance must be met before the
second tolerance is considered.

CPUs

This argument accepts an integer that specifies the number
of central processing units (CPUs) of the multicore computer or
computer cluster. This argument defaults to `CPUs=1`

, in which
parallel processing does not occur. When multiple CPUs are
specified, model function evaluations are parallelized across the
nodes, and sampling with `SIR`

is parallelized when
`sir=TRUE`

.

Type

This argument specifies the type of parallel processing to
perform, accepting either `Type="PSOCK"`

or
`Type="MPI"`

.

`IterativeQuadrature`

returns an object of class `iterquad`

that is a list with the following components:

This is the name of the iterative quadrature algorithm.

This is the matched call of `IterativeQuadrature`

.

This is a logical indicator of whether or not
`IterativeQuadrature`

converged within the specified
`Iterations`

according to the supplied `Stop.Tolerance`

criterion. Convergence does not indicate that the global maximum has
been found, but only that the tolerance was less than or equal to
the `Stop.Tolerance`

criteria.

This is the estimated covariance matrix. The `Covar`

matrix may be scaled and input into the `Covar`

argument of the
`LaplacesDemon`

or `PMC`

function for
further estimation. To scale this matrix for use with Laplace's
Demon or PMC, multiply it by \(2.38^2/d\), where \(d\) is the
number of initial values.

This is a vector of the iterative history of the
deviance in the `IterativeQuadrature`

function, as it sought
convergence.

This is a matrix of the iterative history of the
parameters in the `IterativeQuadrature`

function, as it sought
convergence.

This is the vector of initial values that was
originally given to `IterativeQuadrature`

in the `parm`

argument.

This is an approximation of the logarithm of the marginal
likelihood of the data (see the `LML`

function for more
information). When the model has converged and `sir=TRUE`

, the
NSIS method is used. When the model has converged and
`sir=FALSE`

, the LME method is used. This is the
logarithmic form of equation 4 in Lewis and Raftery (1997). As a
rough estimate of Kass and Raftery (1995), the LME-based LML is
worrisome when the sample size of the data is less than five times
the number of parameters, and `LML`

should be adequate in most
problems when the sample size of the data exceeds twenty times the
number of parameters (p. 778). The LME is inappropriate with
hierarchical models. However `LML`

is estimated, it is useful
for comparing multiple models with the `BayesFactor`

function.

This reports the final scalar value for the logarithm of the unnormalized joint posterior density.

This reports the initial scalar value for the logarithm of the unnormalized joint posterior density.

This is the latest matrix of the logarithm of the unnormalized joint posterior density. It is weighted and normalized so that each column sums to one.

This is the final \(N \times J\) matrix of quadrature weights that have been corrected for non-standard normal distributions, where \(N\) is the number of nodes and \(J\) is the number of parameters.

This is the number of minutes that
`IterativeQuadrature`

was running, and this includes the
initial checks as well as drawing posterior samples and creating
summaries.

When `sir=TRUE`

, a number of independent
posterior samples equal to `Samples`

is taken, and the draws
are stored here as a matrix. The rows of the matrix are the samples,
and the columns are the monitored variables.

This is the final number of nodes.

When `sir=TRUE`

, a number of independent
posterior samples equal to `Samples`

is taken, and the draws
are stored here as a matrix. The rows of the matrix are the samples,
and the columns are the parameters.

This is a summary matrix that summarizes the point-estimated posterior means. Uncertainty around the posterior means is estimated from the covariance matrix. Rows are parameters. The following columns are included: Mean, SD (Standard Deviation), LB (Lower Bound), and UB (Upper Bound). The bounds constitute a 95% probability interval.

This is a summary matrix that summarizes the
posterior samples drawn with sampling importance resampling
(`SIR`

) when `sir=TRUE`

, given the point-estimated
posterior modes and the covariance matrix. Rows are parameters. The
following columns are included: Mean, SD (Standard Deviation),
LB (Lower Bound), and UB (Upper Bound). The bounds constitute a 95%
probability interval.

This is the last `Tolerance`

of the
`LaplaceApproxiation`

algorithm.

This is the `Stop.Tolerance`

criteria.

This is the final \(N \times J\) matrix of the conditional mean, where \(N\) is the number of nodes and \(J\) is the number of parameters.

Quadrature is a historical term in mathematics that means determining area. Mathematicians of ancient Greece, according to the Pythagorean doctrine, understood determination of area of a figure as the process of geometrically constructing a square having the same area (squaring). Thus the name quadrature for this process.

In medieval Europe, quadrature meant the calculation of area by any method. With the invention of integral calculus, quadrature has been applied to the computation of a univariate definite integral. Numerical integration is a broad family of algorithms for calculating the numerical value of a definite integral. Numerical quadrature is a synonym for quadrature applied to one-dimensional integrals. Multivariate quadrature, also called cubature, is the application of quadrature to multidimensional integrals.

A quadrature rule is an approximation of the definite integral of a function, usually stated as a weighted sum of function values at specified points within the domain of integration. The specified points are referred to as abscissae, abscissas, integration points, or nodes, and have associated weights. The calculation of the nodes and weights of the quadrature rule differs by the type of quadrature. There are numerous types of quadrature algorithms. Bayesian forms of quadrature usually use Gauss-Hermite quadrature (Naylor and Smith, 1982), and placing a Gaussian Process on the function is a common extension (O'Hagan, 1991; Rasmussen and Ghahramani, 2003) that is called `Bayesian Quadrature'. Often, these and other forms of quadrature are also referred to as model-based integration.

Gauss-Hermite quadrature uses Hermite polynomials to calculate the
rule. However, there are two versions of Hermite polynomials, which
result in different kernels in different fields. In physics, the
kernel is `exp(-x^2)`

, while in probability the kernel is
`exp(-x^2/2)`

. The weights are a normal density. If the
parameters of the normal distribution, \(\mu\) and
\(\sigma^2\), are estimated from data, then it is referred
to as adaptive Gauss-Hermite quadrature, and the parameters are the
conditional mean and conditional variance. Outside of Gauss-Hermite
quadrature, adaptive quadrature implies that a difficult range in the
integrand is subdivided with more points until it is
well-approximated. Gauss-Hermite quadrature performs well when the
integrand is smooth, and assumes normality or multivariate normality.
Adaptive Gauss-Hermite quadrature has been demonstrated to outperform
Gauss-Hermite quadrature in speed and accuracy.

A goal in quadrature is to minimize integration error, which is the error between the evaluations and the weights of the rule. Therefore, a goal in Bayesian Gauss-Hermite quadrature is to minimize integration error while approximating a marginal posterior distribution that is assumed to be smooth and normally-distributed. This minimization often occurs by increasing the number of nodes until a change in mean integration error is below a tolerance, rather than minimizing integration error itself, since the target may be only approximately normally distributed, or minimizing the sum of integration error, which would change with the number of nodes.

To approximate integrals in multiple dimensions, one approach applies
\(N\) nodes of a univariate quadrature rule to multiple dimensions
(using the `GaussHermiteCubeRule`

function for example)
via the product rule, which results in many more multivariate nodes.
This requires the number of function evaluations to grow exponentially
as dimension increases. Multidimensional quadrature is usually limited
to less than ten dimensions, both due to the number of nodes required,
and because the accuracy of multidimensional quadrature algorithms
decreases as the dimension increases. Three methods may overcome this
curse of dimensionality in varying degrees: componentwise quadrature,
sparse grids, and Monte Carlo.

Componentwise quadrature is the iterative application of univariate quadrature to each parameter. It is applicable with high-dimensional models, but sacrifices the ability to calculate the conditional covariance matrix, and calculates only the variance of each parameter.

Sparse grids were originally developed by Smolyak for multidimensional quadrature. A sparse grid is based on a one-dimensional quadrature rule. Only a subset of the nodes from the product rule is included, and the weights are appropriately rescaled. Although a sparse grid is more efficient because it reduces the number of nodes to achieve the same accuracy, the user must contend with increasing the accuracy of the grid, and it remains inapplicable to high-dimensional integrals.

Monte Carlo is a large family of sampling-based algorithms. O'Hagan
(1987) asserts that Monte Carlo is frequentist, inefficient, regards
irrelevant information, and disregards relevant information.
Quadrature, he maintains (O'Hagan, 1992), is the most Bayesian
approach, and also the most efficient. In high dimensions, he
concedes, a popular subset of Monte Carlo algorithms is currently the
best for cheap model function evaluations. These algorithms are called
Markov chain Monte Carlo (MCMC). High-dimensional models with
expensive model evaluation functions, however, are not well-suited to
MCMC. A large number of MCMC algorithms is available in the
`LaplacesDemon`

function.

Following are some reasons to consider iterative quadrature rather than MCMC. Once an MCMC sampler finds equilibrium, it must then draw enough samples to represent all targets. Iterative quadrature does not need to continue drawing samples. Multivariate quadrature is consistently reported as more efficient than MCMC when its assumptions hold, though multivariate quadrature is limited to small dimensions. High-dimensional models therefore default to MCMC, between the two. Componentwise quadrature algorithms like CAGH, however, may also be more efficient with cloc-time than MCMC in high dimensions, especially against componentwise MCMC algorithms. Another reason to consider iterative quadrature are that assessing convergence in MCMC is a difficult topic, but not for iterative quadrature. A user of iterative quadrature does not have to contend with effective sample size and autocorrelation, assessing stationarity, acceptance rates, diminishing adaptation, etc. Stochastic sampling in MCMC is less efficient when samples occur in close proximity (such as when highly autocorrelated), whereas in quadrature the nodes are spread out by design.

In general, the conditional means and conditional variances progress smoothly to the target in multidimensional quadrature. For componentwise quadrature, movement to the target is not smooth, and often resembles a Markov chain or optimization algorithm.

Iterative quadrature is often applied after
`LaplaceApproximation`

to obtain a more reliable
estimate of parameter variance or covariance than the negative inverse
of the `Hessian`

matrix of second derivatives, which is
suitable only when the contours of the logarithm of the unnormalized
joint posterior density are approximately ellipsoidal (Naylor and
Smith, 1982, p. 224).

When `Algorithm="AGH"`

, the Naylor and Smith (1982) algorithm
is used. The AGH algorithm uses multivariate quadrature with the
physicist's (not the probabilist's) kernel.

There are four algorithm specifications: `N`

is the number of
univariate nodes, `Nmax`

is the maximum number of univariate
nodes, `Packages`

accepts any package required for the model
function when parallelized, and `Dyn.libs`

accepts dynamic
libraries for parallelization, if required. The number of univariate
nodes begins at \(N\) and increases by one each iteration. The
number of multivariate nodes grows quickly with \(N\). Naylor and
Smith (1982) recommend beginning with as few nodes as \(N=3\). Any
of the following events will cause \(N\) to increase by 1 when
\(N\) is less than `Nmax`

:

All LP weights are zero (and non-finite weights are set to zero)

\(\mu\) does not result in an increase in LP

All elements in \(\Sigma\) are not finite

The square root of the sum of the squared changes in \(\mu\) is less than or equal to the

`Stop.Tolerance`

Tolerance includes two metrics: change in mean integration error and change in parameters. Including the change in parameters for tolerance was not mentioned in Naylor and Smith (1982).

Naylor and Smith (1982) consider a transformation due to correlation. This is not included here.

The AGH algorithm does not currently handle constrained parameters,
such as with the `interval`

function. If a parameter is
constrained and changes during a model evaluation, this changes the
node and the multivariate weight. This is currently not corrected.

An advantage of AGH over componentwise adaptive quadrature is that AGH estimates covariance, where a componentwise algorithm ignores it. A disadvantage of AGH over a componentwise algorithm is that the number of nodes increases so quickly with dimension, that AGH is limited to small-dimensional models.

When `Algorithm="AGHSG"`

, the Naylor and Smith (1982) algorithm
is applied to a sparse grid, rather than a traditional multivariate
quadrature rule. This is identical to the AGH algorithm above, except
that a sparse grid replaces the multivariate quadrature rule.

The sparse grid reduces the number of nodes. The cost of reducing the number of nodes is that the user must consider the accuracy, \(K\).

There are four algorithm specifications: `K`

is the accuracy (as a
positive integer), `Kmax`

is the maximum accuracy,
`Packages`

accepts any package required for the model function
when parallelized, and `Dyn.libs`

accepts dynamic libraries for
parallelization, if required. These arguments represent accuracy
rather than the number of univariate nodes, but otherwise are similar
to the AGH algorithm.

When `Algorithm="CAGH"`

, a componentwise version of the adaptive
Gauss-Hermite quadrature of Naylor and Smith (1982) is used. Each
iteration, each marginal posterior distribution is approximated
sequentially, in a random order, with univariate quadrature. The
conditional mean and conditional variance are also approximated each
iteration, making it an adaptive algorithm.

There are four algorithm specifications: `N`

is the number of
nodes, `Nmax`

is the maximum number of nodes, `Packages`

accepts any package required for the model function when parallelized,
and `Dyn.libs`

accepts dynamic libraries for parallelization, if
required. The number of nodes begins at \(N\). All parameters have
the same number of nodes. Any of the following events will cause
\(N\) to increase by 1 when \(N\) is less than `Nmax`

, and
these conditions refer to all parameters (not individually):

Any LP weights are not finite

All LP weights are zero

\(\mu\) does not result in an increase in LP

The square root of the sum of the squared changes in \(\mu\) is less than or equal to the

`Stop.Tolerance`

It is recommended to begin with `N=3`

and set `Nmax`

between
10 and 100. As long as CAGH does not experience problematic weights,
and as long as CAGH is improving LP with \(\mu\), the number of nodes
does not increase. When CAGH becomes either universally problematic or
universally stable, then \(N\) slowly increases until the sum of
both the mean integration error and the sum of the squared changes in
\(\mu\) is less than the `Stop.Tolerance`

for two consecutive
iterations.

If the highest LP occurs at the lowest or highest node, then the value at that node becomes the conditional mean, rather than calculating it from all weighted samples; this facilitates movement when the current integral is poorly centered toward a well-centered integral. If all weights are zero, then a random proposal is generated with a small variance.

Tolerance includes two metrics: change in mean integration error and change in parameters, as the square root of the sum of the squared differences.

When a parameter constraint is encountered, the node and weight of the quadrature rule is recalculated.

An advantage of CAGH over multidimensional adaptive quadrature is that CAGH may be applied in large dimensions. Disadvantages of CAGH are that only variance, not covariance, is estimated, and ignoring covariance may be problematic.

Naylor, J.C. and Smith, A.F.M. (1982). "Applications of a Method for
the Efficient Computation of Posterior Distributions". *Applied
Statistics*, 31(3), p. 214--225.

O'Hagan, A. (1987). "Monte Carlo is Fundamentally Unsound". *The
Statistician*, 36, p. 247--249.

O'Hagan, A. (1991). "Bayes-Hermite Quadrature". *Journal of
Statistical Planning and Inference*, 29, p. 245--260.

O'Hagan, A. (1992). "Some Bayesian Numerical Analysis". In Bernardo,
J.M., Berger, J.O., David, A.P., and Smith, A.F.M., editors,
*Bayesian Statistics*, 4, p. 356--363, Oxford University Press.

Rasmussen, C.E. and Ghahramani, Z. (2003). "Bayesian Monte Carlo". In
Becker, S. and Obermayer, K., editors, *Advances in Neural
Information Processing Systems*, 15, MIT Press, Cambridge, MA.

`GaussHermiteCubeRule`

,
`GaussHermiteQuadRule`

,
`GIV`

,
`Hermite`

,
`Hessian`

,
`LaplaceApproximation`

,
`LaplacesDemon`

,
`LML`

,
`PMC`

,
`SIR`

, and
`SparseGrid`

.

```
# NOT RUN {
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,10]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "mu[1]"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
beta <- rnorm(Data$J)
sigma <- runif(1)
return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
### Parameters
beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
### Log-Prior
beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
### Log-Likelihood
mu <- tcrossprod(Data$X, t(beta))
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=mu[1],
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)
}
############################ Initial Values #############################
#Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Initial.Values <- rep(0,J+1)
######################### Adaptive Gauss-Hermite ########################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="AGH",
# Specs=list(N=5, Nmax=7, Packages=NULL, Dyn.libs=NULL), CPUs=1)
################## Adaptive Gauss-Hermite Sparse Grid ###################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="AGHSG",
# Specs=list(K=5, Kmax=7, Packages=NULL, Dyn.libs=NULL), CPUs=1)
################# Componentwise Adaptive Gauss-Hermite ##################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="CAGH",
# Specs=list(N=3, Nmax=10, Packages=NULL, Dyn.libs=NULL), CPUs=1)
#Fit
#print(Fit)
#PosteriorChecks(Fit)
#caterpillar.plot(Fit, Parms="beta")
#plot(Fit, MyData, PDF=FALSE)
#Pred <- predict(Fit, Model, MyData, CPUs=1)
#summary(Pred, Discrep="Chi-Square")
#plot(Pred, Style="Covariates", Data=MyData)
#plot(Pred, Style="Density", Rows=1:9)
#plot(Pred, Style="Fitted")
#plot(Pred, Style="Jarque-Bera")
#plot(Pred, Style="Predictive Quantiles")
#plot(Pred, Style="Residual Density")
#plot(Pred, Style="Residuals")
#Levene.Test(Pred)
#Importance(Fit, Model, MyData, Discrep="Chi-Square")
#End
# }
```

Run the code above in your browser using DataLab