Function which performs the fitting of an adaptive mixture of Student-t distributions to approximate a target density through its kernel function.

Usage

    AdMit(KERNEL, mu0, Sigma0 = NULL, control = list(), ...)

Arguments

KERNEL: kernel function of the target density on which the adaptive
    mixture is fitted. This function should be vectorized for speed
    purposes (i.e., its first argument should be a matrix and its output
    a vector). Moreover, the function must contain the logical argument
    log. If log = TRUE, the function returns (natural) logarithm values
    of the kernel function. NA and NaN values are not allowed. (See
    *Details* for examples of KERNEL implementation.)
mu0: initial value in the first-stage optimization (for the location of
    the first Student-t component) in the adaptive mixture, or location
    of the first Student-t component if Sigma0 is not NULL.
Sigma0: scale matrix of the first Student-t component (square, symmetric
    and positive definite). Default: Sigma0 = NULL, i.e., the scale
    matrix of the first Student-t component is estimated by the
    function AdMit.
control: control parameters (see *Details*).
...: further arguments to be passed to KERNEL.
Value

A list with the following components:
CV: vector (of length \(H\)) of coefficients of variation of
  the importance sampling weights.
mit: list (of length 4) containing information on the fitted mixture of
  Student-t distributions, with the following components:
  p: vector (of length \(H\)) of mixing probabilities.
  mu: matrix (of size \(H \times d\)) containing the vectors of modes
    (in rows) of the mixture components.
  Sigma: matrix (of size \(H \times d^2\)) containing the scale matrices
    (in rows, vectorized) of the mixture components.
  df: degrees of freedom parameter of the Student-t components.
where \(H (\geq 1)\) is the number of components in the adaptive
  mixture of Student-t distributions and \(d (\geq 1)\) is
  the dimension of the first argument in KERNEL.
summary: data frame containing information on the optimization
  procedures. For each component of the adaptive mixture of Student-t
  distributions it reports:
  1. the method used to estimate the mode and the scale matrix of the
     Student-t component (`USER' if Sigma0 is provided by the user;
     numerical optimization: `BFGS', `Nelder-Mead'; importance sampling:
     `IS', with percentage(s) of importance weights used and scaling
     factor(s));
  2. the time required for this optimization;
  3. the method used to estimate the mixing probabilities (`NLMINB',
     `BFGS', `Nelder-Mead', `NONE');
  4. the time required for this optimization;
  5. the coefficient of variation of the importance sampling weights.
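For instance, with the two-dimensional Gelman and Meng example from the
*Examples* section below (so \(d = 2\)), the fitted object can be
inspected as follows (a minimal sketch, assuming the result is stored
in outAdMit):

    outAdMit$CV        ## coefficients of variation, one per fitted component
    outAdMit$mit$p     ## mixing probabilities (length H)
    outAdMit$mit$mu    ## modes, one per row (H x d)
    ## each row of Sigma stores a vectorized d x d scale matrix;
    ## reshape the first one for readability (here d = 2)
    matrix(outAdMit$mit$Sigma[1, ], ncol = 2)
    outAdMit$mit$df    ## degrees of freedom of the Student-t components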
Details

The argument KERNEL is the kernel function of the target
  density, and it should be vectorized for speed purposes.
As a first example, consider the kernel function proposed by Gelman and Meng (1991): $$ k(x_1,x_2) = \exp\left( -\frac{1}{2} [A x_1^2 x_2^2 + x_1^2 + x_2^2 - 2 B x_1 x_2 - 2 C_1 x_1 - 2 C_2 x_2] \right) $$ where commonly used values are \(A=1\), \(B=0\), \(C_1=3\) and \(C_2=3\).
A vectorized implementation of this function might be:
    GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
    {
      if (is.vector(x))
        x <- matrix(x, nrow = 1)
      r <- -.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
                - 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
      if (!log)
        r <- exp(r)
      as.vector(r)
    }
  This way, we may supply a point \((x_1,x_2)\)
  for x and the function will output a single value (i.e., the kernel
  evaluated at this point). But the function is vectorized, in the sense
  that we may supply a \((N \times 2)\) matrix
  of values for x, where rows of x are
  points \((x_1,x_2)\) and the output will be a vector of
  length \(N\), containing the kernel values for these points.
  Since the AdMit procedure evaluates KERNEL for a
  large number of points, a vectorized implementation is important. Note
  also the additional argument log = TRUE which is used for
  numerical stability.
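For instance (the evaluation points below are illustrative, assuming
GelmanMeng as defined above):

    GelmanMeng(c(1, 2))                      ## a single point: one value
    x <- rbind(c(1, 2), c(0, 1), c(-1, 0))   ## a (3 x 2) matrix of points
    GelmanMeng(x)                            ## a vector of length 3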
As a second example, consider the following (simple) econometric model: $$ y_t \overset{i.i.d.}{\sim} N(\mu,\sigma^2) \quad t=1,\ldots,T $$ where \(\mu\) is the mean value and \(\sigma\) is the standard deviation. Our purpose is to estimate \(\theta = (\mu,\sigma)\) within a Bayesian framework, based on a vector \(y\) of \(T\) observations; the kernel thus consists of the product of the prior and the likelihood function. As previously mentioned, the kernel function should be vectorized, i.e., accept a \((N \times 2)\) matrix of points \(\theta\) at which the kernel will be evaluated. Using the common (Jeffreys) prior \(p(\theta) = \frac{1}{\sigma}\) for \(\sigma > 0\), a vectorized implementation of the kernel function might be:
    KERNEL <- function(theta, y, log = TRUE)
    {
      if (is.vector(theta))
        theta <- matrix(theta, nrow = 1)

      ## sub function which returns the log-kernel for a given
      ## thetai value (i.e., a given row of theta)
      KERNEL_sub <- function(thetai)
      {
        if (thetai[2] > 0)
        { ## sigma > 0: compute the log-kernel at thetai
          ## (note the trailing '+' so the expression continues)
          r <- -log(thetai[2]) +
            sum(dnorm(y, thetai[1], thetai[2], TRUE))
        }
        else
        { ## sigma <= 0: return -Infinity
          r <- -Inf
        }
        as.numeric(r)
      }

      ## 'apply' on the rows of theta (faster than a 'for' loop)
      r <- apply(theta, 1, KERNEL_sub)
      if (!log)
        r <- exp(r)
      as.numeric(r)
    }
Since this kernel function also depends on the vector \(y\), this vector
   must be passed to KERNEL through the AdMit function. This is
   achieved via the \(\ldots\) argument, i.e., AdMit(KERNEL, mu0 = c(0, 1), y = y).
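Putting the pieces together, a minimal sketch with simulated data (the
sample size and parameter values below are illustrative, not taken from
the package):

    set.seed(1234)
    y <- rnorm(50, mean = 1, sd = 2)   ## T = 50 simulated observations
    outAdMit <- AdMit(KERNEL, mu0 = c(0, 1), y = y,
                      control = list(Ns = 1e4))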
To gain even more speed, the implementation of KERNEL might rely on
   compiled C or Fortran code called via the functions .C and .Fortran.
   An example is provided in the file AdMitJSS.R in the package's folder.
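Short of compiled code, a fully vectorized R implementation already
avoids the row-wise apply call. The following sketch is not the
package's own code; KERNEL_vec is a hypothetical replacement that
exploits the closed form of the Gaussian log-likelihood to evaluate all
rows of theta at once:

    KERNEL_vec <- function(theta, y, log = TRUE)
    {
      if (is.vector(theta))
        theta <- matrix(theta, nrow = 1)
      mu    <- theta[, 1]
      sigma <- theta[, 2]
      Tobs  <- length(y)
      r     <- rep(-Inf, nrow(theta))      ## -Inf outside sigma > 0
      ok    <- sigma > 0
      if (any(ok)) {
        ## sum_t (y_t - mu)^2 for every admissible row at once
        ss    <- colSums(outer(y, mu[ok], "-")^2)
        r[ok] <- -log(sigma[ok]) -         ## Jeffreys log-prior
          Tobs * log(sigma[ok]) - Tobs / 2 * log(2 * pi) -
          ss / (2 * sigma[ok]^2)           ## Gaussian log-likelihood
      }
      if (!log)
        r <- exp(r)
      as.numeric(r)
    }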
The argument control is a list that can supply any of
  the following components:
Ns: number of draws used in the evaluation of the importance sampling
      weights (integer number not smaller than 100). Default: Ns = 1e5.
Np: number of draws used in the optimization of the mixing
      probabilities (integer number not smaller than 100 and not larger
      than Ns). Default: Np = 1e3.
Hmax: maximum number of Student-t components in the adaptive mixture
      (integer number not smaller than one). Default: Hmax = 10.
df: degrees of freedom parameter of the Student-t components (real
      number not smaller than one). Default: df = 1.
CVtol: tolerance for the relative change of the coefficient of
      variation (real number in [0,1]). The adaptive algorithm stops if
      the new component leads to a relative change in the coefficient
      of variation that is smaller than or equal to CVtol. Default:
      CVtol = 0.1, i.e., 10%.
weightNC: weight assigned to the new Student-t component of the
      adaptive mixture as a starting value in the optimization of the
      mixing probabilities (real number in [0,1]). Default:
      weightNC = 0.1, i.e., 10%.
trace: tracing information on the adaptive fitting procedure (logical).
      Default: trace = FALSE, i.e., no tracing information.
IS: should importance sampling be used to estimate the mode and the
      scale matrix of the Student-t components (logical)? Default:
      IS = FALSE, i.e., numerical optimization is used instead.
ISpercent: vector of percentage(s) of largest weights used to estimate
      the mode and the scale matrix of the Student-t components of the
      adaptive mixture by importance sampling (real number(s) in
      [0,1]). Default: ISpercent = c(0.05, 0.15, 0.3), i.e., 5%, 15%
      and 30%.
ISscale: vector of scaling factor(s) used to rescale the scale matrix
      of the mixture components (real positive numbers). Default:
      ISscale = c(1, 0.25, 4).
trace.mu: tracing information on the progress in the optimization of
      the mode of the mixture components (non-negative integer number).
      Higher values may produce more tracing information (see the
      source code of the function optim for further details). Default:
      trace.mu = 0, i.e., no tracing information.
maxit.mu: maximum number of iterations used in the optimization of the
      modes of the mixture components (positive integer). Default:
      maxit.mu = 500.
reltol.mu: relative convergence tolerance used in the optimization of
      the modes of the mixture components (real number in [0,1]).
      Default: reltol.mu = 1e-8.
trace.p, maxit.p, reltol.p: the same as the three arguments above, but
      for the optimization of the mixing probabilities of the mixture
      components.
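For example, several control parameters can be combined in one list
(the values below are illustrative):

    ctrl <- list(Ns = 5e4, Hmax = 5, CVtol = 0.05, trace = TRUE)
    outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1), control = ctrl)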
References

Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2009a). AdMit: Adaptive Mixture of Student-t Distributions. R Journal 1(1), pp. 25-30. doi:10.32614/RJ-2009-003

Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2009b). Adaptive Mixture of Student-t Distributions as a Flexible Candidate Distribution for Efficient Simulation: The R Package AdMit. Journal of Statistical Software 29(3), pp. 1-32. doi:10.18637/jss.v029.i03
Gelman, A., Meng, X.-L. (1991). A Note on Bivariate Distributions That Are Conditionally Normal. The American Statistician 45(2), pp.125-126.
Hoogerheide, L.F. (2006). Essays on Neural Network Sampling Methods and Instrumental Variables. PhD thesis, Tinbergen Institute, Erasmus University Rotterdam (NL). ISBN: 9051708261. (Book nr. 379 of the Tinbergen Institute Research Series.)
Hoogerheide, L.F., Kaashoek, J.F., van Dijk, H.K. (2007). On the Shape of Posterior Densities and Credible Sets in Instrumental Variable Regression Models with Reduced Rank: An Application of Flexible Sampling Methods using Neural Networks. Journal of Econometrics 139(1), pp.154-180.
Hoogerheide, L.F., van Dijk, H.K. (2008). Possibly Ill-Behaved Posteriors in Econometric Models: On the Connection between Model Structures, Non-elliptical Credible Sets and Neural Network Simulation Techniques. Tinbergen Institute discussion paper 2008-036/4.
See Also

AdMitIS for importance sampling using an adaptive mixture of
  Student-t distributions as the importance density;
  AdMitMH for the independence chain Metropolis-Hastings
  algorithm using an adaptive mixture of Student-t distributions as
  the candidate density.
Examples

## Not run:
  ## NB: a low number of draws is used here for speed; consider using more draws!
  ## Gelman and Meng (1991) kernel function
  GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
  {
    if (is.vector(x))
      x <- matrix(x, nrow = 1)
    r <- -.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
              - 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
    if (!log)
      r <- exp(r)
    as.vector(r)
  }
  ## Run AdMit (with default values)
  set.seed(1234)
  outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1), control = list(Ns = 1e4))
  print(outAdMit)
  ## Run AdMit (using importance sampling to estimate
  ## the modes and the scale matrices)
  set.seed(1234)
  outAdMit <- AdMit(KERNEL = GelmanMeng, 
                    mu0 = c(0.0, 0.1), 
                    control = list(IS = TRUE, Ns = 1e4))
  print(outAdMit)
## End(Not run)