BNPmix (version 0.2.5)

PYdensity: MCMC for Pitman-Yor mixtures of Gaussians

Description

The PYdensity function generates a posterior density sample for a selection of univariate and multivariate Pitman-Yor process mixture models with Gaussian kernels. See details below for the description of the different specifications of the implemented models.

Usage

PYdensity(y, mcmc = list(), prior = list(), output = list())

Arguments

y

a vector or matrix containing the data from which the density is to be estimated;

mcmc

a list of MCMC arguments:

  • niter (mandatory), number of iterations.

  • nburn (mandatory), number of iterations to discard as burn-in.

  • method, the MCMC sampling method to be used, options are 'ICS', 'MAR' and 'SLI' (default is 'ICS'). See details.

  • model, the type of model to be fitted (default is 'LS'). See details.

  • nupd, argument controlling the frequency of progress reports: the function writes to standard output every time nupd new iterations have been completed (default is niter/10).

  • print_message, control option. If equal to TRUE, the status is printed to standard output every nupd iterations (default is TRUE).

  • m_imp, number of generated values for the importance sampling step of method = 'ICS' (default is 10). See details.

  • slice_type, when method = 'SLI' it specifies the type of slice sampler. Options are 'DEP' for dependent slice-efficient, and 'INDEP' for independent slice-efficient (default is 'DEP'). See details.

  • hyper, if equal to TRUE, hyperprior distributions on the base measure's parameters are added, as specified in prior and explained in details (default is TRUE).
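As a sketch, the settings above can be collected in a plain R list before the call; the specific values below are illustrative, not recommendations:

```r
# Illustrative MCMC settings: 5000 iterations, the first 1000 discarded as
# burn-in, marginal sampler, progress printed every 500 iterations
mcmc_args <- list(niter = 5000,
                  nburn = 1000,
                  method = "MAR",
                  nupd = 500,
                  print_message = TRUE)
# This list would then be passed as PYdensity(y, mcmc = mcmc_args, ...)
```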

prior

a list giving the prior information. The list includes strength and discount, the strength and discount parameters of the Pitman-Yor process (default are strength = 1 and discount = 0, the latter leading to the Dirichlet process). The remaining parameters depend on the model choice.

  • If model = 'L' (location mixture) and y is univariate:

    m0 and s20 are mean and variance of the base measure on the location parameter (default are 0 and 1); a0 and b0 are shape and scale parameters of the inverse gamma prior on the common scale parameter (default are 2 and 1). If hyper = TRUE, optional hyperpriors on the base measure's parameters are added: specifically, m1 and k1 are the mean parameter and the scale factor defining the normal hyperprior on m0 (default are 0 and 1), and a1 and b1 are shape and rate parameters of the inverse gamma hyperprior on s20 (default are 1 and 2). See details.

  • If model = 'LS' (location-scale mixture) and y is univariate:

    m0 and k0 are the mean parameter and the scale factor defining the normal base measure on the location parameter (default are 0 and 1), and a0 and b0 are shape and scale parameters of the inverse gamma base measure on the scale parameter (default are 2 and 1). If hyper = TRUE, optional hyperpriors on the base measure's parameters are added: specifically, m1 and s21 are mean and variance parameters of the normal hyperprior on m0 (default are 0 and 1); tau1 and zeta1 are shape and rate parameters of the gamma hyperprior on k0 (default is 1 for both); a1 and b1 are shape and rate parameters of the gamma hyperprior on b0 (default is 1 for both). See details.

  • If model = 'L' (location mixture) and y is multivariate (p-variate):

    m0 and S20 are mean and covariance of the base measure on the location parameter (default are a vector of zeroes and the identity matrix); Sigma0 and n0 are the parameters of the inverse Wishart prior on the common scale matrix. If hyper = TRUE, optional hyperpriors on the base measure's parameters are added: specifically, m1 and k1 are the mean parameter and the scale factor defining the normal hyperprior on m0 (default are a vector of zeroes and 1), and lambda1 and Lambda1 are the parameters (degrees of freedom and scale) of the inverse Wishart hyperprior on S20 (default are p+2 and the identity matrix). See details.

  • If model = 'LS' (location-scale mixture) and y is multivariate (p-variate):

    m0 and k0 are the mean parameter and the scale factor defining the normal base measure on the location parameter (default are a vector of zeroes and 1), and n0 and Sigma0 are the parameters (degrees of freedom and scale) of the inverse Wishart base measure on the scale parameter (default are p+2 and the identity matrix). If hyper = TRUE, optional hyperpriors on the base measure's parameters are added: specifically, m1 and S1 are mean and covariance matrix parameters of the normal hyperprior on m0 (default are a vector of zeroes and the identity matrix); tau1 and zeta1 are shape and rate parameters of the gamma hyperprior on k0 (default is 1 for both); n1 and Sigma1 are the parameters (degrees of freedom and scale) of the Wishart hyperprior on Sigma0 (default are p+2 and the identity matrix divided by p+2). See details.

  • If model = 'DLS' (diagonal location-scale mixture):

    m0 and k0 are the mean vector parameter and the vector of scale factors defining the normal base measure on the location parameter (default are a vector of zeroes and a vector of ones), and a0 and b0 are vectors of shape and scale parameters defining the base measure on the scale parameters (default are a vector of twos and a vector of ones). If hyper = TRUE, optional hyperpriors on the base measure's parameters are added: specifically, m1 and s21 are vectors of mean and variance parameters for the normal hyperpriors on the components of m0 (default are a vector of zeros and a vector of ones); tau1 and zeta1 are vectors of shape and rate parameters of the gamma hyperpriors on the components of k0 (default is a vector of ones for both); a1 and b1 are vectors of shape and rate parameters of the gamma hyperpriors on the components of b0 (default is a vector of ones for both). See details.
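The prior list is built the same way. A minimal sketch for a univariate location-scale mixture (model = 'LS'), with illustrative hyperparameter values spelled out explicitly:

```r
# Illustrative prior for a univariate location-scale mixture (model = 'LS'):
# Pitman-Yor strength 1 and discount 0.1, with a normal-inverse gamma
# base measure N(m0, sigma^2 / k0) x IGa(a0, b0)
prior_args <- list(strength = 1,
                   discount = 0.1,
                   m0 = 0, k0 = 1,   # normal base measure on the location
                   a0 = 2, b0 = 1)   # inverse gamma base measure on the scale
# This list would then be passed as PYdensity(y, prior = prior_args, ...)
```

Setting discount = 0 instead would reduce the Pitman-Yor process to a Dirichlet process with concentration equal to strength.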

output

a list of arguments for generating posterior output. It contains:

  • grid, a grid of points at which to evaluate the estimated posterior mean density; a data frame obtained with the expand.grid function.

  • out_param, if equal to TRUE, the function returns the draws of the kernel's parameters for each MCMC iteration, default is FALSE. See value for details.

  • out_type, if out_type = "FULL", the function returns the visited partitions and the realizations of the posterior density for each iteration. If out_type = "MEAN", the function returns the estimated partition and the mean of the densities sampled at each iteration. If out_type = "CLUST", the function returns only the estimated partition. Default is "FULL".
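For example, an evaluation grid can be built with seq for univariate data or with expand.grid for multivariate data (ranges and grid sizes below are illustrative):

```r
# Univariate grid: 100 equally spaced evaluation points
grid_uni <- seq(-5, 5, length.out = 100)

# Bivariate grid: expand.grid combines one sequence per dimension
# into a data frame with one row per grid point (40 x 40 = 1600 points)
grid_biv <- expand.grid(seq(-5, 5, length.out = 40),
                        seq(-5, 5, length.out = 40))

# Output settings requesting the posterior mean density on the grid
output_args <- list(grid = grid_biv, out_type = "MEAN")
```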

Value

A BNPdens class object containing the estimated density and the cluster allocations for each iteration. If out_param = TRUE, the output also contains the kernel-specific parameters for each iteration. If mcmc_dens = TRUE, the output also contains a realization from the posterior density for each iteration. If mean_dens = TRUE, the output contains only the mean of the realizations from the posterior density. The output also contains information such as the number of iterations, the number of burn-in iterations, the computational time used, and the type of estimated model (univariate = TRUE or FALSE).

Details

This generic function fits a Pitman-Yor process mixture model for density estimation and clustering. The general model is $$\tilde f(y) = \int K(y; \theta) \tilde p (d \theta),$$ where \(K(y; \theta)\) is a kernel density with parameter \(\theta\in\Theta\). Univariate and multivariate Gaussian kernels are implemented with different specifications for the parametric space \(\Theta\), as described below. The mixing measure \(\tilde p\) has a Pitman-Yor process prior with strength parameter \(\vartheta\), discount parameter \(\alpha\), and base measure \(P_0\) admitting the specifications presented below. For posterior sampling, three MCMC approaches are implemented. See details below.

Univariate data

For univariate \(y\) the function implements both a location and location-scale mixture model. The former assumes $$\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu) \pi(\sigma^2),$$ where \(\phi(y; \mu, \sigma^2)\) is a univariate Gaussian kernel function with mean \(\mu\) and variance \(\sigma^2\), and \(\pi(\sigma^2)\) is an inverse gamma prior. The base measure is specified as $$P_0(d \mu) = N(d \mu; m_0, \sigma^2_0),$$ and \(\sigma^2 \sim IGa(a_0, b_0)\). Optional hyperpriors for the base measure's parameters are $$(m_0,\sigma^2_0) \sim N(m_1, \sigma^2_0 / k_1) \times IGa(a_1, b_1).$$

The location-scale mixture model, instead, assumes $$\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu, d \sigma^2)$$ with normal-inverse gamma base measure $$P_0 (d \mu, d \sigma^2) = N(d \mu; m_0, \sigma^2 / k_0) \times IGa(d \sigma^2; a_0, b_0),$$ and (optional) hyperpriors $$m_0 \sim N(m_1, \sigma_1^2 ),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad b_0 \sim Ga(a_1, b_1).$$
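Putting the univariate pieces together, a location-scale fit might be sketched as follows. The toy data, iteration counts, and prior values are illustrative, and the call is guarded so the snippet also runs where BNPmix is not installed:

```r
set.seed(42)
# Bimodal toy data: two well-separated Gaussian components
y_uni <- c(rnorm(150, -2, 0.8), rnorm(150, 2, 0.8))

if (requireNamespace("BNPmix", quietly = TRUE)) {
  # Univariate location-scale mixture with a short illustrative chain
  fit <- BNPmix::PYdensity(
    y = y_uni,
    mcmc = list(niter = 1000, nburn = 500, model = "LS"),
    prior = list(strength = 1, discount = 0.1),
    output = list(grid = seq(-5, 5, length.out = 100))
  )
  summary(fit)
}
```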

Multivariate data

For multivariate \(y\) (\(p\)-variate) the function implements a location mixture model (with full covariance matrix) and two different location-scale mixture models, with either full or diagonal covariance matrix. The location mixture model assumes $$\tilde f(y) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu) \pi(\Sigma)$$ where \(\phi_p(y; \mu, \Sigma)\) is a \(p\)-dimensional Gaussian kernel function with mean vector \(\mu\) and covariance matrix \(\Sigma\). The prior on \(\Sigma\) is inverse Wishart with parameters \(\Sigma_0\) and \(\nu_0\), while the base measure is $$P_0(d \mu) = N(d \mu; m_0, S_0),$$ with optional hyperpriors $$m_0 \sim N(m_1, S_0 / k_1),\quad S_0 \sim IW(\lambda_1, \Lambda_1).$$

The location-scale mixture model assumes

$$\tilde f(y) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu, d \Sigma).$$ Two possible structures for \(\Sigma\) are implemented, namely full and diagonal covariance. For the full covariance mixture model, the base measure is the normal-inverse Wishart $$P_0 (d \mu, d \Sigma) = N(d \mu; m_0, \Sigma / k_0) \times IW(d \Sigma; \nu_0, \Sigma_0),$$ with optional hyperpriors $$m_0 \sim N(m_1, S_1),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad \Sigma_0 \sim W(\nu_1, \Sigma_1).$$ The second location-scale mixture model assumes a diagonal covariance structure. This is equivalent to writing the mixture model as a mixture of products of univariate normal kernels, i.e. $$\tilde f(y) = \int \prod_{r=1}^p \phi(y_r; \mu_r, \sigma^2_r) \tilde p (d \mu_1,\ldots,d \mu_p, d \sigma_1^2,\ldots,d \sigma_p^2).$$ For this specification, the base measure is defined as the product of \(p\) independent normal-inverse gamma distributions, that is $$P_0 = \prod_{r=1}^p P_{0r}$$ where $$P_{0r}(d \mu_r,d \sigma_r^2) = N(d \mu_r; m_{0r}, \sigma^2_r / k_{0r}) \times IGa(d \sigma^2_r; a_{0r}, b_{0r}).$$ Optional hyperpriors can be added and, for each component, correspond to the set of hyperpriors considered for the univariate location-scale mixture model.
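As a sketch of the diagonal specification, the call below would fit model = 'DLS' to trivariate toy data. The data, iteration counts, and the out_type choice are illustrative, and the call is guarded so the snippet also runs where BNPmix is not installed:

```r
set.seed(7)
# Trivariate toy data: two components, independent coordinates within each
# (a setting where the diagonal covariance structure is appropriate)
y_mv <- rbind(matrix(rnorm(300, -2, 1), ncol = 3),
              matrix(rnorm(300,  2, 1), ncol = 3))

if (requireNamespace("BNPmix", quietly = TRUE)) {
  # Diagonal location-scale mixture; request only the estimated partition
  fit_dls <- BNPmix::PYdensity(
    y = y_mv,
    mcmc = list(niter = 1000, nburn = 500, model = "DLS"),
    output = list(out_type = "CLUST")
  )
}
```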

Posterior simulation methods

This generic function implements three types of MCMC algorithms for posterior simulation. The default method is the importance conditional sampler 'ICS' (Canale et al. 2019). Other options are the marginal sampler 'MAR' (Neal, 2000) and the slice sampler 'SLI' (Kalli et al. 2011). The importance conditional sampler performs an importance sampling step when updating the values of individual parameters \(\theta\), which requires sampling m_imp values from a suitable proposal. Large values of m_imp are known to improve the mixing of the chain at the cost of increased running time (Canale et al. 2019). Two options are available for the slice sampler, namely the dependent slice-efficient sampler (slice_type = 'DEP'), which is the default, and the independent slice-efficient sampler (slice_type = 'INDEP') (Kalli et al. 2011).
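The sampler choice is made entirely through the mcmc list. The sketch below sets up the same toy problem under the importance conditional sampler and the independent slice-efficient sampler (iteration counts and m_imp are illustrative; the calls are guarded so the snippet also runs where BNPmix is not installed):

```r
set.seed(1)
# Bimodal univariate toy data
y_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))

# Importance conditional sampler with 10 proposal draws per update
mcmc_ics <- list(niter = 1000, nburn = 500, method = "ICS", m_imp = 10)
# Independent slice-efficient sampler
mcmc_sli <- list(niter = 1000, nburn = 500, method = "SLI",
                 slice_type = "INDEP")

if (requireNamespace("BNPmix", quietly = TRUE)) {
  out_args <- list(grid = seq(-7, 7, length.out = 50))
  fit_ics <- BNPmix::PYdensity(y = y_toy, mcmc = mcmc_ics, output = out_args)
  fit_sli <- BNPmix::PYdensity(y = y_toy, mcmc = mcmc_sli, output = out_args)
}
```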

References

Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147

Kalli, M., Griffin, J. E., and Walker, S. G. (2011), Slice sampling mixture models. Statistics and Computing 21, 93-105.

Neal, R. M. (2000), Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics 9, 249-265.

Examples

data_toy <- cbind(c(rnorm(100, -3, 1), rnorm(100, 3, 1)),
                  c(rnorm(100, -3, 1), rnorm(100, 3, 1)))
grid <- expand.grid(seq(-7, 7, length.out = 50),
                    seq(-7, 7, length.out = 50))
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 200, nburn = 100),
                       output = list(grid = grid))
summary(est_model)
plot(est_model)
