Performs MCMC sampling via JAGS or HMC sampling via Stan for Gaussian mixture models, post-processes the chains and applies a clustering technique to the MCMC sample. Pivotal units for each group are selected according to one of four alternative criteria.
piv_MCMC(
y,
k,
nMC,
priors,
piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
clustering = c("diana", "hclust"),
software = c("rjags", "rstan"),
burn = 0.5 * nMC,
chains = 4,
cores = 1,
sparsity = FALSE
)
The function returns the MCMC output, the clustering solution and the pivotal indices. A complete list of outputs follows.
true.iter
The number of MCMC iterations for which
the number of JAGS/Stan groups exactly coincides with the prespecified
number of groups k
.
Mu
An estimate of the groups' means.
groupPost
A \(true.iter \times N\) matrix with values in 1:k indicating the post-processed group allocations.
mcmc_mean
If y
is a vector, a \(true.iter \times k\)
matrix with the post-processed MCMC chains for the mean parameters; if
y
is a matrix, a \(true.iter \times D \times k\) array with
the post-processed MCMC chains for the mean parameters.
mcmc_sd
If y
is a vector, a \(true.iter \times k\)
matrix with the post-processed MCMC chains for the sd parameters; if
y
is a matrix, a \(true.iter \times D\) array with
the post-processed MCMC chains for the sd parameters.
mcmc_weight
A \(true.iter \times k\) matrix with the post-processed MCMC chains for the weight parameters.
mcmc_mean_raw
If y
is a vector, a \((nMC-burn) \times k\) matrix
with the raw MCMC chains for the mean parameters as given by JAGS/Stan; if
y
is a matrix, a \((nMC-burn) \times D \times k\) array with the raw MCMC chains
for the mean parameters as given by JAGS/Stan.
mcmc_sd_raw
If y
is a vector, a \((nMC-burn) \times k\) matrix
with the raw MCMC chains for the sd parameters as given by JAGS/Stan; if
y
is a matrix, a \((nMC-burn) \times D\) array with the raw MCMC chains
for the sd parameters as given by JAGS/Stan.
mcmc_weight_raw
A \((nMC-burn) \times k\) matrix with the raw MCMC chains for the weight parameters as given by JAGS/Stan.
C
The \(N \times N\) co-association matrix constructed from the MCMC sample.
grr
The vector of cluster membership returned by
"diana"
or "hclust"
.
pivots
The vector of indices of pivotal units identified by the selected pivotal criterion.
model
The JAGS/Stan model code. Use the "cat"
function to print the code in a readable format.
stanfit
An object of S4 class stanfit
for the fitted model (only if software="rstan"
).
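For instance, given a fitted object res returned by piv_MCMC, the components above may be inspected as follows (a minimal sketch; res is assumed to come from any of the calls shown in the Examples):
res$true.iter          # retained iterations with exactly k non-empty groups
res$pivots             # indices of the pivotal units
head(res$mcmc_mean)    # post-processed chains for the means
cat(res$model)         # display the JAGS/Stan model code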
The arguments are as follows.
y
\(N\)-dimensional vector for univariate data or \(N \times D\) matrix for multivariate data.
k
Number of mixture components.
nMC
Number of MCMC iterations for the JAGS/Stan function execution.
priors
Input prior hyperparameters (see Details for default options).
piv.criterion
The pivotal criterion used for identifying one pivot for each group. Possible choices are: "MUS", "maxsumint", "minsumnoint", "maxsumdiff". The default method is "maxsumint" (see the Details and the vignette).
clustering
The algorithm adopted for partitioning the \(N\) observations into k groups. Possible choices are "diana" (default) or "hclust", for divisive and agglomerative hierarchical clustering, respectively.
software
The selected MCMC method to fit the model: "rjags" for the JAGS method, "rstan" for the Stan method. Default is "rjags".
burn
The burn-in period (only if method "rjags" is selected). Default is 0.5 \(\times\) nMC.
chains
A positive integer specifying the number of Markov chains. The default is 4.
cores
The number of cores to use when executing the Markov chains in parallel (only if software="rstan"). Default is 1.
sparsity
Allows for sparse finite mixtures; default is FALSE.
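As an illustration, a call overriding several of the defaults above might look as follows (a minimal sketch; y here is a placeholder for the user's data vector):
res <- piv_MCMC(y = y, k = 3, nMC = 2000,
                piv.criterion = "MUS",   # available only for k <= 4
                clustering = "hclust",
                burn = 1000)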
Leonardo Egidi legidi@units.it
The function fits univariate and multivariate Bayesian Gaussian mixture models of the form (here for univariate only): $$(Y_i|Z_i=j) \sim \mathcal{N}(\mu_j,\sigma_j),$$ where the \(Z_i\), \(i=1,\ldots,N\), are i.i.d. random variables, \(j=1,\dots,k\), \(\sigma_j\) is the group variance, \(Z_i \in \{1,\ldots,k\}\) are the latent group allocations, and $$P(Z_i=j)=\eta_j.$$ The likelihood of the model is then $$L(y;\mu,\eta,\sigma) = \prod_{i=1}^N \sum_{j=1}^k \eta_j \, \mathcal{N}(y_i;\mu_j,\sigma_j),$$ where \((\mu, \sigma)=(\mu_{1},\dots,\mu_{k},\sigma_{1},\ldots,\sigma_{k})\) are the component-specific parameters and \(\eta=(\eta_{1},\dots,\eta_{k})\) are the mixture weights.

Let \(\nu\) denote a permutation of \(\{1,\ldots,k\}\), and let \(\nu(\mu)=(\mu_{\nu(1)},\ldots,\mu_{\nu(k)})\), \(\nu(\sigma)=(\sigma_{\nu(1)},\ldots,\sigma_{\nu(k)})\) and \(\nu(\eta)=(\eta_{\nu(1)},\ldots,\eta_{\nu(k)})\) be the corresponding permutations of \(\mu\), \(\sigma\) and \(\eta\). Denote by \(V\) the set of all the permutations of the indices \(\{1,\ldots,k\}\); the likelihood above is invariant under any permutation \(\nu \in V\), that is $$L(y;\mu,\eta,\sigma) = L(y;\nu(\mu),\nu(\eta),\nu(\sigma)).$$ As a consequence, the model is unidentified with respect to an arbitrary permutation of the labels. When Bayesian inference for the model is performed, if the prior distribution \(p_0(\mu,\eta,\sigma)\) is invariant under a permutation of the indices, i.e. \(p_0(\mu,\eta,\sigma) = p_0(\nu(\mu),\nu(\eta),\nu(\sigma))\), then so is the posterior $$p(\mu,\eta,\sigma|y) \propto p_0(\mu,\eta,\sigma)L(y;\mu,\eta,\sigma),$$ which is multimodal with (at least) \(k!\) modes.
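As a simple illustration of this invariance (a worked case for \(k=2\), spelled out here for clarity): the only non-identity permutation is the swap \(\nu(1)=2\), \(\nu(2)=1\), and for each observation $$\eta_1 \mathcal{N}(y_i;\mu_1,\sigma_1) + \eta_2 \mathcal{N}(y_i;\mu_2,\sigma_2) = \eta_2 \mathcal{N}(y_i;\mu_2,\sigma_2) + \eta_1 \mathcal{N}(y_i;\mu_1,\sigma_1),$$ so the parameter values \((\mu_1,\sigma_1,\eta_1,\mu_2,\sigma_2,\eta_2)\) and \((\mu_2,\sigma_2,\eta_2,\mu_1,\sigma_1,\eta_1)\) yield the same likelihood and, under a symmetric prior, the same posterior density. The chains may therefore switch between the \(2!=2\) symmetric modes during sampling, which is precisely the label switching problem addressed by the post-processing step.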
Depending on the selected software, the model parametrization
changes in terms of the prior choices.
Specifically, in line with the Gibbs sampling underlying JAGS,
noninformative priors are adopted, with conjugate priors
preferred for computational speed.
Conversely, Stan adopts weakly informative priors,
with no need to explicitly exploit conjugacy.
For univariate mixtures, when
software="rjags"
the specification is the same as the function BMMmodel
of the
bayesmix
package:
$$\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)$$ $$\sigma_j \sim \mbox{invGamma}(nu0Half, nu0S0Half)$$ $$\eta \sim \mbox{Dirichlet}(1,\ldots,1)$$ $$S0 \sim \mbox{Gamma}(g0Half, g0G0Half),$$
with default values: \(\mu_0=0, B0inv=0.1, nu0Half =10, S0=2, nu0S0Half= nu0Half\times S0, g0Half = 5e-17, g0G0Half = 5e-33\), in accordance with the default specification:
priors=list(kind = "independence", parameter = "priorsFish",
hierarchical = "tau")
(see bayesmix
for further details and choices).
When software="rstan"
, the prior specification is:
$$\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)$$ $$\sigma_j \sim \mbox{Lognormal}(\mu_{\sigma}, \tau_{\sigma})$$ $$\eta_j \sim \mbox{Uniform}(0,1),$$
with default values: \(\mu_0=0, B0inv=0.1, \mu_{\sigma}=0, \tau_{\sigma}=2\). The user may specify new hyperparameter values with the argument:
priors=list(mu_0=1, B0inv=0.2, mu_sigma=3, tau_sigma=5)
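For instance, a complete call might be (a minimal sketch; y, k and nMC are placeholders for the user's data and settings):
res_stan <- piv_MCMC(y = y, k = k, nMC = nMC,
                     software = "rstan",
                     priors = list(mu_0 = 1, B0inv = 0.2,
                                   mu_sigma = 3, tau_sigma = 5))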
For multivariate mixtures, when software="rjags"
the prior specification is the following:
$$\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, S_2)$$ $$\Sigma^{-1} \sim \mbox{Wishart}(S_3, D+1)$$ $$\eta \sim \mbox{Dirichlet}(\bm{\alpha}),$$
where \(\bm{\alpha}\) is a \(k\)-dimensional vector
and \(S_2\) and \(S_3\)
are positive definite matrices. By default, \(\bm{\mu}_0=\bm{0}\),
\(\bm{\alpha}=(1,\ldots,1)\) and \(S_2\) and \(S_3\) are diagonal matrices,
with diagonal elements
equal to 1e+05. The user may specify other values for the hyperparameters
\(\bm{\mu}_0, S_2, S_3\) and \(\bm{\alpha}\) via the priors
argument as follows:
priors = list(mu_0 = c(1,1), S2 = ..., S3 = ..., alpha = ...)
with the constraint that \(S_2\) and \(S_3\) be positive definite, and \(\bm{\alpha}\) a vector of dimension \(k\) with nonnegative elements.
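For instance, for bivariate data (\(D=2\)) an admissible specification uses diagonal, hence positive definite, matrices (a minimal sketch; the numeric values are purely illustrative, and y, k and nMC are placeholders):
D <- 2
priors <- list(mu_0  = rep(0, D),
               S2    = diag(0.1, D),   # positive definite: positive diagonal
               S3    = diag(0.1, D),   # Wishart scale matrix
               alpha = rep(1, k))      # flat Dirichlet over the k weights
res <- piv_MCMC(y = y, k = k, nMC = nMC, priors = priors)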
When software="rstan"
, the prior specification is:
$$\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, LD^*L^{T})$$ $$L \sim \mbox{LKJ}(\epsilon)$$ $$D^*_j \sim \mbox{HalfCauchy}(0, \sigma_d).$$
The covariance matrix is expressed in terms of the LDL decomposition as \(LD^*L^{T}\), a variant of the classical Cholesky decomposition, where \(L\) is a \(D \times D\) lower unit triangular matrix and \(D^*\) is a \(D \times D\) diagonal matrix. The Cholesky correlation factor \(L\) is assigned an LKJ prior with \(\epsilon\) degrees of freedom which, combined with priors on the standard deviations of each component, induces a prior on the covariance matrix; as \(\epsilon \rightarrow \infty\) the magnitude of the correlations between components decreases, whereas \(\epsilon=1\) yields a uniform prior distribution for \(L\). By default, the hyperparameters are \(\bm{\mu}_0=\bm{0}\), \(\sigma_d=2.5\), \(\epsilon=1\). The user may specify different values with the argument:
priors=list(mu_0=c(1,2), sigma_d = 4, epsilon =2)
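A corresponding call might look as follows (a minimal sketch; y is a placeholder \(N \times D\) data matrix, and k and nMC are placeholders):
res_stan <- piv_MCMC(y = y, k = k, nMC = nMC,
                     software = "rstan",
                     priors = list(mu_0 = c(1,2), sigma_d = 4, epsilon = 2))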
If software="rjags"
the function performs JAGS sampling using the bayesmix
package
for univariate Gaussian mixtures, and the runjags
package for multivariate Gaussian mixtures. If software="rstan"
the function performs
Hamiltonian Monte Carlo (HMC) sampling via the rstan
package (see the vignette and the Stan project documentation
for further help).
After MCMC sampling, this function clusters the units into k groups, calls the piv_sel() function and yields the pivots obtained from one of four different methods (the user may select one via the piv.criterion argument): "maxsumint", "minsumnoint", "maxsumdiff" and "MUS" (available only if k <= 4); see the vignette for thorough details. For computational reasons clarified in the Details section of the function piv_rel, the length of the MCMC chains will be less than or equal to the input argument nMC; this length, corresponding to the value true.iter returned by the procedure, is the number of MCMC iterations for which the number of JAGS/Stan groups exactly coincides with the prespecified number of groups k.
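The pivots returned here are typically passed on to the relabelling step. A sketch of the full workflow, assuming (as in the package vignette) that piv_rel accepts the piv_MCMC output through its mat_mcmc argument:
res <- piv_MCMC(y = y, k = k, nMC = nMC)  # fit and post-process
rel <- piv_rel(mat_mcmc = res)            # relabel the chains using the pivots
str(rel)                                  # inspect the relabelled output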
Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.
### Bivariate simulation
if (FALSE) {
N <- 200
k <- 4
D <- 2
nMC <- 1000
M1 <- c(-.5,8)
M2 <- c(25.5,.1)
M3 <- c(49.5,8)
M4 <- c(63.0,.1)
Mu <- rbind(M1,M2,M3,M4)
Sigma.p1 <- diag(D)
Sigma.p2 <- 20*diag(D)
W <- c(0.2,0.8)
sim <- piv_sim(N = N, k = k, Mu = Mu,
Sigma.p1 = Sigma.p1,
Sigma.p2 = Sigma.p2, W = W)
## rjags (default)
res <- piv_MCMC(y = sim$y, k = k, nMC = nMC)
## rstan
res_stan <- piv_MCMC(y = sim$y, k = k, nMC = nMC,
                     software = "rstan")
# changing priors
res2 <- piv_MCMC(y = sim$y,
                 priors = list(
                   mu_0 = c(1,1),
                   S2 = matrix(c(0.002, 0, 0, 0.1), 2, 2, byrow = TRUE),
                   S3 = matrix(c(0.1, 0, 0, 0.1), 2, 2, byrow = TRUE)),
                 k = k, nMC = nMC)
}
### Fishery data (bayesmix package)
if (FALSE) {
library(bayesmix)
data(fish)
y <- fish[,1]
k <- 5
nMC <- 5000
res <- piv_MCMC(y = y, k = k, nMC = nMC)
# changing priors
res2 <- piv_MCMC(y = y,
                 priors = list(kind = "condconjugate",
                               parameter = "priorsRaftery",
                               hierarchical = "tau"), k = k, nMC = nMC)
}