Performs MCMC sampling via JAGS or HMC sampling via Stan for Gaussian mixture models, post-processes the chains and applies a clustering technique to the MCMC sample. Pivotal units for each group are selected according to one of four alternative criteria.
piv_MCMC(
y,
k,
nMC,
priors,
piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
clustering = c("diana", "hclust"),
software = c("rjags", "rstan"),
burn = 0.5 * nMC,
chains = 4,
cores = 1,
sparsity = FALSE
)
The function returns the MCMC output, the clustering solution and the pivotal indices. A complete list of outputs follows.
true.iter
The number of MCMC iterations for which
the number of JAGS/Stan groups exactly coincides with the prespecified
number of groups k
.
Mu
An estimate of the groups' means.
groupPost
A \(true.iter \times N\) matrix with values in 1:k indicating the post-processed group allocations.
mcmc_mean
If y
is a vector, a \(true.iter \times k\)
matrix with the post-processed MCMC chains for the mean parameters; if
y
is a matrix, a \(true.iter \times D \times k\) array with
the post-processed MCMC chains for the mean parameters.
mcmc_sd
If y
is a vector, a \(true.iter \times k\)
matrix with the post-processed MCMC chains for the sd parameters; if
y
is a matrix, a \(true.iter \times D\) array with
the post-processed MCMC chains for the sd parameters.
mcmc_weight
A \(true.iter \times k\) matrix with the post-processed MCMC chains for the weight parameters.
mcmc_mean_raw
If y
is a vector, a \((nMC-burn) \times k\) matrix
with the raw MCMC chains for the mean parameters as given by JAGS/Stan; if
y
is a matrix, a \((nMC-burn) \times D \times k\) array with the raw MCMC chains
for the mean parameters as given by JAGS/Stan.
mcmc_sd_raw
If y
is a vector, a \((nMC-burn) \times k\) matrix
with the raw MCMC chains for the sd parameters as given by JAGS/Stan; if
y
is a matrix, a \((nMC-burn) \times D\) array with the raw MCMC chains
for the sd parameters as given by JAGS/Stan.
mcmc_weight_raw
A \((nMC-burn) \times k\) matrix with the raw MCMC chains for the weight parameters as given by JAGS/Stan.
C
The \(N \times N\) co-association matrix constructed from the MCMC sample.
grr
The vector of cluster membership returned by
"diana"
or "hclust"
.
pivots
The vector of indices of pivotal units identified by the selected pivotal criterion.
model
The JAGS/Stan model code. Use the "cat"
function to print the code in a readable format.
stanfit
An object of S4 class stanfit
for the fitted model (only if software="rstan"
).
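For instance, given a fitted object res returned by piv_MCMC, the components above may be inspected as follows (a minimal sketch; res is assumed to come from any of the calls shown in the Examples):
res$true.iter          # retained iterations with exactly k non-empty groups
res$pivots             # indices of the pivotal units
head(res$mcmc_mean)    # post-processed chains for the means
cat(res$model)         # display the JAGS/Stan model code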
The arguments are as follows.
y
\(N\)-dimensional vector for univariate data or \(N \times D\) matrix for multivariate data.
k
Number of mixture components.
nMC
Number of MCMC iterations for the JAGS/Stan function execution.
priors
Input prior hyperparameters (see Details for default options).
piv.criterion
The pivotal criterion used for identifying one pivot for each group. Possible choices are: "MUS", "maxsumint", "minsumnoint", "maxsumdiff". The default method is "maxsumint" (see the Details and the vignette).
clustering
The algorithm adopted for partitioning the \(N\) observations into k groups. Possible choices are "diana" (default) or "hclust", for divisive and agglomerative hierarchical clustering, respectively.
software
The selected MCMC method to fit the model: "rjags" for the JAGS method, "rstan" for the Stan method. Default is "rjags".
burn
The burn-in period (only if method "rjags" is selected). Default is 0.5 \(\times\) nMC.
chains
A positive integer specifying the number of Markov chains. The default is 4.
cores
The number of cores to use when executing the Markov chains in parallel (only if software="rstan"). Default is 1.
sparsity
Allows for sparse finite mixtures; default is FALSE.
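As an illustration, a call overriding several of the defaults above might look as follows (a minimal sketch; y here is a placeholder for the user's data vector):
res <- piv_MCMC(y = y, k = 3, nMC = 2000,
                piv.criterion = "MUS",   # available only for k <= 4
                clustering = "hclust",
                burn = 1000)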
Leonardo Egidi legidi@units.it
The function fits univariate and multivariate Bayesian Gaussian mixture models of the form (here for univariate only): $$(Y_i|Z_i=j) \sim \mathcal{N}(\mu_j,\sigma_j),$$ where the \(Z_i\), \(i=1,\ldots,N\), are i.i.d. random variables, \(j=1,\dots,k\), \(\sigma_j\) is the group variance, \(Z_i \in \{1,\ldots,k\}\) are the latent group allocations, and $$P(Z_i=j)=\eta_j.$$ The likelihood of the model is then $$L(y;\mu,\eta,\sigma) = \prod_{i=1}^N \sum_{j=1}^k \eta_j \, \mathcal{N}(y_i;\mu_j,\sigma_j),$$ where \((\mu, \sigma)=(\mu_{1},\dots,\mu_{k},\sigma_{1},\ldots,\sigma_{k})\) are the component-specific parameters and \(\eta=(\eta_{1},\dots,\eta_{k})\) are the mixture weights.

Let \(\nu\) denote a permutation of \(\{1,\ldots,k\}\), and let \(\nu(\mu)=(\mu_{\nu(1)},\ldots,\mu_{\nu(k)})\), \(\nu(\sigma)=(\sigma_{\nu(1)},\ldots,\sigma_{\nu(k)})\) and \(\nu(\eta)=(\eta_{\nu(1)},\ldots,\eta_{\nu(k)})\) be the corresponding permutations of \(\mu\), \(\sigma\) and \(\eta\). Denote by \(V\) the set of all the permutations of the indices \(\{1,\ldots,k\}\); the likelihood above is invariant under any permutation \(\nu \in V\), that is $$L(y;\mu,\eta,\sigma) = L(y;\nu(\mu),\nu(\eta),\nu(\sigma)).$$ As a consequence, the model is unidentified with respect to an arbitrary permutation of the labels. When Bayesian inference for the model is performed, if the prior distribution \(p_0(\mu,\eta,\sigma)\) is invariant under a permutation of the indices, i.e. \(p_0(\mu,\eta,\sigma) = p_0(\nu(\mu),\nu(\eta),\nu(\sigma))\), then so is the posterior $$p(\mu,\eta,\sigma|y) \propto p_0(\mu,\eta,\sigma)L(y;\mu,\eta,\sigma),$$ which is multimodal with (at least) \(k!\) modes.
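As a simple illustration of this invariance (a worked case for \(k=2\), spelled out here for clarity): the only non-identity permutation is the swap \(\nu(1)=2\), \(\nu(2)=1\), and for each observation $$\eta_1 \mathcal{N}(y_i;\mu_1,\sigma_1) + \eta_2 \mathcal{N}(y_i;\mu_2,\sigma_2) = \eta_2 \mathcal{N}(y_i;\mu_2,\sigma_2) + \eta_1 \mathcal{N}(y_i;\mu_1,\sigma_1),$$ so the parameter values \((\mu_1,\sigma_1,\eta_1,\mu_2,\sigma_2,\eta_2)\) and \((\mu_2,\sigma_2,\eta_2,\mu_1,\sigma_1,\eta_1)\) yield the same likelihood and, under a symmetric prior, the same posterior density. The chains may therefore switch between the \(2!=2\) symmetric modes during sampling, which is precisely the label switching problem addressed by the post-processing step.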
Depending on the selected software, the model parametrization
changes in terms of the prior choices.
Specifically, in line with the Gibbs sampling underlying JAGS,
noninformative priors are adopted, with conjugate priors
preferred for computational speed.
Conversely, Stan adopts weakly informative priors,
with no need to explicitly exploit conjugacy.
For univariate mixtures, when
software="rjags"
the specification is the same as the function BMMmodel
of the
bayesmix
package:
$$\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)$$ $$\sigma_j \sim \mbox{invGamma}(nu0Half, nu0S0Half)$$ $$\eta \sim \mbox{Dirichlet}(1,\ldots,1)$$ $$S0 \sim \mbox{Gamma}(g0Half, g0G0Half),$$
with default values: \(\mu_0=0, B0inv=0.1, nu0Half =10, S0=2, nu0S0Half= nu0Half\times S0, g0Half = 5e-17, g0G0Half = 5e-33\), in accordance with the default specification:
priors=list(kind = "independence", parameter = "priorsFish",
hierarchical = "tau")
(see bayesmix
for further details and choices).
When software="rstan"
, the prior specification is:
$$\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)$$ $$\sigma_j \sim \mbox{Lognormal}(\mu_{\sigma}, \tau_{\sigma})$$ $$\eta_j \sim \mbox{Uniform}(0,1),$$
with default values: \(\mu_0=0, B0inv=0.1, \mu_{\sigma}=0, \tau_{\sigma}=2\). The user may specify new hyperparameter values with the argument:
priors=list(mu_0=1, B0inv=0.2, mu_sigma=3, tau_sigma=5)
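For instance, a complete call might be (a minimal sketch; y, k and nMC are placeholders for the user's data and settings):
res_stan <- piv_MCMC(y = y, k = k, nMC = nMC,
                     software = "rstan",
                     priors = list(mu_0 = 1, B0inv = 0.2,
                                   mu_sigma = 3, tau_sigma = 5))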
For multivariate mixtures, when software="rjags"
the prior specification is the following:
$$\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, S_2)$$ $$\Sigma^{-1} \sim \mbox{Wishart}(S_3, D+1)$$ $$\eta \sim \mbox{Dirichlet}(\bm{\alpha}),$$
where \(\bm{\alpha}\) is a \(k\)-dimensional vector
and \(S_2\) and \(S_3\)
are positive definite matrices. By default, \(\bm{\mu}_0=\bm{0}\),
\(\bm{\alpha}=(1,\ldots,1)\) and \(S_2\) and \(S_3\) are diagonal matrices,
with diagonal elements
equal to 1e+05. The user may specify other values for the hyperparameters
\(\bm{\mu}_0, S_2, S_3\) and \(\bm{\alpha}\) via the priors
argument as follows:
priors = list(mu_0 = c(1,1), S2 = ..., S3 = ..., alpha = ...)
with the constraint that \(S_2\) and \(S_3\) be positive definite, and \(\bm{\alpha}\) a vector of dimension \(k\) with nonnegative elements.
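For instance, for bivariate data (\(D=2\)) an admissible specification uses diagonal, hence positive definite, matrices (a minimal sketch; the numeric values are purely illustrative, and y, k and nMC are placeholders):
D <- 2
priors <- list(mu_0  = rep(0, D),
               S2    = diag(0.1, D),   # positive definite: positive diagonal
               S3    = diag(0.1, D),   # Wishart scale matrix
               alpha = rep(1, k))      # flat Dirichlet over the k weights
res <- piv_MCMC(y = y, k = k, nMC = nMC, priors = priors)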
When software="rstan"
, the prior specification is:
$$\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, LD^*L^{T})$$ $$L \sim \mbox{LKJ}(\epsilon)$$ $$D^*_j \sim \mbox{HalfCauchy}(0, \sigma_d).$$
The covariance matrix is expressed in terms of the LDL decomposition as \(LD^*L^{T}\), a variant of the classical Cholesky decomposition, where \(L\) is a \(D \times D\) lower unit triangular matrix and \(D^*\) is a \(D \times D\) diagonal matrix. The Cholesky correlation factor \(L\) is assigned an LKJ prior with \(\epsilon\) degrees of freedom which, combined with priors on the standard deviations of each component, induces a prior on the covariance matrix; as \(\epsilon \rightarrow \infty\) the magnitude of the correlations between components decreases, whereas \(\epsilon=1\) yields a uniform prior distribution for \(L\). By default, the hyperparameters are \(\bm{\mu}_0=\bm{0}\), \(\sigma_d=2.5\), \(\epsilon=1\). The user may specify different values with the argument:
priors=list(mu_0=c(1,2), sigma_d = 4, epsilon =2)
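A corresponding call might look as follows (a minimal sketch; y is a placeholder \(N \times D\) data matrix, and k and nMC are placeholders):
res_stan <- piv_MCMC(y = y, k = k, nMC = nMC,
                     software = "rstan",
                     priors = list(mu_0 = c(1,2), sigma_d = 4, epsilon = 2))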
If software="rjags"
the function performs JAGS sampling using the bayesmix
package
for univariate Gaussian mixtures, and the runjags
package for multivariate Gaussian mixtures. If software="rstan"
the function performs
Hamiltonian Monte Carlo (HMC) sampling via the rstan
package (see the vignette and the Stan project documentation
for further help).
After MCMC sampling, this function clusters the units into k groups, calls the piv_sel() function and yields the pivots obtained from one of four different methods (the user may select one via the piv.criterion argument): "maxsumint", "minsumnoint", "maxsumdiff" and "MUS" (available only if k <= 4); see the vignette for thorough details. For computational reasons clarified in the Details section of the function piv_rel, the length of the MCMC chains will be less than or equal to the input argument nMC; this length, corresponding to the value true.iter returned by the procedure, is the number of MCMC iterations for which the number of JAGS/Stan groups exactly coincides with the prespecified number of groups k.
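The pivots returned here are typically passed on to the relabelling step. A sketch of the full workflow, assuming (as in the package vignette) that piv_rel accepts the piv_MCMC output through its mat_mcmc argument:
res <- piv_MCMC(y = y, k = k, nMC = nMC)  # fit and post-process
rel <- piv_rel(mat_mcmc = res)            # relabel the chains using the pivots
str(rel)                                  # inspect the relabelled output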
Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.
### Bivariate simulation
if (FALSE) {
N <- 200
k <- 4
D <- 2
nMC <- 1000
M1 <- c(-.5,8)
M2 <- c(25.5,.1)
M3 <- c(49.5,8)
M4 <- c(63.0,.1)
Mu <- rbind(M1,M2,M3,M4)
Sigma.p1 <- diag(D)
Sigma.p2 <- 20*diag(D)
W <- c(0.2,0.8)
sim <- piv_sim(N = N, k = k, Mu = Mu,
Sigma.p1 = Sigma.p1,
Sigma.p2 = Sigma.p2, W = W)
## rjags (default)
res <- piv_MCMC(y = sim$y, k = k, nMC = nMC)
## rstan
res_stan <- piv_MCMC(y = sim$y, k = k, nMC = nMC,
                     software = "rstan")
# changing priors
res2 <- piv_MCMC(y = sim$y,
                 priors = list(
                   mu_0 = c(1,1),
                   S2 = matrix(c(0.002, 0, 0, 0.1), 2, 2, byrow = TRUE),
                   S3 = matrix(c(0.1, 0, 0, 0.1), 2, 2, byrow = TRUE)),
                 k = k, nMC = nMC)
}
### Fishery data (bayesmix package)
if (FALSE) {
library(bayesmix)
data(fish)
y <- fish[,1]
k <- 5
nMC <- 5000
res <- piv_MCMC(y = y, k = k, nMC = nMC)
# changing priors
res2 <- piv_MCMC(y = y,
                 priors = list(kind = "condconjugate",
                               parameter = "priorsRaftery",
                               hierarchical = "tau"), k = k, nMC = nMC)
}