pp_mixture.brmsfit: Posterior Probabilities of Mixture Component Memberships

Description

Compute the posterior probabilities of mixture component memberships for each observation, including uncertainty estimates.

Usage

# S3 method for brmsfit
pp_mixture(x, newdata = NULL, re_formula = NULL,
  allow_new_levels = FALSE, sample_new_levels = "uncertainty",
  new_objects = list(), incl_autocor = TRUE, subset = NULL,
  nsamples = NULL, nug = NULL, summary = TRUE, robust = FALSE,
  probs = c(0.025, 0.975), log = FALSE, ...)
pp_mixture(x, ...)

Arguments

x

An R object, usually of class brmsfit.

newdata

An optional data.frame for which to evaluate predictions. If NULL (default), the original data of the model is used.

re_formula

A formula containing the group-level effects to be considered in the prediction. If NULL (default), include all group-level effects; if NA, include no group-level effects.

allow_new_levels

A flag indicating if new levels of group-level effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

sample_new_levels

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), include group-level uncertainty in the predictions based on the variation of the existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis. If "old_levels", directly sample new levels from the existing levels.

new_objects

A named list of objects containing new data, which cannot be passed via argument newdata. Currently, only required for objects passed to cor_sar and cor_fixed.

incl_autocor

A flag indicating if ARMA autocorrelation parameters should be included in the predictions. Defaults to TRUE. Setting it to FALSE will not affect other correlation structures such as cor_bsts or cor_fixed.

subset

A numeric vector specifying the posterior samples to be used. If NULL (the default), all samples are used.

nsamples

Positive integer indicating how many posterior samples should be used. If NULL (the default) all samples are used. Ignored if subset is not NULL.

nug

Small positive number for Gaussian process terms only. For numerical reasons, the covariance matrix of a Gaussian process might not be positive definite. Adding a very small number to the matrix's diagonal often solves this problem. If NULL (the default), nug is chosen internally.

summary

Should summary statistics (i.e. means, sds, and 95% intervals) be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default), the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are used instead. Only used if summary is TRUE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

log

Logical; indicates whether to return probabilities on the log-scale. Defaults to FALSE.

...

Currently ignored.

Value

If summary = TRUE, an N x E x K array, where N is the number of observations, K is the number of mixture components, and E is equal to length(probs) + 2. If summary = FALSE, an S x N x K array, where S is the number of posterior samples.
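
To make the two array layouts concrete, here is a minimal base-R sketch that reduces a mock S x N x K array of draws to an N x E x K summary array. The draws are randomly generated, not taken from a fitted model, and the helper `summarise_one` and its labels are hypothetical names chosen for illustration; the internal implementation of pp_mixture may differ.

```r
set.seed(1)
S <- 200; N <- 5; K <- 2   # hypothetical sizes

# Mock S x N x K array of membership probabilities (assumption: random
# values, not output of pp_mixture); the two components sum to 1
p1 <- matrix(runif(S * N), nrow = S, ncol = N)
draws <- array(c(p1, 1 - p1), dim = c(S, N, K))

probs <- c(0.025, 0.975)

# Hypothetical helper: mean, sd, and quantiles of one vector of draws
summarise_one <- function(v) {
  c(Estimate = mean(v), Est.Error = sd(v), quantile(v, probs = probs))
}

# Summarise over the draws dimension for each observation and component
smry <- apply(draws, c(2, 3), summarise_one)  # E x N x K
smry <- aperm(smry, c(2, 1, 3))               # N x E x K
dim(smry)
```

Here E equals length(probs) + 2, matching the description above: one column for the point estimate, one for the variability measure, and one per requested quantile.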

Details

The returned probabilities can be written as $P(K_n = k | Y_n)$, that is, the posterior probability that observation n originates from component k. They are computed using Bayes' theorem $$P(K_n = k | Y_n) = \frac{P(Y_n | K_n = k) P(K_n = k)}{P(Y_n)},$$ where $P(Y_n | K_n = k)$ is the (posterior) likelihood of observation n for component k, $P(K_n = k)$ is the (posterior) mixing probability of component k (i.e. parameter theta<k>), and $$P(Y_n) = \sum_{k=1}^{K} P(Y_n | K_n = k) P(K_n = k)$$ is a normalizing constant.
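
As a rough illustration of this formula, the following base-R sketch computes the membership probabilities for a single set of parameter values in a two-component normal mixture. The observations `y` and the values of `theta`, `mu`, and `sigma` are made up for illustration and do not come from a fitted model; conceptually, the same calculation would be repeated for every posterior draw, though the actual implementation in brms may differ.

```r
# Hypothetical values for one posterior draw (not from a fitted model)
y     <- c(-0.5, 0.3, 2.4)   # observations Y_n
theta <- c(0.6, 0.4)         # mixing probabilities P(K_n = k)
mu    <- c(0, 2)             # component means
sigma <- c(1, 1)             # component standard deviations

# log of P(Y_n | K_n = k) * P(K_n = k), stored as an N x K matrix
log_num <- sapply(seq_along(theta), function(k) {
  dnorm(y, mean = mu[k], sd = sigma[k], log = TRUE) + log(theta[k])
})

# log of the normalizing constant P(Y_n), using log-sum-exp for stability
log_den <- apply(log_num, 1, function(v) max(v) + log(sum(exp(v - max(v)))))

# Membership probabilities on the log scale and on the natural scale;
# the length-N vector log_den is recycled down the rows of log_num
log_pp <- log_num - log_den
pp <- exp(log_pp)

rowSums(pp)  # each row sums to 1
```

Working on the log scale until the last step avoids underflow when the component likelihoods are very small, which is also why a log-scale result (compare the log argument) can be convenient.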

Examples

# NOT RUN {
library(brms)

## simulate some data
set.seed(1234)
dat <- data.frame(
  y = c(rnorm(100), rnorm(50, 2)), 
  x = rnorm(150)
)
## fit a simple normal mixture model
mix <- mixture(gaussian, nmix = 2)
prior <- c(
  prior(normal(0, 5), Intercept, nlpar = mu1),
  prior(normal(0, 5), Intercept, nlpar = mu2),
  prior(dirichlet(2, 2), theta)
)
fit1 <- brm(bf(y ~ x), dat, family = mix,
            prior = prior, chains = 2, inits = 0)
summary(fit1)
   
## compute the membership probabilities
ppm <- pp_mixture(fit1)
str(ppm)

## extract point estimates for each observation
head(ppm[, 1, ])

## classify every observation according to 
## the most likely component
apply(ppm[, 1, ], 1, which.max)
# }