batchmix (version 2.2.1)

batchSemiSupervisedMixtureModel: Batch semisupervised mixture model

Description

Fits a Bayesian semi-supervised mixture model with batch effects using an MCMC sampler.

Usage

batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  initial_labels,
  fixed,
  batch_vec,
  type,
  K_max = length(unique(initial_labels)),
  alpha = NULL,
  concentration = NULL,
  mu_proposal_window = 0.5^2,
  cov_proposal_window = 0.002,
  m_proposal_window = 0.3^2,
  S_proposal_window = 0.01,
  t_df_proposal_window = 0.015,
  m_scale = NULL,
  rho = 3,
  theta = 1,
  initial_class_means = NULL,
  initial_class_covariance = NULL,
  initial_batch_shift = NULL,
  initial_batch_scale = NULL,
  initial_class_df = NULL,
  verbose = TRUE
)

Value

A named list containing the sampled partitions, cluster and batch parameters, model fit measures and some details on the model call.

Arguments

X

Data to cluster as a matrix with the items to cluster held in rows.

R

The number of iterations in the sampler.

thin

The factor by which the samples generated are thinned, e.g. if ``thin=50`` only every 50th sample is kept.
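
As a rough illustration of thinning in general, the iterations retained for a given `R` and `thin` can be listed in base R. This is a sketch only: it does not reproduce the package's internal bookkeeping (which may, for instance, also store the initial state), and `kept` is a name introduced here for illustration.

```r
R <- 1000
thin <- 50

# Iterations retained when every `thin`-th draw is kept
kept <- seq(from = thin, to = R, by = thin)

length(kept)   # 20
head(kept, 3)  # 50 100 150
```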

initial_labels

Initial clustering: a vector with one label per row of ``X``.

fixed

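A vector with one entry per item indicating which items are fixed at their initial label (1 for fixed and 0 otherwise in the example in the Examples section).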
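
A minimal sketch of how `initial_labels` and `fixed` might be built from partially observed labels, assuming the 0/1 coding used in the Examples section. The vector `observed` and the choice of a uniform random initial allocation are hypothetical, introduced only for illustration.

```r
set.seed(1)

# Hypothetical partially observed labels: NA marks an item with no known class
observed <- c(1, 1, NA, 2, NA, 2, NA, 1)
K <- 2

# Items with an observed label are held fixed (1); the rest are free (0)
fixed <- as.numeric(!is.na(observed))

# Keep the observed labels and fill the gaps with a random initial allocation
initial_labels <- observed
n_missing <- sum(is.na(observed))
initial_labels[is.na(observed)] <- sample(seq_len(K), size = n_missing, replace = TRUE)

fixed  # 1 1 0 1 0 1 0 1
```

The fixed items keep their observed labels throughout, while the random labels only give the sampler a starting point for the remaining items.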

batch_vec

Labels identifying which batch each item being clustered is from.

type

Character indicating density type to use. One of 'MVN' (multivariate normal distribution) or 'MVT' (multivariate t distribution).

K_max

The number of components to include (the upper bound on the number of clusters in each sample). Defaults to the number of unique labels in ``initial_labels``.

alpha

The concentration parameter for the stick-breaking prior and the weights in the model.

concentration

Initial concentration vector for component weights.

mu_proposal_window

The proposal window for the cluster mean proposal kernel. The proposal density is a Gaussian distribution; the window is its variance.
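
To make the role of the window concrete, here is a sketch of a Gaussian random-walk proposal whose variance is the window. It assumes the proposal is centred on the current value; the package's own kernel may differ in detail, and `mu_current` and `mu_proposed` are illustrative names.

```r
set.seed(1)

mu_proposal_window <- 0.5^2  # variance of the proposal (the default above)
mu_current <- c(0, 3)        # hypothetical current value of a cluster mean

# rnorm() takes a standard deviation, so use the square root of the window
mu_proposed <- rnorm(
  n = length(mu_current),
  mean = mu_current,
  sd = sqrt(mu_proposal_window)
)
```

A larger window gives bolder proposals and typically a lower acceptance rate; a smaller window gives the reverse.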

cov_proposal_window

The proposal window for the cluster covariance proposal kernel. The proposal density is a Wishart distribution; this argument is the reciprocal of the degrees of freedom.
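
The relationship between the window and the Wishart degrees of freedom can be sketched with `stats::rWishart()`. Dividing the scale matrix by the degrees of freedom, so that the proposal has the current covariance as its mean, is an assumption made for this sketch and not necessarily the package's parameterisation; `Sigma_current` is a hypothetical current value.

```r
set.seed(1)

cov_proposal_window <- 0.002
df <- 1 / cov_proposal_window  # 500 degrees of freedom

Sigma_current <- diag(2)       # hypothetical current covariance matrix

# A Wishart(df, Sigma / df) draw has expectation Sigma
Sigma_proposed <- rWishart(1, df = df, Sigma = Sigma_current / df)[, , 1]
```

A smaller window means more degrees of freedom, so proposals concentrate more tightly around the current matrix.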

m_proposal_window

The proposal window for the batch mean proposal kernel. The proposal density is a Gaussian distribution; the window is its variance.

S_proposal_window

The proposal window for the batch standard deviation proposal kernel. The proposal density is a Gamma distribution; this argument is the reciprocal of the rate.
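
A sketch of a Gamma proposal whose rate is the reciprocal of the window. Choosing the shape so that the proposal mean equals the current value is an assumption of this sketch, as is applying the proposal directly to the parameter; the package may use a different shape or transform the parameter first. `S_current` is a hypothetical value.

```r
set.seed(1)

S_proposal_window <- 0.01
rate <- 1 / S_proposal_window  # rate of the Gamma proposal

S_current <- 1.2               # hypothetical current batch scale

# Gamma(shape = x * rate, rate = rate) has mean x and variance x / rate
S_proposed <- rgamma(1, shape = S_current * rate, rate = rate)
```

Under these assumptions the proposal variance is the current value times the window, so halving the window halves the variance of each step.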

t_df_proposal_window

The proposal window for the degrees of freedom of the multivariate t distribution (ignored unless ``type`` is 'MVT'). The proposal density is a Gamma distribution; this argument is the reciprocal of the rate.

m_scale

The scale hyperparameter for the batch shift prior distribution. This defines the scale of the batch effect upon the mean and should be in (0, 1].

rho

The shape of the prior distribution for the batch scale.

theta

The scale of the prior distribution for the batch scale.

initial_class_means

A $P \times K$ matrix of initial values for the class means. Defaults to draws from the prior distribution.

initial_class_covariance

A $P \times P \times K$ array of initial values for the class covariance matrices. Defaults to draws from the prior distribution.

initial_batch_shift

A $P \times B$ matrix of initial values for the batch shift effect. Defaults to draws from the prior distribution.

initial_batch_scale

A $P \times B$ matrix of initial values for the batch scales. Defaults to draws from the prior distribution.

initial_class_df

A $K$ vector of initial values for the class degrees of freedom. Defaults to draws from the prior distribution.
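
The shapes expected for the initial values can be checked before starting a chain. The sketch below assumes $P$ is the number of columns of `X`, $K$ is the default `K_max` and $B$ is the number of distinct batches; the toy inputs and the particular starting values (zeros, identity matrices, 1.5, 5) are arbitrary choices made for illustration.

```r
# Toy inputs with 100 items, 2 measurements, 2 classes and 5 batches
X <- matrix(rnorm(200), ncol = 2)
initial_labels <- rep(c(1, 2), each = 50)
batch_vec <- rep(seq_len(5), times = 20)

P <- ncol(X)
K <- length(unique(initial_labels))  # the default K_max
B <- length(unique(batch_vec))

initial_class_means <- matrix(0, nrow = P, ncol = K)
initial_class_covariance <- array(diag(P), dim = c(P, P, K))
initial_batch_shift <- matrix(0, nrow = P, ncol = B)
initial_batch_scale <- matrix(1.5, nrow = P, ncol = B)
initial_class_df <- rep(5, K)

# Stop early if any shape disagrees with the documented dimensions
stopifnot(
  all(dim(initial_class_means) == c(P, K)),
  all(dim(initial_class_covariance) == c(P, P, K)),
  all(dim(initial_batch_shift) == c(P, B)),
  all(dim(initial_batch_scale) == c(P, B)),
  length(initial_class_df) == K
)
```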

verbose

Logical indicating whether warnings about proposal windows should be printed.

Examples

# Data in a matrix format
X <- matrix(c(rnorm(100, 0, 1), rnorm(100, 3, 1)), ncol = 2, byrow = TRUE)

# Initial labelling
labels <- c(
  rep(1, 10),
  sample(c(1, 2), size = 40, replace = TRUE),
  rep(2, 10),
  sample(c(1, 2), size = 40, replace = TRUE)
)

fixed <- c(rep(1, 10), rep(0, 40), rep(1, 10), rep(0, 40))

# Batch
batch_vec <- sample(seq(1, 5), replace = TRUE, size = 100)

# Density choice
type <- "MVN"

# Sampling parameters
R <- 1000
thin <- 50

# MCMC samples and BIC vector
samples <- batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  labels,
  fixed,
  batch_vec,
  type
)

# Given an initial value for the parameters
initial_class_means <- matrix(c(1, 1, 3, 4), nrow = 2)
initial_class_covariance <- array(c(1, 0, 0, 1, 1, 0, 0, 1),
  dim = c(2, 2, 2)
)

# We can use values from a previous chain
initial_batch_shift <- samples$batch_shift[, , R / thin]
initial_batch_scale <- matrix(
  c(1.2, 1.3, 1.7, 1.1, 1.4, 1.3, 1.2, 1.2, 1.1, 2.0),
  nrow = 2
)

samples <- batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  labels,
  fixed,
  batch_vec,
  type,
  initial_class_means = initial_class_means,
  initial_class_covariance = initial_class_covariance,
  initial_batch_shift = initial_batch_shift,
  initial_batch_scale = initial_batch_scale
)