evolMonteCarloClustering: evolutionary Monte Carlo clustering algorithm

Description

Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.

Below sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of the levelsSaveSampFor.

Usage

evolMonteCarloClustering(nIters,              
                         temperLadder,              
                         startingVals,              
                         logTarDensFunc,            
                         MHMergeProb       = 0.5,  
                         moveProbsList     = NULL,  
                         moveNTimesList    = NULL,  
                         levelsSaveSampFor = NULL,  
                         saveFitness       = FALSE, 
                         verboseLevel      = 0,     
                         …)

Arguments

nIters

integer \(>\) 0.

temperLadder

double vector with all positive entries, in decreasing order.

startingVals

double matrix of dimension temperLadderLen \(\times\) sampDim or vector of length sampDim, in which case the same starting values are used for every temperature level.

logTarDensFunc

function of two arguments (draw, …) that returns the target density evaluated in the log scale.

MHMergeProb

double in (0, 1). See details below for the use of this argument.

moveProbsList

named list of probabilities adding upto 1.

moveNTimesList

named list of integers \(\ge\) 0.

levelsSaveSampFor

integer vector with positive entries.

saveFitness

logical.

verboseLevel

integer, a value \(\ge\) 2 produces a lot of output.

…

optional arguments to be passed to logTarDensFunc.

Value

This function returns a list with the following components:

draws

array of dimension nIters \(\times\) sampDim \(\times\) levelsSaveSampForLen, if saveFitness = FALSE. If saveFitness = TRUE, then the returned array is of dimension nIters \(\times\) (sampDim + 1) \(\times\) levelsSaveSampForLen; i.e., each of the levelsSaveSampForLen matrices contain the fitness values in their last column.

acceptRatios

matrix of the acceptance rates for various moves used.

detailedAcceptRatios

list of matrices with detailed summary of the acceptance rates for various moves used.

nIters

the nIters argument.

temperLadder

the temperLadder argument.

startingVals

the startingVals argument.

moveProbsList

the moveProbsList argument.

moveNTimesList

the moveNTimesList argument.

levelsSaveSampFor

the levelsSaveSampFor argument.

time

the time taken by the run.

Details

The EMCC algorithm: The evolutionary Monte Carlo clustering (EMCC; Goswami and Liu, 2007) algorithm is composed of the following moves:
moveProbsList and moveNTimesList: The allowed names for components of moveProbsList and moveNTimesList come from the abbreviated names of the moves above. For example, the following specifications are valid:
MHMergeProb: In the MH or the mutation step, each of the sampDim-many objects are proposed to either merge with an existing cluster or split to form its own cluster with probability MHMergeProb and (1 - MHMergeProb), respectively (see Goswami and Liu, 2007).
levelsSaveSampFor: By default, samples are saved and returned for temperature level temperLadderLen. The levelsSaveSampFor could be used to save samples from other temperature levels as well (e.g., levelsSaveSampFor = 1:temperLadderLen saves samples from all levels).
saveFitness: The term fitness refers to the function \(H(x)\), where the target density of interest is given by:

References

Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.

Examples

Run this code

## The following example is a simple stochastic optimization problem,
## the set up is same as that of findMaxTemper and placeTempers. Here
## no "heating up" is necessary, and hence the maximum temprature is
## the coldest one, namely, 0.5.
##
## However, we run evolMonteCarloClustering on this example with a
## temperature ladder that is the output of placeTempers, which
## assumes that the maximum temperature is 5.
KMeansObj  <- KMeansFuncGenerator1(-97531)
samplerObj <-
    with(KMeansObj,
     {
         temperLadder    <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684,
                              0.7900778, 0.6496648, 0.5135825, 0.5000000)
         nLevels         <- length(temperLadder)
         sampDim         <- nrow(yy)
         startingVals    <- sample(c(0, 1),
                                   size    = nLevels * sampDim,
                                   replace = TRUE)
         startingVals    <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         moveProbsList   <- list(MH             = 0.4,
                                 RC             = 0.3,
                                 'SCSC_TWO_NEW' = 0.3)
         mm              <- floor(nLevels / 2)
         moveNTimesList  <- list(MH             = 1,
                                 RC             = mm,
                                 'SCSC_TWO_NEW' = mm,
                                 RE             = nLevels)
         evolMonteCarloClustering(nIters            = 100,
                                  temperLadder      = temperLadder,
                                  startingVals      = startingVals,
                                  logTarDensFunc    = logTarDensFunc,
                                  moveProbsList     = moveProbsList,
                                  moveNTimesList    = moveNTimesList,
                                  levelsSaveSampFor = seq_len(nLevels),
                                  saveFitness       = TRUE,
                                  verboseLevel      = 1)
     })
print(samplerObj)
print(names(samplerObj))
with(c(samplerObj, KMeansObj),
 {
     print(acceptRatios)
     print(detailedAcceptRatios)
     print(dim(draws))
     fitnessCol <- ncol(draws[ , , 1])     
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })

Run the code above in your browser using DataLab