EMCC (version 1.3)

evolMonteCarloClustering: evolutionary Monte Carlo clustering algorithm

Description

Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.

Below sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of the levelsSaveSampFor.

Usage

evolMonteCarloClustering(nIters,              
                         temperLadder,              
                         startingVals,              
                         logTarDensFunc,            
                         MHMergeProb       = 0.5,  
                         moveProbsList     = NULL,  
                         moveNTimesList    = NULL,  
                         levelsSaveSampFor = NULL,  
                         saveFitness       = FALSE, 
                         verboseLevel      = 0,     
                         …)

Arguments

nIters
integer \(>\) 0.
temperLadder
double vector with all positive entries, in decreasing order.
startingVals
double matrix of dimension temperLadderLen \(\times\) sampDim or vector of length sampDim, in which case the same starting values are used for every temperature level.
logTarDensFunc
function of two arguments (draw, …) that returns the target density evaluated in the log scale.
MHMergeProb
double in (0, 1). See details below for the use of this argument.
moveProbsList
named list of probabilities adding upto 1.
moveNTimesList
named list of integers \(\ge\) 0.
levelsSaveSampFor
integer vector with positive entries.
saveFitness
logical.
verboseLevel
integer, a value \(\ge\) 2 produces a lot of output.
optional arguments to be passed to logTarDensFunc.

Value

This function returns a list with the following components:

draws
array of dimension nIters \(\times\) sampDim \(\times\) levelsSaveSampForLen, if saveFitness = FALSE. If saveFitness = TRUE, then the returned array is of dimension nIters \(\times\) (sampDim + 1) \(\times\) levelsSaveSampForLen; i.e., each of the levelsSaveSampForLen matrices contain the fitness values in their last column.

acceptRatios
matrix of the acceptance rates for various moves used.

detailedAcceptRatios
list of matrices with detailed summary of the acceptance rates for various moves used.

nIters
the nIters argument.

temperLadder
the temperLadder argument.

startingVals
the startingVals argument.

moveProbsList
the moveProbsList argument.

moveNTimesList
the moveNTimesList argument.

levelsSaveSampFor
the levelsSaveSampFor argument.

time
the time taken by the run.

Details

The EMCC algorithm
The evolutionary Monte Carlo clustering (EMCC; Goswami and Liu, 2007) algorithm is composed of the following moves:

rl MH Metropolis-Hastings or mutation

SCSC_ONE_NEW sub-cluster swap crossover: one new

SCSC_TWO_NEW sub-cluster swap crossover: two new

SCRC sub-cluster reallocation crossover

RE (random) exchange

The current function could be used to run the EMCC algorithm by specifying what moves to employ using the following variables.

moveProbsList and moveNTimesList
The allowed names for components of moveProbsList and moveNTimesList come from the abbreviated names of the moves above. For example, the following specifications are valid:

moveProbsList = list(MH           = 0.5,
                     SCSC_TWO_NEW = 0.25,
                     SCRC         = 0.25)
      

moveNTimesList = list(MH           = 1, 
                      SCSC_TWO_NEW = floor(temperLadderLen / 2),
                      SCRC         = floor(temperLadderLen / 2),
                      RE           = temperLadderLen)           
      

MHMergeProb
In the MH or the mutation step, each of the sampDim-many objects are proposed to either merge with an existing cluster or split to form its own cluster with probability MHMergeProb and (1 - MHMergeProb), respectively (see Goswami and Liu, 2007).

levelsSaveSampFor
By default, samples are saved and returned for temperature level temperLadderLen. The levelsSaveSampFor could be used to save samples from other temperature levels as well (e.g., levelsSaveSampFor = 1:temperLadderLen saves samples from all levels).

saveFitness
The term fitness refers to the function \(H(x)\), where the target density of interest is given by:

$$g(x) \propto \exp[ -H(x) / \tau_{min} ]$$

\(H(x)\) is also known as the energy function. By default, the fitness values are not saved, but one can do so by setting saveFitness = TRUE.

References

Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.

Examples

Run this code
## The following example is a simple stochastic optimization problem,
## the set up is same as that of findMaxTemper and placeTempers. Here
## no "heating up" is necessary, and hence the maximum temprature is
## the coldest one, namely, 0.5.
##
## However, we run evolMonteCarloClustering on this example with a
## temperature ladder that is the output of placeTempers, which
## assumes that the maximum temperature is 5.
KMeansObj  <- KMeansFuncGenerator1(-97531)
samplerObj <-
    with(KMeansObj,
     {
         temperLadder    <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684,
                              0.7900778, 0.6496648, 0.5135825, 0.5000000)
         nLevels         <- length(temperLadder)
         sampDim         <- nrow(yy)
         startingVals    <- sample(c(0, 1),
                                   size    = nLevels * sampDim,
                                   replace = TRUE)
         startingVals    <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         moveProbsList   <- list(MH             = 0.4,
                                 RC             = 0.3,
                                 'SCSC_TWO_NEW' = 0.3)
         mm              <- floor(nLevels / 2)
         moveNTimesList  <- list(MH             = 1,
                                 RC             = mm,
                                 'SCSC_TWO_NEW' = mm,
                                 RE             = nLevels)
         evolMonteCarloClustering(nIters            = 100,
                                  temperLadder      = temperLadder,
                                  startingVals      = startingVals,
                                  logTarDensFunc    = logTarDensFunc,
                                  moveProbsList     = moveProbsList,
                                  moveNTimesList    = moveNTimesList,
                                  levelsSaveSampFor = seq_len(nLevels),
                                  saveFitness       = TRUE,
                                  verboseLevel      = 1)
     })
print(samplerObj)
print(names(samplerObj))
with(c(samplerObj, KMeansObj),
 {
     print(acceptRatios)
     print(detailedAcceptRatios)
     print(dim(draws))
     fitnessCol <- ncol(draws[ , , 1])     
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })

Run the code above in your browser using DataCamp Workspace