generate.MixSim: Generate MixSim Examples for Testing

Description

This function uses MixSim to generate simulated data sets for testing clustering algorithms.

Usage

generate.MixSim(N, p, K, MixSim.obj = NULL, BarOmega = NULL,
                  MaxOmega = NULL, sph = FALSE, hom = FALSE)

Arguments

N

total sample size across all $S$ processors, i.e. the sum of N.spmd over all processors is N.

p

dimension of data X.spmd, i.e. ncol(X.spmd).

K

number of clusters.

MixSim.obj

an object returned from MixSim.

BarOmega

average overlap, as in MixSim.

MaxOmega

maximum overlap as in MixSim.

sph

whether the covariance matrices are spherical (TRUE) or non-spherical (FALSE), as sph in MixSim.

hom

whether the clusters are homogeneous (TRUE) or heterogeneous (FALSE), as hom in MixSim.

Value

A list containing the simulated data and related information is returned, with the following elements:

K

number of clusters, as the input.

p

dimension of data X.spmd, as the input.

N

total sample size, as the input.

N.allspmds

a collection of sample sizes for all $S$ processors, as the input.

N.spmd

total sample size of the given processor, as the input.

X.spmd

generated data set with dimension N.spmd * p.

CLASS.spmd

true cluster id of each observation, a vector of length N.spmd with values from 1 to K.

N.CLASS.spmd

true sample size of each cluster, a vector of length K.

MixSim.obj

the true model from which the data X.spmd are generated.

Details

If MixSim.obj is NULL, then BarOmega and MaxOmega will be used in MixSim to obtain a new MixSim.obj.
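The other path, supplying MixSim.obj directly, can be sketched as below. This is a minimal illustration and not taken from the package: it assumes the MixSim package is installed, that the object returned by its MixSim() function is accepted as MixSim.obj, and that pbdMPI's comm.rank() and bcast() are available after loading pmclust. Building the model on rank 0 and broadcasting it is a precaution chosen here so that every processor draws from the same model. The variable name obj is hypothetical.

```r
# Run with, e.g.: mpiexec -np 4 Rscript demo.r
library(pmclust, quietly = TRUE)

N <- 5000
p <- 2
K <- 2

# Assumption: build the true model once on rank 0 with MixSim::MixSim(),
# then share it so all ranks hold an identical MixSim.obj.
if (comm.rank() == 0) {
  obj <- MixSim::MixSim(BarOmega = 0.01, K = K, p = p)
} else {
  obj <- NULL
}
obj <- bcast(obj)

# Because MixSim.obj is supplied, BarOmega and MaxOmega are not needed here.
data.spmd <- generate.MixSim(N, p, K, MixSim.obj = obj)

# Local true cluster sizes, as documented under Value.
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

finalize()
```

Reusing one MixSim.obj across several calls is one way to generate repeated data sets from the same true model.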

References

Melnykov, V., Chen, W.-C. and Maitra, R. (2012) MixSim: Simulating Data to Study Performance of Clustering Algorithms, Journal of Statistical Software, (accepted).

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Examples


# Save this code in a file "demo.r" and run it on 4 processors with
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Generate an example data set.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

### Quit.
finalize()
