pmclust (version 0.2-0)

generate.MixSim: Generate MixSim Examples for Testing

Description

This function uses MixSim to generate data sets for testing clustering algorithms.

Usage

generate.MixSim(N, p, K, MixSim.obj = NULL, MaxOmega = NULL,
                  BarOmega = NULL, PiLow = 1.0, sph = FALSE, hom = FALSE)

Arguments

N

total sample size across all \(S\) processors, i.e. the sum of N.spmd over all processors is N.

p

dimension of data X.spmd, i.e. ncol(X.spmd).

K

number of clusters.

MixSim.obj

an object returned from MixSim.

MaxOmega

maximum overlap as in MixSim.

BarOmega

averaged overlap as in MixSim.

PiLow

lower bound of mixture proportion as in MixSim.

sph

covariance structure as in MixSim (FALSE for non-spherical, TRUE for spherical).

hom

cluster homogeneity as in MixSim (FALSE for heterogeneous, TRUE for homogeneous).
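The relation between N and the per-processor sizes N.spmd described for the N argument can be illustrated in plain R. This is only a sketch: the helper split_sizes and its even split are assumptions made for illustration, since this page does not state how pmclust divides N among processors.

```r
# Hypothetical even split of N observations over S processors (ranks).
# pmclust's actual partitioning rule is not documented on this page; the
# sketch only shows that the per-processor sizes must add up to N.
split_sizes <- function(N, S) {
  base <- N %/% S                  # every rank gets at least this many
  extra <- N %% S                  # the first `extra` ranks get one more
  base + as.integer(seq_len(S) <= extra)
}

N <- 5000
S <- 4
N.allspmds <- split_sizes(N, S)    # one entry per processor
stopifnot(sum(N.allspmds) == N)    # sum over N.spmd is N
print(N.allspmds)
```

Each entry of N.allspmds plays the role of N.spmd on one processor.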

Value

A list containing the simulated data and related information, with the following elements:

K number of clusters, as the input
p dimension of data X.spmd, as the input
N total sample size, as the input
N.allspmds a collection of sample sizes for all \(S\) processors, as the input
N.spmd total sample size of given processor, as the input
X.spmd generated data set with dimension N.spmd * p
CLASS.spmd true cluster id of each observation, a vector of length N.spmd with values from 1 to K
N.CLASS.spmd true sample size of each cluster, a vector of length K

Details

If MixSim.obj is NULL, then BarOmega and MaxOmega will be used in MixSim to obtain a new MixSim.obj.
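As a minimal sketch of the other case, a MixSim.obj can be built ahead of time with the MixSim package and reused. The seed value and the choice to set the same seed on every processor are assumptions of this sketch, made so that all processors would hold identical mixture parameters; this page does not say whether generate.MixSim requires that.

```r
library(MixSim)

# Same seed on every processor so that each one builds identical mixture
# parameters (an assumption of this sketch, not documented behaviour).
set.seed(1234)

K <- 2
p <- 2
MixSim.obj <- MixSim(BarOmega = 0.01, K = K, p = p)

# fail is 0 when the requested overlap was reached.
stopifnot(MixSim.obj$fail == 0)

print(MixSim.obj$Pi)          # mixing proportions, length K
print(dim(MixSim.obj$Mu))     # K x p matrix of cluster means
print(dim(MixSim.obj$S))      # p x p x K array of covariance matrices
```

The resulting object could then be supplied as generate.MixSim(N, p, K, MixSim.obj = MixSim.obj) instead of passing BarOmega or MaxOmega.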

References

Melnykov, V., Chen, W.-C. and Maitra, R. (2012) “MixSim: Simulating Data to Study Performance of Clustering Algorithms”, Journal of Statistical Software, (accepted).

Programming with Big Data in R Website: http://r-pbd.org/

See Also

generate.basic.

Examples

# NOT RUN {
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Generate example data.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

### Quit.
finalize()
# }
