pmclust (version 0.1-6)

pmclust and pkmeans: Parallel Model-Based Clustering and Parallel K-means Algorithm

Description

Parallel Model-Based Clustering and Parallel K-means Algorithm

Usage

pmclust(X = NULL, K = 2, MU = NULL,
    algorithm = .PMC.CT$algorithm, RndEM.iter = .PMC.CT$RndEM.iter,
    CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .SPMD.CT$rank.source, comm = .SPMD.CT$comm)

pkmeans(X = NULL, K = 2, MU = NULL,
    algorithm = c("kmeans", "kmeans.dmat"),
    CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .SPMD.CT$rank.source, comm = .SPMD.CT$comm)

Arguments

X
a GBD row-major matrix or a ddmatrix.
K
number of clusters.
MU
pre-specified centers.
algorithm
types of EM algorithms.
RndEM.iter
number of Rand-EM iterations.
CONTROL
a control for algorithms, see CONTROL for details.
method.own.X
how X is distributed.
rank.own.X
the rank that owns X when method.own.X = "single".
comm
MPI communicator.

Value

These functions return a list of class pmclust or pkmeans.

See the help page of PARAM or PARAM.org for details.

Details

These are high-level wrappers around several functions in pmclust. They handle data distribution, setting up the global environment .pmclustEnv, initialization, algorithm selection, etc.

The input X is either a ddmatrix or a GBD matrix. It is converted to GBD row-major format and copied into .pmclustEnv for computation. The method.own.X argument describes how X is distributed: by default, pmclust assumes a GBD row-major format (gbdr); common means that X is identical on all processors; and single means that X exists only on the processor with rank rank.own.X.
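
To make the idea of a row-major GBD layout concrete, the following is a minimal serial sketch in base R that splits the rows of a matrix into contiguous blocks, one per rank, and checks that stacking the blocks recovers the original matrix. It does not use MPI, and block_rows is a hypothetical helper written for this illustration; the actual assignment of rows to ranks made by pmclust or pbdMPI may differ.

```r
# Hypothetical helper (not part of pmclust): contiguous row indices owned by
# `rank` (0-based) when `n` rows are split as evenly as possible over `size` ranks.
block_rows <- function(n, rank, size) {
  base   <- n %/% size
  extra  <- n %% size
  counts <- rep(base, size) + (seq_len(size) <= extra)  # first `extra` ranks get one more row
  ends   <- cumsum(counts)
  starts <- ends - counts + 1L
  if (counts[rank + 1L] == 0L) {
    integer(0)
  } else {
    seq.int(starts[rank + 1L], ends[rank + 1L])
  }
}

X    <- as.matrix(iris[, -5])
size <- 4L  # assumed number of ranks, simulated serially

# Each list element plays the role of the local block held by one rank.
chunks <- lapply(0:(size - 1L), function(r) {
  X[block_rows(nrow(X), r, size), , drop = FALSE]
})

chunk_rows <- vapply(chunks, nrow, integer(1))
print(chunk_rows)

# Stacking the local blocks in rank order gives back the full matrix.
rebuilt <- do.call(rbind, chunks)
```

In this picture, "common" would correspond to every rank holding all of X, and "single" to one rank holding all of X while the others hold nothing.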

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

See Also

set.global, e.step, m.step, set.global.dmat, e.step.dmat, m.step.dmat.

Examples

# Save the code in a file "demo.r" and run it on 4 processors with
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Load data
X <- as.matrix(iris[, -5])

### Distribute data
jid <- get.jid(nrow(X))
X.gbd <- X[jid,]

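### Standardize
N <- allreduce(nrow(X.gbd))                          # total number of rows over all ranks
p <- ncol(X.gbd)
mu <- allreduce(colSums(X.gbd / N))                  # global column means
X.std <- sweep(X.gbd, 2, mu, FUN = "-")              # center the local rows
std <- sqrt(allreduce(colSums(X.std^2 / (N - 1))))   # global column standard deviations
X.std <- sweep(X.std, 2, std, FUN = "/")             # scale the local rows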

### Clustering
comm.set.seed(123, diff = TRUE)

ret.mb1 <- pmclust(X.std, K = 3)
comm.print(ret.mb1)

ret.kms <- pkmeans(X.std, K = 3)
comm.print(ret.kms)

### Finish
finalize()
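
The standardization step in the example above builds global means and standard deviations from per-rank partial sums. The following serial sketch, in base R only, mimics that computation and compares the result with scale(). Here all_sum is a hypothetical stand-in for allreduce(), and the split into 4 chunks is an assumed partition made for illustration, not the one produced by get.jid.

```r
X <- as.matrix(iris[, -5])

# Assumed partition: 4 contiguous row blocks standing in for 4 ranks.
owner  <- cut(seq_len(nrow(X)), breaks = 4, labels = FALSE)
chunks <- lapply(split(seq_len(nrow(X)), owner),
                 function(i) X[i, , drop = FALSE])

# Hypothetical stand-in for allreduce(): sum the per-chunk contributions.
all_sum <- function(parts) Reduce(`+`, parts)

N  <- all_sum(lapply(chunks, nrow))
mu <- all_sum(lapply(chunks, function(x) colSums(x / N)))

# Center each chunk with the global means, then pool the squared deviations.
centered <- lapply(chunks, function(x) sweep(x, 2, mu, FUN = "-"))
std <- sqrt(all_sum(lapply(centered, function(x) colSums(x^2 / (N - 1)))))

# Scale each chunk and stack the chunks back in their original row order.
scaled <- lapply(centered, function(x) sweep(x, 2, std, FUN = "/"))
X.std  <- do.call(rbind, unname(scaled))

# Compare with the usual serial standardization.
ref      <- scale(X)
max_diff <- max(abs(X.std - ref))
print(max_diff)
```

The difference should be at the level of floating-point rounding, which suggests that summing partial results over ranks reproduces the serial answer.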