pmclust (version 0.2-0)

pmclust and pkmeans: Parallel Model-Based Clustering and Parallel K-means Algorithm

Description

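pmclust() performs parallel model-based clustering (EM-type algorithms for finite Gaussian mixture models) on distributed data, and pkmeans() runs the K-means algorithm on distributed data.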
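The general idea behind a parallel K-means step can be illustrated in a single R session. In the base-R sketch below, each "rank" holds a block of rows, assigns its rows to the nearest center, and computes per-cluster sums and counts locally. Summing these partial results over all blocks plays the role an MPI allreduce would play, and the new centers follow from the combined sums. This illustrates the general technique under stated assumptions. It is not the implementation used by pkmeans, and the helper kmeans_step_blocks() is hypothetical.

```r
# Hypothetical sketch (not pkmeans internals): one Lloyd-type K-means
# iteration over row blocks, combining per-block partial sums the way an
# allreduce would combine them across MPI ranks.
kmeans_step_blocks <- function(blocks, centers) {
  K <- nrow(centers)
  p <- ncol(centers)
  # Local work on each simulated rank.
  partial <- lapply(blocks, function(Xb) {
    # Squared Euclidean distance from each row to each center (n_b x K).
    d2 <- sapply(seq_len(K), function(k) colSums((t(Xb) - centers[k, ])^2))
    d2 <- matrix(d2, ncol = K)
    # Index of the nearest center for each row.
    cl <- max.col(-d2, ties.method = "first")
    sums <- matrix(0, K, p)
    counts <- numeric(K)
    for (k in seq_len(K)) {
      idx <- cl == k
      counts[k] <- sum(idx)
      if (counts[k] > 0) sums[k, ] <- colSums(Xb[idx, , drop = FALSE])
    }
    list(sums = sums, counts = counts)
  })
  # Simulated allreduce: add the partial results from all blocks.
  sums <- Reduce(`+`, lapply(partial, `[[`, "sums"))
  counts <- Reduce(`+`, lapply(partial, `[[`, "counts"))
  # Update centers; an empty cluster keeps its previous center.
  new_centers <- centers
  ok <- counts > 0
  new_centers[ok, ] <- sums[ok, , drop = FALSE] / counts[ok]
  new_centers
}

# Usage: split iris into 4 contiguous row blocks, as 4 ranks might hold them.
X <- as.matrix(iris[, -5])
grp <- cut(seq_len(nrow(X)), 4, labels = FALSE)
blocks <- lapply(split(seq_len(nrow(X)), grp), function(i) X[i, , drop = FALSE])
centers0 <- X[c(1, 51, 101), ]
c_blocks <- kmeans_step_blocks(blocks, centers0)
c_serial <- kmeans_step_blocks(list(X), centers0)
print(all.equal(c_blocks, c_serial))
```

Because the per-cluster sums and counts are additive, the blocked update agrees with the update computed on the whole matrix, which is what makes this step straightforward to parallelize.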

Usage

pmclust(X = NULL, K = 2, MU = NULL,
    algorithm = .PMC.CT$algorithm, RndEM.iter = .PMC.CT$RndEM.iter,
    CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)

pkmeans(X = NULL, K = 2, MU = NULL,
    algorithm = c("kmeans", "kmeans.dmat"), CONTROL = .PMC.CT$CONTROL,
    method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)

Arguments

X

a GBD row-major matrix or a ddmatrix.

K

number of clusters.

MU

pre-specified cluster centers.

algorithm

the type of algorithm to use: one of the EM-type algorithms for pmclust, or either "kmeans" or "kmeans.dmat" for pkmeans.

RndEM.iter

number of RndEM iterations.

CONTROL

a list of control parameters for the algorithms; see CONTROL for details.

method.own.X

how X is distributed, e.g. "gbdr", "common", or "single"; see Details.

rank.own.X

the rank that owns X when method.own.X = "single".

comm

MPI communicator.

Value

These functions return a list with class pmclust or pkmeans.

See the help page of PARAM or PARAM.org for details.

Details

These are high-level wrappers around several lower-level functions in pmclust. They handle data distribution, setting up the global environment .pmclustEnv, initialization, algorithm selection, etc.

The input X is either a ddmatrix or in GBD format. It is converted to GBD row-major format and copied into .pmclustEnv for computation. By default, pmclust assumes X is in GBD row-major format (gbdr). The value "common" means that X is identical on all processors, and "single" means that X exists only on the processor rank.own.X.
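To make the GBD row-major layout concrete, the base-R sketch below splits the rows of a matrix into contiguous, nearly equal blocks, one per rank, which is the kind of layout "gbdr" describes. The helper gbd_row_ids() is hypothetical, and its exact partition (the first n %% size ranks receive one extra row) is an assumption; it does not necessarily reproduce the partition returned by pbdMPI's get.jid().

```r
# Hypothetical helper: contiguous row indices owned by `rank` (0-based)
# when n rows are split as evenly as possible over `size` ranks.
# Assumption: the first (n %% size) ranks each receive one extra row.
gbd_row_ids <- function(n, rank, size) {
  base <- n %/% size
  extra <- n %% size
  len <- base + (rank < extra)
  start <- rank * base + min(rank, extra) + 1
  if (len == 0) integer(0) else seq.int(start, length.out = len)
}

# Usage: simulate 4 ranks holding blocks of the 150 iris rows.
X <- as.matrix(iris[, -5])
ids <- lapply(0:3, gbd_row_ids, n = nrow(X), size = 4)
print(lengths(ids))
# Stacking the local blocks in rank order recovers the full matrix, the
# layout a "common" X would hold on every rank.
X.back <- do.call(rbind, lapply(ids, function(i) X[i, , drop = FALSE]))
```

Under this assumed partition, the 150 rows are split into blocks of 38, 38, 37, and 37 rows.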

References

Programming with Big Data in R Website: http://r-pbd.org/

See Also

set.global, e.step, m.step; set.global.dmat, e.step.dmat, m.step.dmat.

Examples

# NOT RUN {
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
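# pbdMPI provides get.jid(), allreduce(), comm.print(), and finalize().
library(pbdMPI, quietly = TRUE)
library(pmclust, quietly = TRUE)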

### Load data
X <- as.matrix(iris[, -5])

### Distribute data
jid <- get.jid(nrow(X))
X.gbd <- X[jid,]

### Standardize each column using global (all-rank) mean and sd
N <- allreduce(nrow(X.gbd))
p <- ncol(X.gbd)
mu <- allreduce(colSums(X.gbd / N))
X.std <- sweep(X.gbd, 2, mu, FUN = "-")
std <- sqrt(allreduce(colSums(X.std^2 / (N - 1))))
X.std <- sweep(X.std, 2, std, FUN = "/")

### Clustering
comm.set.seed(123, diff = TRUE)

ret.mb1 <- pmclust(X.std, K = 3)
comm.print(ret.mb1)

ret.kms <- pkmeans(X.std, K = 3)
comm.print(ret.kms)

### Finish
finalize()
# }
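The standardization step in the example relies on the fact that column sums are additive across ranks. Summing each rank's colSums(X.gbd / N) with allreduce() gives the global mean, and the same holds for the sum of squared deviations. The serial sketch below checks this reasoning without MPI. It simulates 4 ranks as contiguous row blocks and replaces allreduce() with a plain sum over the blocks; the helper sim_allreduce() is a hypothetical stand-in, not a pbdMPI function.

```r
# Serial sketch: 4 simulated ranks hold contiguous row blocks of X.
X <- as.matrix(iris[, -5])
blocks <- split.data.frame(X, cut(seq_len(nrow(X)), 4, labels = FALSE))

# Hypothetical stand-in for allreduce(): element-wise sum over ranks.
sim_allreduce <- function(parts) Reduce(`+`, parts)

# Same formulas as the example, evaluated block by block.
N <- sim_allreduce(lapply(blocks, nrow))
mu <- sim_allreduce(lapply(blocks, function(b) colSums(b / N)))
std <- sqrt(sim_allreduce(lapply(blocks, function(b)
  colSums(sweep(b, 2, mu, FUN = "-")^2 / (N - 1)))))
X.std.blocks <- lapply(blocks, function(b)
  sweep(sweep(b, 2, mu, FUN = "-"), 2, std, FUN = "/"))

# Stacking the blocks should match serial standardization with scale().
X.std <- do.call(rbind, X.std.blocks)
print(all.equal(X.std, scale(X), check.attributes = FALSE))
```

Since scale() also uses the N - 1 denominator, the blocked result agrees with it up to floating-point error.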
