SparkR (version 2.1.2)

spark.gaussianMixture: Multivariate Gaussian Mixture Model (GMM)

Description

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to mvnormalmixEM() from R's mixtools package. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.gaussianMixture(data, formula, ...)

# S4 method for GaussianMixtureModel,character
write.ml(object, path, overwrite = FALSE)

# S4 method for SparkDataFrame,formula
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

# S4 method for GaussianMixtureModel
summary(object)

# S4 method for GaussianMixtureModel
predict(object, newData)

Arguments

data

a SparkDataFrame for training.

formula

a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.gaussianMixture.
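As a small illustration of what an "empty response" means, the base R sketch below builds a one-sided formula and inspects it. It needs no Spark session; the column names V1 and V2 are taken from the Examples section and are otherwise arbitrary.

```r
# One-sided formula: nothing on the left of '~', so there is no response.
f <- ~ V1 + V2

# A one-sided formula has two components ('~' and the right-hand side);
# a two-sided formula such as y ~ V1 + V2 would have three.
length(f)

# Names of the feature columns referenced by the formula.
all.vars(f)
```

A formula of this shape is what the Examples section passes as the second argument to spark.gaussianMixture.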

...

additional arguments passed to the method.

object

a fitted Gaussian mixture model.

path

the directory where the model is saved.

overwrite

whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.

k

number of independent Gaussians in the mixture model.

maxIter

maximum number of iterations.

tol

the convergence tolerance.

newData

a SparkDataFrame for testing.

Value

spark.gaussianMixture returns a fitted multivariate Gaussian mixture model.

summary returns a summary of the fitted model as a list. The list includes the model's lambda (lambda), mu (mu), sigma (sigma), and posterior (posterior).
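For orientation, the elements of this list can be read against the standard Gaussian mixture parametrization. The sketch below assumes the names carry the same meaning as in mixtools (lambda as mixing proportions, mu as component means, sigma as component covariance matrices, posterior as per-observation component probabilities); the symbol \gamma_{ij} is introduced here for illustration and does not appear in the SparkR documentation.

```latex
% Mixture density with k components (k is the argument of spark.gaussianMixture)
p(x) = \sum_{j=1}^{k} \lambda_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad \lambda_j \ge 0, \quad \sum_{j=1}^{k} \lambda_j = 1

% Posterior probability that observation x_i belongs to component j
\gamma_{ij} =
  \frac{\lambda_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
       {\sum_{l=1}^{k} \lambda_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}
```

Under this reading, lambda holds the \lambda_j, mu the \mu_j, sigma the \Sigma_j, and posterior the \gamma_{ij} for the training data.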

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".

See Also

mixtools: https://cran.r-project.org/package=mixtools

predict, read.ml, write.ml

Examples

# NOT RUN {
sparkR.session()
library(mvtnorm)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
df <- createDataFrame(as.data.frame(data))
model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "V1", "prediction"))

# save fitted model to input path
path <- "path/to/model"
write.ml(model, path)

# can also read back the saved model and print
savedModel <- read.ml(path)
summary(savedModel)
# }
