
Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to R's mvnormalmixEM(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
spark.gaussianMixture(data, formula, ...)
# S4 method for SparkDataFrame,formula
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)
# S4 method for GaussianMixtureModel
summary(object)
# S4 method for GaussianMixtureModel
predict(object, newData)
# S4 method for GaussianMixtureModel,character
write.ml(object, path, overwrite = FALSE)
data: a SparkDataFrame for training.
formula: a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.gaussianMixture.
...: additional arguments passed to the method.
k: number of independent Gaussians in the mixture model.
maxIter: maximum number of iterations.
tol: the convergence tolerance.
object: a fitted Gaussian mixture model.
newData: a SparkDataFrame for testing.
path: the directory where the model is saved.
overwrite: whether to overwrite the output path if it already exists. The default is FALSE, which throws an exception if the output path exists.
spark.gaussianMixture returns a fitted multivariate Gaussian mixture model.
summary returns a summary of the fitted model, which is a list. The list includes the model's lambda (lambda), mu (mu), sigma (sigma), loglik (loglik), and posterior (posterior).
predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
# NOT RUN {
sparkR.session()
library(mvtnorm)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
df <- createDataFrame(as.data.frame(data))
model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(model)
# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "V1", "prediction"))
# save fitted model to input path
path <- "path/to/model"
write.ml(model, path)
# can also read back the saved model and print
savedModel <- read.ml(path)
summary(savedModel)
# }
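As a rough illustration of how the components returned by summary relate to one another, the base R sketch below recomputes posterior membership probabilities and a log-likelihood from hand-picked parameter values. It assumes that lambda holds the mixing weights, mu a list of mean vectors, and sigma a list of covariance matrices, as in mixtools. The numbers and the helper dmvn are made up for illustration; they are not SparkR output and not part of the SparkR API. Numbering the components from 0 is likewise an assumption.

```r
# Hypothetical helper: multivariate normal density at each row of x (base R only).
dmvn <- function(x, mu, sigma) {
  d <- length(mu)
  centered <- sweep(x, 2, mu)                        # subtract the mean from each row
  q <- rowSums((centered %*% solve(sigma)) * centered)  # squared Mahalanobis distance
  exp(-0.5 * q) / sqrt((2 * pi)^d * det(sigma))
}

# Made-up parameters in the shape assumed for summary(model): k = 2 components.
lambda <- c(0.4, 0.6)                  # mixing weights, sum to 1
mu     <- list(c(0, 0), c(3, 4))       # one mean vector per component
sigma  <- list(diag(2), diag(2))       # one covariance matrix per component

# Three illustrative observations, one per row.
x <- rbind(c(0, 0), c(3, 4), c(0.2, -0.1))

# Weighted component densities: one row per observation, one column per component.
weighted <- sapply(seq_along(lambda), function(j) {
  lambda[j] * dmvn(x, mu[[j]], sigma[[j]])
})

# Posterior probability of each component for each observation.
posterior <- weighted / rowSums(weighted)

# Log-likelihood of the observations under the mixture.
loglik <- sum(log(rowSums(weighted)))

# Most likely component per observation, numbered from 0 (assumption).
labels <- max.col(posterior) - 1L
```

Each row of posterior sums to 1, and taking the most probable component per row gives a hard label comparable in spirit to the "prediction" column.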