
RemixAutoML (version 0.11.0)

AutoKMeans: Automated row clustering for mixed column types

Description

AutoKMeans adds a column to your original data containing a cluster number identifier. It fits a GLRM (generalized low rank model, optionally grid tuned) and then runs k-means, searching for the optimal k.
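
The two-stage idea described above (reduce the columns to a few factors, then cluster the rows) can be sketched in base R. This is a stand-in under stated assumptions, not the package's implementation: `prcomp` (PCA) substitutes for the GLRM step, `stats::kmeans` substitutes for the H2O k-means step, the data is synthetic and numeric only (PCA does not handle the mixed column types that GLRM is meant for), and the names `n_factors` and `ClusterID` are illustrative.

```r
# Stand-in sketch: PCA replaces GLRM and base kmeans replaces H2O k-means.
set.seed(1)

# Synthetic numeric data: two groups of 40 rows, 5 columns each.
n <- 40
g1 <- matrix(rnorm(n * 5, mean = 0), ncol = 5)
g2 <- matrix(rnorm(n * 5, mean = 6), ncol = 5)
dat <- as.data.frame(rbind(g1, g2))

# Step 1: reduce the columns to a small number of factors.
n_factors <- 2  # hypothetical name; plays the role of glrmFactors
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
scores <- pca$x[, seq_len(n_factors), drop = FALSE]

# Step 2: cluster the rows in the reduced space.
fit <- kmeans(scores, centers = 2, nstart = 10)

# Step 3: attach the cluster id to the original data.
dat$ClusterID <- fit$cluster  # column name is illustrative
print(table(dat$ClusterID))
```

Clustering on a handful of factors instead of the raw columns is the design choice being illustrated; the actual factor count and loss used by AutoKMeans are controlled by the arguments documented below.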

Usage

AutoKMeans(data, nthreads = 8, MaxMem = "28G", SaveModels = NULL,
  PathFile = NULL, GridTuneGLRM = TRUE, GridTuneKMeans = TRUE,
  glrmCols = c(1:5), IgnoreConstCols = TRUE, glrmFactors = 5,
  Loss = "Absolute", glrmMaxIters = 1000, SVDMethod = "Randomized",
  MaxRunTimeSecs = 3600, KMeansK = 50, KMeansMetric = "totss")

Arguments

data

the source data.table whose rows you want to cluster

nthreads

set based on number of threads your machine has available

MaxMem

set based on the amount of memory your machine has available

SaveModels

Set to "standard", "mojo", or NULL (default)

PathFile

Set to folder where you will keep the models

GridTuneGLRM

If you want to grid tune the glrm model, set to TRUE, FALSE otherwise

GridTuneKMeans

If you want to grid tune the KMeans model, set to TRUE, FALSE otherwise

glrmCols

the column numbers for the glrm

IgnoreConstCols

tell H2O to ignore any columns that have zero variance

glrmFactors

similar to the number of factors to return from PCA

Loss

set to one of "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic"

glrmMaxIters

max number of iterations

SVDMethod

choose from "Randomized", "GramSVD", "Power"

MaxRunTimeSecs

set the timeout for max run time

KMeansK

number of clusters to test out in k-means to find the optimal number

KMeansMetric

metric used to identify the top model in the grid tune; one of "totss", "betweenss", "withinss"
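
To make the KMeansK and KMeansMetric arguments more concrete, here is a minimal base R sketch of scanning several values of k and recording the three sums of squares. It uses `stats::kmeans` on synthetic data, not the H2O grid search, and the selection rule at the end (smallest k explaining at least 95% of the total sum of squares) is a hypothetical choice for illustration; the rule AutoKMeans applies is not stated on this page. The names `max_k`, `scan`, and `best_k` are likewise illustrative.

```r
# Illustrative only: base R kmeans, not the H2O model AutoKMeans uses.
set.seed(42)

# Three well-separated 2-D blobs, 50 points each (synthetic data).
centers <- matrix(c(0, 0, 10, 10, 0, 10), ncol = 2, byrow = TRUE)
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 0.5), rnorm(50, centers[i, 2], 0.5))
}))

max_k <- 6  # hypothetical name; plays the role of KMeansK

# Fit one model per k and collect the sums of squares.
scan <- do.call(rbind, lapply(2:max_k, function(k) {
  fit <- kmeans(x, centers = k, nstart = 20)
  data.frame(
    k = k,
    totss = fit$totss,            # total sum of squares (same for every k)
    betweenss = fit$betweenss,    # between-cluster sum of squares
    withinss = fit$tot.withinss   # summed within-cluster sum of squares
  )
}))
scan$ratio <- scan$betweenss / scan$totss

# Hypothetical rule: smallest k whose between/total ratio reaches 0.95.
best_k <- scan$k[which(scan$ratio >= 0.95)[1]]
print(scan)
print(best_k)
```

In base R the total sum of squares always equals the between-cluster plus the within-cluster sums of squares, so for a fixed data set the latter two carry the information that distinguishes one k from another.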

Value

The original data.table with an added column holding the cluster number identifier

See Also

Other Unsupervised Learning: GenTSAnomVars, ProblematicRecords, ResidualOutliers

Examples

# NOT RUN {
# }
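
Since the example block above is empty, here is a hedged usage sketch built only from the signature in the Usage section. The data set is synthetic, the argument values (`nthreads = 2`, `MaxMem = "4G"`, `glrmFactors = 2`, `KMeansK = 10`) are arbitrary choices for a small machine, and the call is guarded so it only runs in an interactive session with RemixAutoML, h2o, and data.table installed. Whether an H2O session must be started beforehand is not stated on this page, so treat that as an open assumption.

```r
# Sketch only: builds a small mixed-type data set and, when possible,
# passes it to AutoKMeans using the arguments listed under Usage.
set.seed(123)
n <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n),
  x3 = rpois(n, lambda = 3),
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("u", "v"), n, replace = TRUE))
)

# Only attempt the call when the required packages are available.
pkgs <- c("RemixAutoML", "h2o", "data.table")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (interactive() && have_pkgs) {
  data <- data.table::as.data.table(df)
  clustered <- RemixAutoML::AutoKMeans(
    data,
    nthreads = 2,            # assumption: small machine
    MaxMem = "4G",           # assumption: small machine
    SaveModels = NULL,
    PathFile = NULL,
    GridTuneGLRM = TRUE,
    GridTuneKMeans = TRUE,
    glrmCols = c(1:5),       # all five columns of the synthetic data
    IgnoreConstCols = TRUE,
    glrmFactors = 2,
    Loss = "Absolute",
    glrmMaxIters = 1000,
    SVDMethod = "Randomized",
    MaxRunTimeSecs = 3600,
    KMeansK = 10,
    KMeansMetric = "totss"
  )
  # Per the Value section, the result is the input plus a cluster column.
  str(clustered)
}
```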
