AutoKMeans adds a column with a cluster number identifier to your original data. It first fits a glrm (optionally grid tuned) and then runs k-means to find the optimal k.
AutoKMeans(data, nthreads = 8, MaxMem = "28G", SaveModels = NULL,
PathFile = NULL, GridTuneGLRM = TRUE, GridTuneKMeans = TRUE,
glrmCols = c(1:5), IgnoreConstCols = TRUE, glrmFactors = 5,
Loss = "Absolute", glrmMaxIters = 1000, SVDMethod = "Randomized",
MaxRunTimeSecs = 3600, KMeansK = 50, KMeansMetric = "totss")
data: The source time series data.table
nthreads: Set based on the number of threads your machine has available
MaxMem: Set based on the amount of memory your machine has available
SaveModels: Set to "standard", "mojo", or NULL (default)
PathFile: Set to the folder where you will keep the models
GridTuneGLRM: Set to TRUE to grid tune the glrm model, FALSE otherwise
GridTuneKMeans: Set to TRUE to grid tune the KMeans model, FALSE otherwise
glrmCols: The column numbers to use for the glrm
IgnoreConstCols: Tell H2O to ignore any columns that have zero variance
glrmFactors: Number of factors for the glrm, similar to the number of factors to return from PCA
Loss: Set to one of "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic"
glrmMaxIters: Maximum number of glrm iterations
SVDMethod: Choose from "Randomized", "GramSVD", "Power"
MaxRunTimeSecs: Timeout for the maximum run time, in seconds
KMeansK: Number of clusters (k) to test out in k-means to find the optimal number
KMeansMetric: Metric used to identify the top model in the grid tune; one of "totss", "betweenss", "withinss"
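The three metric names above correspond to the usual sum-of-squares components of a k-means fit. The sketch below uses base R's stats::kmeans on the built-in iris data to show how those quantities change across candidate values of k. It is only an illustration under stated assumptions: it does not use H2O, it skips the glrm step, and it is not the procedure AutoKMeans runs internally. The range of k (2 to 6) and the variable names are chosen here for the example.

```r
# Illustration only: base R k-means, not the H2O grid search used by AutoKMeans.
set.seed(42)

# Standardize the four numeric iris columns so no column dominates the distances.
x <- scale(iris[, 1:4])

# Candidate cluster counts (hypothetical range chosen for this sketch).
ks <- 2:6

# Fit one k-means model per candidate k.
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 10, iter.max = 50))

# Collect the sum-of-squares components for each fit.
metrics <- data.frame(
  k         = ks,
  totss     = vapply(fits, function(f) f$totss, numeric(1)),
  betweenss = vapply(fits, function(f) f$betweenss, numeric(1)),
  withinss  = vapply(fits, function(f) f$tot.withinss, numeric(1))
)

print(metrics)
```

In this base R sketch, totss is the same for every k because it depends only on the data, and totss equals betweenss plus the total withinss. Only betweenss and withinss vary with k here.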
Original data.table with added column with cluster number identifier
Other Unsupervised Learning: GenTSAnomVars, ProblematicRecords, ResidualOutliers
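A minimal usage sketch follows, based on the signature and argument descriptions above. It makes several assumptions: that the function is exported by the RemixAutoML package, that the installed version still exposes this signature, that the h2o and data.table packages are installed, and that a Java runtime is available for H2O. The iris data, the smaller thread and memory settings, and the reduced glrmFactors and KMeansK values are choices made for this example, not recommended defaults.

```r
# Sketch only: the package name and the argument values below are assumptions.
feature_cols <- 1:4  # the four numeric iris columns; Species (column 5) is left out

# Check that the assumed packages are available before trying anything.
have_pkgs <- all(vapply(
  c("RemixAutoML", "h2o", "data.table"),
  requireNamespace, logical(1), quietly = TRUE
))

# Only run in an interactive session, since the call starts model training.
if (interactive() && have_pkgs) {
  data <- data.table::as.data.table(iris)

  clustered <- RemixAutoML::AutoKMeans(
    data,
    nthreads        = 2,          # adjust to your machine
    MaxMem          = "4G",       # adjust to your machine
    SaveModels      = NULL,       # do not save models
    PathFile        = NULL,
    GridTuneGLRM    = TRUE,
    GridTuneKMeans  = TRUE,
    glrmCols        = feature_cols,
    IgnoreConstCols = TRUE,
    glrmFactors     = 2,          # kept below the number of glrm columns
    Loss            = "Absolute",
    glrmMaxIters    = 1000,
    SVDMethod       = "Randomized",
    MaxRunTimeSecs  = 600,
    KMeansK         = 5,
    KMeansMetric    = "totss"
  )

  # The result should be the original data with one extra cluster column;
  # its name is not stated above, so inspect the column names directly.
  print(setdiff(names(clustered), names(iris)))
}
```

When run non-interactively or without the assumed packages, the block does nothing beyond defining feature_cols and have_pkgs.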