This function lets the user cluster a whole data.frame automatically. The goal of k-means is to partition data points into distinct, non-overlapping subgroups (clusters). If needed, this function automatically applies one-hot encoding to categorical values. Things to consider: scale or standardize the data before applying k-means, because it relies on Euclidean distances and variables with larger ranges would otherwise dominate. Also, k-means assumes spherical clusters and doesn't work well when clusters have other shapes, such as elliptical clusters.
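The steps described above (drop NA rows, one-hot encode categorical columns, standardize, run k-means with a fixed seed) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the internals of clusterKmeans(); the helpers one_hot() and prepare_and_cluster() are hypothetical names introduced here, and nstart = 10 is an arbitrary choice.

```r
# Sketch only: base-R version of the preprocessing + k-means steps.
# one_hot() and prepare_and_cluster() are hypothetical helpers, not lares API.
one_hot <- function(df) {
  cols <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.numeric(x)) {
      # Numeric columns pass through unchanged
      matrix(x, ncol = 1, dimnames = list(NULL, nm))
    } else {
      # Any other column becomes one 0/1 indicator column per level
      # (assumes NAs were already removed)
      lv <- levels(factor(x))
      m <- outer(as.character(x), lv, "==") * 1
      colnames(m) <- paste0(nm, "_", lv)
      m
    }
  })
  do.call(cbind, cols)
}

prepare_and_cluster <- function(df, k, drop_na = TRUE, norm = TRUE, seed = 123) {
  if (drop_na) df <- na.omit(df)  # k-means cannot handle missing values
  X <- one_hot(df)
  if (norm) {
    # Drop constant columns first: scale() would turn them into NaN
    X <- X[, apply(X, 2, sd) > 0, drop = FALSE]
    X <- scale(X)                 # center to mean 0, scale to sd 1
  }
  set.seed(seed)                  # k-means starts from random centers
  fit <- kmeans(X, centers = k, nstart = 10)
  list(X = X, fit = fit)
}

# Usage: iris has 4 numeric columns plus the Species factor (3 levels)
res <- prepare_and_cluster(iris, k = 3)
print(table(cluster = res$fit$cluster, species = iris$Species))
```

Using several random starts (nstart) reduces the chance of ending in a poor local optimum, which plain k-means is prone to.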
clusterKmeans(df, k = NA, limit = 20, drop_na = TRUE, ohse = TRUE,
norm = TRUE, comb = c(1, 2, 3), seed = 123)
df: Data frame to cluster.
k: Integer. Number of clusters.
limit: Integer. Maximum number of clusters to consider.
drop_na: Boolean. Should rows with NA values be removed?
ohse: Boolean. Should one-hot encoding be applied automatically to non-numerical columns?
norm: Boolean. Should the data be normalized?
comb: Vector. Which columns should be plotted? Select the two variables by name or column position.
seed: Numeric. Seed for reproducibility.
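When k is left as NA, the number of clusters has to be chosen somehow, and limit suggests a range of candidate values. One common approach, assumed here rather than confirmed by this page, is the elbow method: compute the total within-cluster sum of squares (WSS) for k = 1..limit and look for the point where adding clusters stops reducing it much. elbow_wss() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not lares API): total within-cluster sum of
# squares for k = 1..limit, used to look for an "elbow".
elbow_wss <- function(X, limit = 20, seed = 123) {
  X <- as.matrix(X)
  # kmeans() errors if k exceeds the number of distinct rows
  limit <- min(limit, nrow(unique(X)))
  set.seed(seed)
  vapply(seq_len(limit), function(k) {
    kmeans(X, centers = k, nstart = 10)$tot.withinss
  }, numeric(1))
}

# Usage on the standardized numeric iris columns
wss <- elbow_wss(scale(iris[, 1:4]), limit = 8)
print(round(wss, 1))
```

For k = 1 the WSS equals the total sum of squares, so every later value is at most the first one; plotting wss against k (for example with plot(type = "b")) makes the elbow easier to spot.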
Other Machine Learning: ROC, conf_mat, export_results, gain_lift, h2o_automl, h2o_predict_API, h2o_predict_MOJO, h2o_predict_binary, h2o_predict_model, h2o_selectmodel, impute, iter_seeds, model_metrics, mplot_conf, mplot_cuts_error, mplot_cuts, mplot_density, mplot_full, mplot_gain, mplot_importance, mplot_lineal, mplot_metrics, mplot_response, mplot_roc, mplot_splits, msplit