lares (version 4.7)

clusterKmeans: K-Means Clustering Automated

Description

This function clusters a whole data.frame automatically. The goal of k-means is to partition data points into distinct, non-overlapping groups. If needed, one-hot encoding is applied to categorical values automatically. Points to consider: scale or standardize the data before applying k-means. K-means also assumes roughly spherical clusters and doesn't work well when clusters have other shapes, such as elliptical ones.
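The preprocessing steps mentioned in the description can be sketched with base R alone. This is an illustrative sketch, not the internals of clusterKmeans: the use of model.matrix() for one-hot encoding, scale() for standardization, the built-in iris data, and the choice of 3 clusters are all assumptions made for the example.

```r
# Base-R sketch of the steps described above (assumed, not the lares internals)
df <- iris                                # built-in data: 4 numeric columns, 1 factor
df <- df[stats::complete.cases(df), ]     # drop rows containing NA (cf. drop_na)

# One-hot encode the factor column; "- 1" drops the intercept so that
# every level of Species gets its own 0/1 indicator column
X <- model.matrix(~ . - 1, data = df)

# Standardize every column to mean 0 and standard deviation 1 (cf. norm)
X <- scale(X)

set.seed(123)                             # fixed seed for reproducibility (cf. seed)
fit <- kmeans(X, centers = 3, nstart = 10)

table(fit$cluster)                        # number of rows assigned to each cluster
```

Standardizing matters here because k-means relies on Euclidean distance, so a column measured on a larger scale would otherwise dominate the result.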

Usage

clusterKmeans(df, k = NA, limit = 20, drop_na = TRUE, ohse = TRUE,
  norm = TRUE, comb = c(1, 2, 3), seed = 123)
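A minimal call might look like the sketch below. It assumes the lares package is installed and uses the built-in iris data purely for illustration. The structure of the returned object is not described on this page, so the example only inspects it with str() rather than assuming any element names.

```r
# Hedged usage sketch: runs only if the lares package is available
if (requireNamespace("lares", quietly = TRUE)) {
  # Argument names are taken from the Usage section above
  res <- lares::clusterKmeans(iris, k = 3, ohse = TRUE, norm = TRUE, seed = 123)
  # Inspect the top level of whatever is returned
  str(res, max.level = 1)
}
```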

Arguments

df

Dataframe

k

Integer. Number of clusters

limit

Integer. How many clusters should be considered?

drop_na

Boolean. Should NA rows be removed?

ohse

Boolean. Should one-hot encoding be applied automatically to non-numerical columns?

norm

Boolean. Should the data be normalized?

comb

Vector. Which columns do you wish to plot? Select which two variables by name or column position.

seed

Numeric. Seed for reproducibility
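The k and limit arguments relate to choosing the number of clusters. One common way to make that choice is the elbow method: fit k-means for every k from 1 up to a limit and compare the total within-cluster sum of squares. The sketch below shows the idea in base R. Whether clusterKmeans uses this exact procedure is an assumption, and the data, the limit of 10, and nstart = 10 are illustrative choices.

```r
# Elbow-method sketch in base R (assumed approach, not taken from lares)
X <- scale(iris[, 1:4])     # numeric columns only, standardized
limit <- 10                 # largest number of clusters to try (cf. limit)

set.seed(123)               # fixed seed for reproducibility (cf. seed)
wss <- sapply(seq_len(limit), function(k) {
  # total within-cluster sum of squares for a k-cluster solution
  kmeans(X, centers = k, nstart = 10)$tot.withinss
})

# Look for the k after which the decrease in wss flattens out
print(data.frame(k = seq_len(limit), wss = wss))
```

A value of k near the bend of this curve is a reasonable candidate to pass as the k argument.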

See Also

Other Machine Learning: ROC, conf_mat, export_results, gain_lift, h2o_automl, h2o_predict_API, h2o_predict_MOJO, h2o_predict_binary, h2o_predict_model, h2o_selectmodel, impute, iter_seeds, model_metrics, mplot_conf, mplot_cuts_error, mplot_cuts, mplot_density, mplot_full, mplot_gain, mplot_importance, mplot_lineal, mplot_metrics, mplot_response, mplot_roc, mplot_splits, msplit