ClusterR

The ClusterR package provides Gaussian mixture model, k-means, mini-batch k-means, k-medoids and affinity propagation clustering algorithms, with options to plot, validate, predict (on new data) and find the optimal number of clusters. It uses 'RcppArmadillo' to speed up the computationally intensive parts of the functions. More details on the functionality of ClusterR can be found in the Vignette and in the package Documentation.

UPDATE 16-08-2018

As of version 1.1.4, the ClusterR package allows R package maintainers to link to it at the C++ (Rcpp) level. This means that the Rcpp functions of the ClusterR package can be called from the C++ files of another package. The steps below explain how to set this up.

Assuming that an R package ('PackageA') calls one of the ClusterR Rcpp functions, the maintainer of 'PackageA' has to:

  • 1st. install the ClusterR package to take advantage of the new functionality either from CRAN using,

install.packages("ClusterR")
 

or install the latest version from GitHub using the devtools package,


devtools::install_github('mlampros/ClusterR')
 
  • 2nd. update the DESCRIPTION file of 'PackageA', adding the ClusterR package to the LinkingTo field (alongside any packages already listed there),

LinkingTo: ClusterR
  • 3rd. open a new C++ file (for instance in RStudio) and add the following 'headers', 'depends' and 'plugins' at the top of the file,

#include <RcppArmadillo.h>
#include <ClusterRHeader.h>
#include <affinity_propagation.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends(ClusterR)]]
// [[Rcpp::plugins(cpp11)]]

The available functions can be found in the following files: inst/include/ClusterRHeader.h and inst/include/affinity_propagation.h
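Linking at this level relies on the shared code living in header files: as far as R's build system is concerned, LinkingTo makes another package's inst/include directory visible to the compiler rather than linking against a compiled library. The plain C++ sketch below imitates that arrangement without Rcpp or ClusterR. The namespace demoLib, the class DemoHeader and the function call_from_package_a are hypothetical names used only for illustration, and the real ClusterR headers are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- Would live in a header shipped under inst/include (hypothetical) ---
namespace demoLib {

class DemoHeader {
 public:
  // Defined inside the class, so every file that includes the header
  // gets the full implementation and no separate library has to be linked.
  double mean(const std::vector<double>& x) const {
    assert(!x.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) total += x[i];
    return total / static_cast<double>(x.size());
  }
};

}  // namespace demoLib

// --- Would live in a source file of the calling package (hypothetical) ---
using namespace demoLib;

double call_from_package_a(const std::vector<double>& x) {
  DemoHeader demo_header;      // instantiate the class from the header ...
  return demo_header.mean(x);  // ... then call the method it defines
}
```

In a real package the calling function would additionally carry the Rcpp export attribute so that it becomes visible from R.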

A complete minimal example would be :

#include <RcppArmadillo.h>
#include <ClusterRHeader.h>
#include <affinity_propagation.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends(ClusterR)]]
// [[Rcpp::plugins(cpp11)]]

using namespace clustR;

// Thin wrapper that forwards all of its arguments to ClusterR's mini_batch_kmeans
// [[Rcpp::export]]
Rcpp::List mini_batch_kmeans(arma::mat& data, int clusters, int batch_size, int max_iters, int num_init = 1,
                             double init_fraction = 1.0, std::string initializer = "kmeans++",
                             int early_stop_iter = 10, bool verbose = false,
                             Rcpp::Nullable<Rcpp::NumericMatrix> CENTROIDS = R_NilValue,
                             double tol = 1e-4, double tol_optimal_init = 0.5, int seed = 1) {

  // class made available by the ClusterR headers (namespace clustR)
  ClustHeader clust_header;

  return clust_header.mini_batch_kmeans(data, clusters, batch_size, max_iters, num_init, init_fraction,
                                        initializer, early_stop_iter, verbose, CENTROIDS, tol,
                                        tol_optimal_init, seed);
}

The mini_batch_kmeans function can then be called from an R script using,


Rcpp::sourceCpp('example.cpp')              # assuming that the previous Rcpp code is saved in 'example.cpp'

set.seed(1)
dat = matrix(runif(100000), nrow = 1000, ncol = 100)

mbkm = mini_batch_kmeans(dat, clusters = 3, batch_size = 50, max_iters = 100, num_init = 2,
                         init_fraction = 1.0, initializer = "kmeans++", early_stop_iter = 10,
                         verbose = TRUE, CENTROIDS = NULL, tol = 1e-4, tol_optimal_init = 0.5, seed = 1)

str(mbkm)
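For readers who want a feel for what the batch_size and max_iters arguments control, here is a minimal sketch of the textbook mini-batch k-means update written in standard C++ only. It is not ClusterR's implementation: the function names are hypothetical, the starting centroids are supplied by the caller, and the per-centroid learning rate 1/count is the commonly described scheme, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t j = 0; j < a.size(); ++j) {
    const double diff = a[j] - b[j];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to x.
std::size_t nearest_centroid(const Point& x, const std::vector<Point>& centroids) {
  assert(!centroids.empty());
  std::size_t best = 0;
  double best_dist = std::numeric_limits<double>::max();
  for (std::size_t k = 0; k < centroids.size(); ++k) {
    const double d = squared_distance(x, centroids[k]);
    if (d < best_dist) {
      best_dist = d;
      best = k;
    }
  }
  return best;
}

// Illustrative mini-batch k-means loop (hypothetical helper, not part of ClusterR).
std::vector<Point> mini_batch_kmeans_sketch(const std::vector<Point>& data,
                                            std::vector<Point> centroids,
                                            std::size_t batch_size,
                                            int max_iters,
                                            unsigned seed) {
  assert(!data.empty() && !centroids.empty() && batch_size > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
  std::vector<std::size_t> counts(centroids.size(), 0);

  for (int iter = 0; iter < max_iters; ++iter) {
    // Sample the batch (with replacement) and fix the assignments
    // before any centroid is moved in this iteration.
    std::vector<std::size_t> batch(batch_size), assignment(batch_size);
    for (std::size_t i = 0; i < batch_size; ++i) {
      batch[i] = pick(rng);
      assignment[i] = nearest_centroid(data[batch[i]], centroids);
    }
    // Move each assigned centroid towards its point with step size 1 / count.
    for (std::size_t i = 0; i < batch_size; ++i) {
      const std::size_t k = assignment[i];
      counts[k] += 1;
      const double eta = 1.0 / static_cast<double>(counts[k]);
      for (std::size_t j = 0; j < centroids[k].size(); ++j) {
        centroids[k][j] = (1.0 - eta) * centroids[k][j] + eta * data[batch[i]][j];
      }
    }
  }
  return centroids;
}
```

Calling mini_batch_kmeans_sketch with a batch size of 50 and 100 iterations would mirror the batch_size and max_iters values used in the R call. Unlike the ClusterR function, the sketch has no initializer, early stopping or tolerance check.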

Use the following link to report bugs/issues,

https://github.com/mlampros/ClusterR/issues

Install

install.packages('ClusterR')

Version: 1.1.8
License: GPL-3
Maintainer: Lampros Mouselimis
Last Published: January 11th, 2019
Monthly Downloads: 7,218

Functions in ClusterR (1.1.8)

  • distance_matrix: Distance matrix calculation
  • Optimal_Clusters_Medoids: Optimal number of Clusters for the partitioning around Medoids functions
  • predict_MBatchKMeans: Prediction function for Mini-Batch-k-means
  • predict_Medoids: Predictions for the Medoid functions
  • entropy_formula: entropy formula (used in external_validation function)
  • plot_2d: 2-dimensional plots
  • predict_KMeans: Prediction function for the k-means
  • KMeans_arma: k-means using the Armadillo library
  • predict_GMM: Prediction function for a Gaussian Mixture Model object
  • mushroom: The mushroom data
  • external_validation: external clustering validation
  • function_interactive: Interactive function for consecutive plots (using dissimilarities or the silhouette widths of the observations)
  • Silhouette_Dissimilarity_Plot: Plot of silhouette widths or dissimilarities
  • soybean: The soybean (large) data set from the UCI repository
  • tryCatch_GMM: tryCatch function to prevent armadillo errors
  • tryCatch_optimal_clust_GMM: tryCatch function to prevent armadillo errors in GMM_arma_AIC_BIC
  • tryCatch_KMEANS_arma: tryCatch function to prevent armadillo errors in KMEANS_arma
  • center_scale: Function to scale and/or center the data
  • dietary_survey_IBS: Synthetic data using a dietary survey of patients with irritable bowel syndrome (IBS)
  • AP_affinity_propagation: Affinity propagation clustering
  • KMeans_rcpp: k-means using RcppArmadillo
  • MiniBatchKmeans: Mini-batch-k-means using RcppArmadillo
  • AP_preferenceRange: Affinity propagation preference range
  • Clara_Medoids: Clustering large applications
  • Cluster_Medoids: Partitioning around medoids
  • Optimal_Clusters_GMM: Optimal number of Clusters for the gaussian mixture models
  • Optimal_Clusters_KMeans: Optimal number of Clusters for Kmeans or Mini-Batch-Kmeans
  • GMM: Gaussian Mixture Model clustering