ClusterR (version 1.0.1)

Clara_Medoids: Clustering large applications

Description

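Partitions a data set around medoids using the CLARA (Clustering LARge Applications) approach, in which the medoids are searched for on repeated sample draws rather than on the full data set.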

Usage

Clara_Medoids(data, clusters, samples, sample_size, distance_metric = "euclidean", minkowski_p = 1, threads = 1, swap_phase = TRUE, fuzzy = FALSE, verbose = FALSE, seed = 1)

Arguments

data
matrix or data frame
clusters
the number of clusters
samples
number of samples to draw from the data set
sample_size
fraction of the data to draw in each sample iteration. It should be a floating-point number greater than 0.0 and less than or equal to 1.0
distance_metric
a string specifying the distance method. One of: euclidean, manhattan, chebyshev, canberra, braycurtis, pearson_correlation, simple_matching_coefficient, minkowski, hamming, jaccard_coefficient, Rao_coefficient, mahalanobis
minkowski_p
a numeric value specifying the Minkowski parameter; used only when distance_metric = "minkowski"
threads
an integer specifying the number of cores to run in parallel. OpenMP is used to parallelize over the different sample draws
swap_phase
either TRUE or FALSE. If TRUE, both phases ('build' and 'swap') take place. The 'swap' phase is considered the more computationally intensive of the two.
fuzzy
either TRUE or FALSE. If TRUE, probabilities of belonging to each cluster are returned for every observation, based on the distance between the observations and the medoids
verbose
either TRUE or FALSE, indicating whether progress is printed during clustering
seed
integer value used to seed the random number generator (RNG)

Value

a list with the following elements: medoids, medoid_indices, sample_indices, best_dissimilarity, clusters, fuzzy_probs (if fuzzy = TRUE), clustering_stats, dissimilarity_matrix, silhouette_matrix
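
As a brief illustration, the returned list might be inspected as sketched below. The element names are those listed above; the data preparation mirrors the package example, and the shapes noted in the comments are assumptions rather than documented guarantees.

```r
library(ClusterR)

data(dietary_survey_IBS)

# drop the last column and standardise the remaining ones (as in the package example)
dat <- center_scale(dietary_survey_IBS[, -ncol(dietary_survey_IBS)])

# fuzzy = TRUE so that 'fuzzy_probs' is included in the result
clm <- Clara_Medoids(dat, clusters = 3, samples = 5, sample_size = 0.2,
                     swap_phase = TRUE, fuzzy = TRUE)

clm$medoids              # the selected medoids (assumed: one row per cluster)
clm$medoid_indices       # positions of the medoids
table(clm$clusters)      # number of observations assigned to each cluster
clm$best_dissimilarity   # dissimilarity of the best sample draw
head(clm$fuzzy_probs)    # per-observation cluster probabilities
```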

Details

The Clara_Medoids function is implemented in the same way as the 'clara' (clustering large applications) algorithm (Kaufman and Rousseeuw, 1990). Within 'Clara_Medoids', the 'Cluster_Medoids' function is applied to each sample draw.
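
The sample-based idea described above can be sketched in a few lines of R. This is a simplified illustration and not the ClusterR implementation: pam() from the 'cluster' package stands in for 'Cluster_Medoids', the distance is fixed to Euclidean, and the function name clara_sketch, the rounding of the sample size, and the use of the summed distance as selection criterion are all assumptions made for the example.

```r
library(cluster)  # pam() is used here as a stand-in for Cluster_Medoids

# Hypothetical helper: illustrates the CLARA idea, not ClusterR's internals
clara_sketch <- function(x, k, samples = 5, sample_size = 0.2, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  n <- nrow(x)
  # rows per sample draw (assumed rounding; at least k + 1 rows, at most n)
  m <- min(n, max(k + 1, ceiling(sample_size * n)))
  best <- NULL
  for (s in seq_len(samples)) {
    idx <- sample(n, m)                      # draw a sample of row indices
    fit <- pam(x[idx, , drop = FALSE], k)    # k-medoids on the sample only
    med_idx <- idx[fit$id.med]               # medoid positions in the full data
    medoids <- x[med_idx, , drop = FALSE]
    # Euclidean distance from every observation to each medoid (n x k matrix)
    d <- sapply(seq_len(k), function(j) sqrt(rowSums(sweep(x, 2, medoids[j, ])^2)))
    cost <- sum(apply(d, 1, min))            # total distance to the nearest medoid
    if (is.null(best) || cost < best$total_dissimilarity) {
      best <- list(medoid_indices = med_idx,
                   clusters = max.col(-d, ties.method = "first"),
                   total_dissimilarity = cost)
    }
  }
  best                                       # result of the best sample draw
}

# usage on simulated data with two well-separated groups
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
res <- clara_sketch(x, k = 2)
table(res$clusters)
```

Because the medoids are searched for only within each sample, the cost of the expensive medoid search depends on the sample size rather than on the full number of observations; the full data set is used only for the final assignment and for comparing sample draws.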

References

Anja Struyf, Mia Hubert, Peter J. Rousseeuw (Feb. 1997), Clustering in an Object-Oriented Environment, Journal of Statistical Software, Vol. 1, Issue 4

Examples

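library(ClusterR)

data(dietary_survey_IBS)

# drop the last column of the data set before clustering
dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]

# center and scale the remaining columns
dat = center_scale(dat)

# 3 clusters, 5 sample draws, each using 20% of the rows
clm = Clara_Medoids(dat, clusters = 3, samples = 5, sample_size = 0.2, swap_phase = TRUE)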