Evaluates how the attributes of a dataset behave across a set of clustering packages, based on evaluation metrics.
clustering(
path = NULL,
df = NULL,
packages = NULL,
algorithm = NULL,
min = 3,
max = 4,
metrics = NULL,
attributes = FALSE
)
path Path to the input file. Default: NULL. Use either path or df, but
not both at the same time. Only files in .dat, .csv or .arff format are
allowed.
df Data matrix, data frame, or dissimilarity matrix. Default: NULL.
If you want to use training and test basketball attributes.
packages Character vector with the packages that run the algorithms.
Default: NULL. The packages implemented are: cluster, ClusterR, advclust,
amap, apcluster, pvclust. By default, all packages are run.
algorithm Character vector with the algorithms implemented within the
packages. Default: NULL. The algorithms implemented are: fuzzy_cm, fuzzy_gg,
fuzzy_gk, hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam,
gmm, kmeans_arma, kmeans_rcpp, mini_kmeans, pvclust.
min An integer with the minimum number of clusters to use when grouping
the data. The default value is 3.
max An integer with the maximum number of clusters to use when grouping
the data. The default value is 4.
metrics Character vector with the metrics used to evaluate the
distribution of the data in clusters. Default: NULL. The nine metrics
implemented are: entropy, variation_information, precision, recall,
f_measure, fowlkes_mallows_index, connectivity, dunn, silhouette.
attributes A boolean indicating whether the result shows the attributes
of the datasets or the numerical values of the computed metrics. The
default value is FALSE.
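The rule that path and df are mutually exclusive, together with the extension restriction, can be sketched in base R as follows. The helper check_input is hypothetical and not part of the documented function. It assumes path points to a single file, and it treats the case where both arguments are NULL as an error, which this page does not specify.

```r
# Hypothetical helper illustrating the path/df rule; not the package's code.
check_input <- function(path = NULL, df = NULL) {
  if (!is.null(path) && !is.null(df)) {
    stop("use either 'path' or 'df', not both")
  }
  if (is.null(path) && is.null(df)) {
    # Assumption: the page does not say what happens when both are NULL.
    stop("one of 'path' or 'df' is required")
  }
  if (!is.null(path)) {
    # Assumption: 'path' names a single file whose extension can be checked.
    ext <- tolower(tools::file_ext(path))
    if (!ext %in% c("dat", "csv", "arff")) {
      stop("unsupported file extension: ", ext)
    }
    return("path")
  }
  "df"
}

check_input(df = data.frame(a = 1))  # data supplied in memory
check_input(path = "data.csv")       # data supplied as a file
```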
A matrix with the results of running all the metrics for the algorithms contained in the specified packages. It also includes information on the types of metrics, algorithms and packages executed.
result A list with the algorithms, metrics and variables defined in the execution of the algorithm.
has_internal_metrics Boolean field indicating whether there are internal metrics such as: dunn, silhouette and connectivity.
has_external_metrics Boolean field to indicate if there are external metrics such as: precision, recall, f-measure, entropy, variation information and fowlkes-mallows.
algorithms_execute Character vector with the algorithms executed. These algorithms have been mentioned in the definition of the parameters.
measures_execute Character vector with the measures executed. These measures have been mentioned in the definition of the parameters.
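A call using the arguments and result fields described above might look like the following sketch. It assumes the function is exported by a package named Clustering, which this page does not state. The dataset (USArrests from base R) and the chosen algorithms and metrics are illustrative choices, so the call is guarded and only runs when that package is installed.

```r
# Sketch only: the package name "Clustering" is an assumption.
args <- list(
  df = datasets::USArrests,           # a numeric data frame; our choice
  algorithm = c("pam", "clara"),      # names taken from the algorithm list
  min = 3,                            # minimum number of clusters
  max = 4,                            # maximum number of clusters
  metrics = c("dunn", "silhouette"),  # internal metrics from the metrics list
  attributes = FALSE                  # return numerical metric values
)

if (requireNamespace("Clustering", quietly = TRUE)) {
  res <- do.call(Clustering::clustering, args)
  # Fields named in the value section of this page.
  print(res$result)
  print(res$algorithms_execute)
  print(res$measures_execute)
}
```

Because path is left out, the data is passed through df only, in line with the rule that the two cannot be combined.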
This algorithm improves and complements existing implementations of clustering algorithms.
Existing approaches provide many algorithms that run independently of one another and cannot be compared with each other. In addition, they require the user to indicate which variable of the dataset is to be used. Finally, each package offers its own implementations for evaluating the groupings of data, so it is sometimes complicated to compare the groupings produced by different packages.
With this algorithm we can solve the problems mentioned above and determine which algorithm has the best behavior for the set of attributes as well as the most efficient number of clusters.
This algorithm evaluates how the attributes of a dataset, or of a set of
datasets, behave under different clustering algorithms. To do this, you must
indicate the type of evaluation to apply to the distribution of the data. To
run the algorithm you must indicate the number of clusters (min and max), the
algorithms (algorithm) or packages (packages) to cluster with, the metrics
(metrics), and whether the results of the evaluation should be the classified
attributes themselves or numerical values (attributes).
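The idea of scoring each cluster count from min to max with one metric can be illustrated in base R. This is a minimal sketch, not the package's implementation. It uses stats::hclust as the algorithm and a hand-written Dunn index (smallest between-cluster distance divided by the largest within-cluster diameter). The dataset and the names dunn_index, min_k and max_k are our own choices.

```r
# Dunn index for a clustering: higher values mean more compact,
# better separated clusters. Illustrative helper, not the package's code.
dunn_index <- function(d, labels) {
  d <- as.matrix(d)
  same <- outer(labels, labels, "==")  # TRUE where two points share a cluster
  diag(same) <- NA                     # ignore each point's distance to itself
  between <- d[!same & !is.na(same)]   # distances across clusters
  within <- d[same & !is.na(same)]     # distances inside clusters
  min(between) / max(within)
}

x <- scale(datasets::USArrests)        # standardise the attributes
d <- dist(x)                           # Euclidean dissimilarity matrix
tree <- hclust(d, method = "complete")

min_k <- 3
max_k <- 4
# One score per candidate number of clusters.
scores <- sapply(min_k:max_k, function(k) dunn_index(d, cutree(tree, k = k)))
names(scores) <- min_k:max_k

best_k <- as.integer(names(which.max(scores)))
print(scores)
print(best_k)
```

Repeating the loop for several algorithms and metrics gives a table that can be compared across algorithms, which is what the result described on this page appears to contain.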