The scclust package is an R wrapper for the scclust library.
The package provides functions to construct near-optimal size-constrained
clusterings. Subject to user-specified constraints on the size and composition
of the clusters, scclust constructs a clustering so that within-cluster
pairwise distances are minimized.
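As a rough illustration of size and composition constraints, the sketch below uses the package's main function, sc_clustering, with both a size constraint and type constraints. The example data, the variable names (my_data, my_dist, typed, typed_ok), the seed and the constraint values are assumptions chosen for illustration. The distance object is built with the separate distances package.

```r
library(scclust)

set.seed(3)  # hypothetical example data, not from the package
my_data <- data.frame(
  type = factor(sample(c("A", "B", "C"), 300, replace = TRUE)),
  x1 = rnorm(300),
  x2 = rnorm(300)
)

# Distances are computed on the numeric covariates only
my_dist <- distances::distances(my_data, dist_variables = c("x1", "x2"))

# Each cluster must contain at least 4 points in total,
# of which at least 1 is of type "A" and at least 2 are of type "B"
typed <- sc_clustering(my_dist,
                       size_constraint = 4,
                       type_labels = my_data$type,
                       type_constraints = c("A" = 1, "B" = 2))

# Verify that the result satisfies the same constraints
typed_ok <- check_clustering(typed,
                             size_constraint = 4,
                             type_labels = my_data$type,
                             type_constraints = c("A" = 1, "B" = 2))
```

Types that are not named in type_constraints (here "C") are left unconstrained in this sketch.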
The main clustering function is sc_clustering. Statistics about
clusters can be derived with the get_clustering_stats
function. To check if a clustering satisfies some set of
constraints, use check_clustering. Use scclust to
construct a scclust object from an existing clustering.
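The functions above might be combined as in the following minimal sketch. The simulated data, the object names and the size constraint of 3 are assumptions made for the example.

```r
library(scclust)

set.seed(1)  # hypothetical example data
my_data <- data.frame(x1 = rnorm(200), x2 = rnorm(200))

# Distance object built with the 'distances' package
my_dist <- distances::distances(my_data)

# Clustering in which each cluster contains at least 3 data points
my_clustering <- sc_clustering(my_dist, size_constraint = 3)

# TRUE if every cluster satisfies the size constraint
ok <- check_clustering(my_clustering, size_constraint = 3)

# Summary statistics about the clusters
stats <- get_clustering_stats(my_dist, my_clustering)

# Wrap an existing vector of cluster labels in a scclust object
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")
existing <- scclust(labels)
```

Printing stats or my_clustering at the console shows the results.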
Clusters can also be constructed with hierarchical_clustering.
However, this function does not support type constraints and does not provide
optimality guarantees. Its main use is to refine clusterings constructed with
the sc_clustering function.
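A refinement step of this kind could look like the sketch below, which passes a clustering from sc_clustering to hierarchical_clustering through its existing_clustering argument. The data, the object names and the size constraint of 5 are illustrative assumptions.

```r
library(scclust)

set.seed(2)  # hypothetical example data
my_data <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
my_dist <- distances::distances(my_data)

# Initial clustering with at least 5 data points per cluster
initial <- sc_clustering(my_dist, size_constraint = 5)

# Refine the initial clustering while keeping at least 5 points per cluster
refined <- hierarchical_clustering(my_dist,
                                   size_constraint = 5,
                                   existing_clustering = initial)

# The refined clustering should still satisfy the size constraint
refined_ok <- check_clustering(refined, size_constraint = 5)
```

Comparing get_clustering_stats(my_dist, initial) with get_clustering_stats(my_dist, refined) is one way to see what the refinement changed.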
scclust was made with large data sets in mind, and it can cluster tens
of millions of data points within minutes on an ordinary desktop computer.
See the package's website for more information: https://github.com/fsavje/scclust-R.
More information about the underlying scclust library is available here:
https://github.com/fsavje/scclust.
Bug reports and suggestions are greatly appreciated. They are best reported here: https://github.com/fsavje/scclust-R/issues.
Higgins, Michael J., Fredrik Sävje and Jasjeet S. Sekhon (2016), ‘Improving massive experiments with threshold blocking’, Proceedings of the National Academy of Sciences, 113:27, 7369–7376.
Sävje, Fredrik, Michael J. Higgins and Jasjeet S. Sekhon (2017), ‘Generalized Full Matching’, arXiv 1703.03882. https://arxiv.org/abs/1703.03882