Michael Hahsler

Michael Hahsler

17 packages on CRAN

1 packages on Bioconductor



Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).



Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat.



Basic infrastructure and some algorithms for the traveling salesperson problem (also traveling salesman problem; TSP). The package provides some simple algorithms and an interface to the Concorde TSP solver and its implementation of the Chained-Lin-Kernighan heuristic. Concorde itself is not included in the package and has to be obtained separately from http://www.math.uwaterloo.ca/tsp/concorde.html.



Implements heuristics for the Quadratic Assignment Problem (QAP). Currently only a simulated annealing heuristic is available.



Extends package arules with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.



A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms and the LOF (local outlier factor) algorithm. The implementations uses the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.

Provides a research infrastructure to test and develop recommender algorithms including UBCF, IBCF, FunkSVD and association rule-based algorithms.

NBMiner is an implementation of the model-based mining algorithm for mining NB-frequent itemsets presented in "Michael Hahsler. A model-based frequency constraint for mining associations from transaction data. Data Mining and Knowledge Discovery, 13(2):137-166, September 2006." In addition an extension for NB-precise rules is implemented.



A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893 and NIH R21HG005912.



Interface for data stream clustering algorithms implemented in the MOA (Massive Online Analysis) framework.



Implements TRACDS (Temporal Relationships between Clusters for Data Streams), a generalization of Extensible Markov Model (EMM). TRACDS adds a temporal or order model to data stream clustering by superimposing a dynamically adapting Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN data stream clustering). Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from the National Human Genome Research Institute.

Provides the Book-Crossing Dataset for the package recommenderlab.

Provides the Jester Dataset for package recommenderlab.



Seamlessly interfaces RDP classifier (version 2.9).



The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications. More information about PMML and the Data Mining Group can be found at <http:// www.dmg.org>. The generated PMML can be imported into any PMML consuming application, such as the Zementis ADAPA and UPPI scoring engines which allow for predictive models built in R to be deployed and executed on site, in the cloud (Amazon, IBM, and FICO), in-database (IBM Netezza, Pivotal, Sybase IQ, Teradata and Teradata Aster) or Hadoop (Datameer and Hive).



Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.



Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.

Add-on for arules to handle and mine frequent sequences. Provides interfaces to the C++ implementation of cSPADE by Mohammed J. Zaki.