Michael Hahsler

17 packages on CRAN

1 package on Bioconductor

arules

cran
89th

Percentile

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.
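The two measures that frequent itemset and association rule mining rest on, support and confidence, can be illustrated with a minimal Python sketch. It is not the arules API; the function names and the toy transactions are assumptions made for illustration only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint / support(lhs, transactions)


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup = support({"bread", "milk"}, transactions)        # 2 of 4 transactions
conf = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
```

Algorithms such as Apriori and Eclat avoid this brute-force counting by pruning the search space, but they evaluate the same quantities.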

seriation

cran
87th

Percentile

Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

TSP

cran
87th

Percentile

Basic infrastructure and some algorithms for the traveling salesperson problem (also traveling salesman problem; TSP). The package provides some simple algorithms and an interface to the Concorde TSP solver and its implementation of the Chained-Lin-Kernighan heuristic. Concorde itself is not included in the package and has to be obtained separately from http://www.math.uwaterloo.ca/tsp/concorde.html.
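As an illustration of what a simple tour-construction heuristic looks like, here is a minimal Python sketch of the nearest-neighbor heuristic. It is an assumption chosen for illustration, not a description of the package's own code, and the function names and sample points are hypothetical.

```python
import math


def tour_length(tour, points):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def nearest_neighbor_tour(points, start=0):
    """Greedy tour: always move to the closest unvisited city (ties broken by index)."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Hypothetical example: the four corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour = nearest_neighbor_tour(points)
length = tour_length(tour, points)
```

Greedy construction like this is fast but gives no optimality guarantee, which is why exact solvers such as Concorde and improvement heuristics such as Chained Lin-Kernighan exist.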

qap

cran
86th

Percentile

Implements heuristics for the Quadratic Assignment Problem (QAP). Currently only a simulated annealing heuristic is available.

arulesViz

cran
85th

Percentile

Extends package arules with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

dbscan

cran
56th

Percentile

A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms and the LOF (local outlier factor) algorithm. The implementation uses the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.

15th

Percentile

Provides a research infrastructure to test and develop recommender algorithms including UBCF, IBCF, FunkSVD and association rule-based algorithms.

NBMiner is an implementation of the model-based mining algorithm for mining NB-frequent itemsets presented in "Michael Hahsler. A model-based frequency constraint for mining associations from transaction data. Data Mining and Knowledge Discovery, 13(2):137-166, September 2006." In addition, an extension for NB-precise rules is implemented.

stream

cran

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893 and NIH R21HG005912.

Provides the Book-Crossing Dataset for the package recommenderlab.

rEMM

cran

Implements TRACDS (Temporal Relationships between Clusters for Data Streams), a generalization of Extensible Markov Model (EMM). TRACDS adds a temporal or order model to data stream clustering by superimposing a dynamically adapting Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN data stream clustering). Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from the National Human Genome Research Institute.

Provides the Jester Dataset for package recommenderlab.

streamMOA

cran

Interface for data stream clustering algorithms implemented in the MOA (Massive Online Analysis) framework.

rRDP

bioconductor

Seamlessly interfaces with the RDP classifier (version 2.9).

pmml

cran
71st

Percentile

The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. More information about PMML and the Data Mining Group can be found at http://www.dmg.org. The generated PMML can be imported into any PMML-consuming application, such as the Zementis ADAPA and UPPI scoring engines, which allow predictive models built in R to be deployed and executed on site, in the cloud (Amazon, IBM, and FICO), in-database (IBM Netezza, Pivotal, Sybase IQ, Teradata and Teradata Aster) or on Hadoop (Datameer and Hive).

cba

cran
63rd

Percentile

Implements clustering techniques such as Proximus and Rock, as well as utility functions for efficient computation of cross distances and data manipulation.

Add-on for arules to handle and mine frequent sequences. Provides interfaces to the C++ implementation of cSPADE by Mohammed J. Zaki.

arulesCBA

cran

Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.