stream v1.3-0

Infrastructure for Data Stream Mining

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893 and NIH R21HG005912.

Readme

stream - Infrastructure for Data Stream Mining - R package

The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package currently focuses on data stream clustering and provides implementations of BICO, BIRCH, D-Stream and DBSTREAM.

Additional packages in the stream family are:

  • streamMOA: Interface to clustering algorithms implemented in the MOA framework. Includes implementations of DenStream, ClusTree and CluStream.
  • subspaceMOA: Interface to Subspace MOA and its implementations of HDDStream and PreDeConStream.

The development of the stream package was supported in part by NSF IIS-0948893 and NIH R21HG005912.

Installation

Stable CRAN version: install from within R with

install.packages("stream")

Current development version: Download package from AppVeyor or install from GitHub (needs devtools).

devtools::install_github("mhahsler/stream")

Usage

Load the package and create micro-clusters via sampling.

library("stream")
stream <- DSD_Gaussians(k=3, noise=0)

sample <- DSC_Sample(k=20)
update(sample, stream, 500)
sample
## Reservoir sampling
## Class: DSC_Sample, DSC_Micro, DSC_R, DSC 
## Number of micro-clusters: 20

Recluster the micro-clusters using k-means and plot the results.

kmeans <- DSC_Kmeans(k=3)
recluster(kmeans, sample)
plot(kmeans, stream, type="both")
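
The reclustered result can also be inspected numerically rather than only plotted. The following is a minimal sketch that repeats the steps above and then queries the clustering with nclusters, get_centers, get_weights and get_points, all of which appear in the function index of this package. The seed value is an arbitrary assumption added for reproducibility and is not part of the original example; exact return formats may differ between package versions.

```r
library("stream")

# Assumption: an arbitrary seed so that repeated runs draw the same stream.
set.seed(1)

# Same setup as in the usage example: a 3-cluster Gaussian stream,
# summarized by a reservoir sample of 20 micro-clusters.
stream <- DSD_Gaussians(k = 3, noise = 0)
sample <- DSC_Sample(k = 20)
update(sample, stream, 500)

# Recluster the micro-clusters into 3 macro-clusters with k-means.
kmeans <- DSC_Kmeans(k = 3)
recluster(kmeans, sample)

# Number of macro-clusters found by the reclusterer.
n_macro <- nclusters(kmeans)

# One row per macro-cluster center, and one weight per macro-cluster.
centers <- get_centers(kmeans)
weights <- get_weights(kmeans)

# Pull a few fresh points from the stream, e.g. to compare with the centers.
new_points <- get_points(stream, n = 5)

print(n_macro)
print(centers)
print(weights)
print(new_points)
```

Printing the centers next to a handful of new points gives a quick sanity check that the macro-clusters lie where the stream actually generates data.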

Functions in stream

Name Description
DSC_Static Create a Static Copy of a Clustering
DSD_ReadDB Read a Data Stream from an open DB Query
DSD_Cubes Static Cubes Data Stream Generator
DSD_ScaleStream Scale a Stream from a DSD
DSD_Benchmark Data Stream Generator for Benchmark Data
DSC_TwoStage TwoStage Clustering Process
DSD_Target Target Data Stream Generator
DSC_Reachability Reachability Micro-Cluster Reclusterer
DSD_BarsAndGaussians Data Stream Generator for Bars and Gaussians
DSC_Sample Extract a Fixed-size Sample from a Data Stream
DSD_MG DSD Moving Generator
DSO_Sample Sampling from a Data Stream (Data Stream Operator)
get_centers Get Cluster Centers from a DSC
DSD_UniformNoise Uniform Noise Data Stream Generator
animation Animates the plotting of a DSD and the clustering process
get_assignment Assign Data Points to Clusters
DSD_Gaussians Mixture of Gaussians Data Stream Generator
nclusters nclusters
prune_clusters Prune Clusters from a Clustering
plot Plotting Data Stream Data and Clusterings
evaluate Evaluate Clusterings
DSC_Window A sliding window from a Data Stream
DSO_Window Sliding Window (Data Stream Operator)
reset_stream Reset a Data Stream to its Beginning
DSClassify Abstract Class for Data Stream Classifiers
DSD_mlbenchData Stream Interface for Data Sets From mlbench
save Save and Read DSC Objects
recluster Re-clustering micro-clusters
DSD_mlbenchGenerator mlbench Data Stream Generator
DST Abstract Base Class for All Data Stream Mining Tasks
DSFP Abstract Class for Frequent Pattern Mining Algorithms for Data Streams
DSO Data Stream Operator Base Classes
MGC Moving Generator Cluster
DSD_Memory A Data Stream Interface for Data Stored in Memory
DSD_ReadCSV Read a Data Stream from File
get_weights Get Cluster Weights
get_copy Create a Deep Copy of a DSC Object
microToMacro Translate Micro-cluster IDs to Macro-cluster IDs
write_stream Write a Data Stream to a File
update Update a Data Stream Clustering Model
get_points Get Points from a Data Stream Generator
DSC Data Stream Clusterer Base Classes
DSC_BIRCH Balanced Iterative Reducing Clustering using Hierarchies
DSC_BICO BICO - Fast computation of k-means coresets in a data stream
DSC_DBSCAN DBSCAN Macro-clusterer
DSC_Kmeans Kmeans Macro-clusterer
DSC_DStream D-Stream Data Stream Clustering Algorithm
DSC_Micro Abstract Class for Micro Clusterers
DSC_Hierarchical Hierarchical Micro-Cluster Reclusterer
DSD Data Stream Data Generator Base Classes
DSC_DBSTREAM DBSTREAM clustering algorithm
DSC_Macro Abstract Class for Macro Clusterers

Vignettes of stream

Name
architecture.odg
architecture.pdf
classes.pdf
dsd_uml.odg
dsd_uml.pdf
dst_uml.odg
dst_uml.pdf
eval.pdf
interaction.odg
interaction.pdf
mcs.pdf
stream.Rnw
stream.bib
stream_extension.Rnw
time.pdf
Details

Date 2018-05-31
URL http://lyle.smu.edu/IDA/TRACDS/
BugReports https://github.com/mhahsler/stream
LinkingTo Rcpp, BH
License GPL-3
RoxygenNote 6.0.1
NeedsCompilation yes
Packaged 2018-06-02 04:23:27 UTC; hahsler
Repository CRAN
Date/Publication 2018-06-02 05:07:55 UTC