Note: a newer version (2.0-3) of this package is available.

stream - Infrastructure for Data Stream Mining - R package

The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package currently focuses on data stream clustering and provides implementations of BICO, BIRCH, D-Stream, DBSTREAM, and evoStream.

Additional packages in the stream family are:

  • streamMOA: Interface to clustering algorithms implemented in the MOA framework. Includes implementations of DenStream, ClusTree and CluStream.

The development of the stream package was supported in part by NSF IIS-0948893 and NIH R21HG005912.

Installation

Stable CRAN version: install from within R with

install.packages("stream")

Current development version: download the package from AppVeyor or install it from GitHub (requires devtools).

devtools::install_github("mhahsler/stream")

Usage

Load the package and create micro-clusters via sampling.

library("stream")
stream <- DSD_Gaussians(k=3, noise=0)

sample <- DSC_Sample(k=20)
update(sample, stream, 500)
sample
## Reservoir sampling
## Class: DSC_Sample, DSC_Micro, DSC_R, DSC
## Number of micro-clusters: 20
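
The output above reports that DSC_Sample uses reservoir sampling. As a rough illustration of that idea, the following base R sketch keeps a uniform random sample of k points from a stream of unknown length. It does not use the stream package, and the helper names (reservoir_sample, next_point) and the toy data generator are assumptions made for this example; the package's own implementation may differ.

```r
# Reservoir sampling (Algorithm R): maintain a uniform random sample of
# k items from a stream without storing the whole stream.
# reservoir_sample() and next_point() are hypothetical helpers, not stream API.
reservoir_sample <- function(next_point, n, k) {
  reservoir <- vector("list", k)
  for (i in seq_len(n)) {
    x <- next_point()
    if (i <= k) {
      reservoir[[i]] <- x              # fill the reservoir with the first k points
    } else {
      j <- sample.int(i, 1)            # uniform index in 1..i
      if (j <= k) reservoir[[j]] <- x  # keep the new point with probability k/i
    }
  }
  do.call(rbind, reservoir)
}

set.seed(42)
# Toy stand-in for a data stream: 2-D points around three fixed centers.
centers <- rbind(c(0.2, 0.2), c(0.8, 0.3), c(0.5, 0.8))
next_point <- function() {
  centers[sample.int(nrow(centers), 1), ] + rnorm(2, sd = 0.05)
}

# Read 500 points and keep 20 of them, mirroring k=20 and 500 above.
micro <- reservoir_sample(next_point, n = 500, k = 20)
dim(micro)
```

Each row of micro plays the role of one micro-cluster center in this sketch.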

Recluster the micro-clusters using k-means and plot the results.

kmeans <- DSC_Kmeans(k=3)
recluster(kmeans, sample)
plot(kmeans, stream, type="both")
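
To show what reclustering amounts to, here is a minimal base R sketch that groups a set of micro-cluster centers into three macro-clusters with stats::kmeans. The data are made up for illustration, and the variable names are assumptions. The sketch also ignores micro-cluster weights, which the package's DSC_Kmeans may take into account.

```r
set.seed(1)
# Hypothetical micro-cluster centers: 20 points scattered around
# three well-separated locations (illustrative data only).
true_centers <- rbind(c(0.2, 0.2), c(0.8, 0.3), c(0.5, 0.8))
idx <- rep(1:3, length.out = 20)
micro_centers <- true_centers[idx, ] + matrix(rnorm(40, sd = 0.03), ncol = 2)

# Recluster: k-means merges the 20 micro-clusters into 3 macro-clusters.
# nstart = 10 reruns k-means from several random starts and keeps the best fit.
km <- kmeans(micro_centers, centers = 3, nstart = 10)

macro_centers  <- km$centers   # one row per macro-cluster
micro_to_macro <- km$cluster   # macro-cluster ID for each micro-cluster
table(micro_to_macro)
```

The vector micro_to_macro corresponds conceptually to the mapping that microToMacro reports for a reclustered DSC object.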

A list of all available clustering methods can be obtained with

DSC_registry$get_entries()

Version

1.5-0

License

GPL-3

Last Published

September 7th, 2021

Functions in stream (1.5-0)

  • DSC_BICO: BICO - Fast computation of k-means coresets in a data stream
  • DSC_DStream: D-Stream Data Stream Clustering Algorithm
  • DSC_DBSCAN: DBSCAN Macro-clusterer
  • DSC_Macro: Abstract Class for Macro Clusterers
  • DSC_EA: Evolutionary Algorithm
  • DSC_DBSTREAM: DBSTREAM clustering algorithm
  • DSC_BIRCH: Balanced Iterative Reducing Clustering using Hierarchies
  • DSC: Data Stream Clusterer Base Classes
  • DSC_Kmeans: Kmeans Macro-clusterer
  • DSC_evoStream: evoStream - Evolutionary Stream Clustering
  • DSC_Hierarchical: Hierarchical Micro-Cluster Reclusterer
  • DSClassify: Abstract Class for Data Stream Classifiers
  • DSC_Sample: Extract a Fixed-size Sample from a Data Stream
  • DSD_MG: DSD Moving Generator
  • DSD_Gaussians: Mixture of Gaussians Data Stream Generator
  • DSC_Reachability: Reachability Micro-Cluster Reclusterer
  • DSD_ReadDB: Read a Data Stream from an open DB Query
  • DSC_Window: A sliding window from a Data Stream
  • DSD_ScaleStream: Scale a Stream from a DSD
  • DSD_Cubes: Static Cubes Data Stream Generator
  • DSD_Benchmark: Data Stream Generator for Benchmark Data
  • DSC_TwoStage: TwoStage Clustering Process
  • DSD: Data Stream Data Generator Base Classes
  • DSC_Static: Create a Static Copy of a Clustering
  • DSC_SinglePass-class: Abstract Class for Single-Pass Clusterers
  • DSD_mlbenchGenerator: mlbench Data Stream Generator
  • DSD_BarsAndGaussians: Data Stream Generator for Bars and Gaussians
  • DSD_mlbenchData: Stream Interface for Data Sets From mlbench
  • DST: Abstract Base Class for All Data Stream Mining Tasks
  • DefaultEvalCallback-class: Default Class for Evaluation Callbacks
  • MGC: Moving Generator Cluster
  • EvalCallback-class: Abstract Class for Evaluation Callbacks
  • DSD_Memory: A Data Stream Interface for Data Stored in Memory
  • DSD_ReadCSV: Read a Data Stream from File
  • save: Save and Read DSC Objects
  • DSO_Window: Sliding Window (Data Stream Operator)
  • DSO_Sample: Sampling from a Data Stream (Data Stream Operator)
  • plot: Plotting Data Stream Data and Clusterings
  • get_weights: Get Cluster Weights
  • get_points: Get Points from a Data Stream Generator
  • DSC_Micro: Abstract Class for Micro Clusterers
  • clean_outliers: Clean Outliers from the Outlier Detecting Clusterer
  • recluster: Re-clustering micro-clusters
  • animation: Animates the plotting of a DSD and the clustering process
  • reset_stream: Reset a Data Stream to its Beginning
  • DSFP: Abstract Class for Frequent Pattern Mining Algorithms for Data Streams
  • DSC_Outlier-class: Abstract Class for Outlier Detection Clusterers
  • DSO: Data Stream Operator Base Classes
  • prune_clusters: Prune Clusters from a Clustering
  • DSD_UniformNoise: Uniform Noise Data Stream Generator
  • DSD_Target: Target Data Stream Generator
  • evaluate: Evaluate Clusterings
  • get_assignment: Assign Data Points to Clusters
  • write_stream: Write a Data Stream to a File
  • update: Update a Data Stream Clustering Model
  • microToMacro: Translate Micro-cluster IDs to Macro-cluster IDs
  • get_centers: Get Cluster Centers from a DSC
  • nclusters: Get the Number of Clusters
  • get_copy: Create a Deep Copy of a DSC Object