stream (version 1.2-3)

DSD_Memory: A Data Stream Interface for Data Stored in Memory

Description

This class provides a data stream interface for data stored in memory as matrix-like objects (including data frames). All or a portion of the stored data can be replayed several times by resetting the stream with reset_stream.

Usage

DSD_Memory(x, n, k=NA, loop=FALSE, class = NULL, description=NULL)

Arguments

x
A matrix-like object containing the data. If x is a DSD object, n data points are drawn from it and stored as a data frame.
n
Number of points used if x is a DSD object. If x is a matrix-like object then n is ignored.
k
Optional: the known number of clusters in the data.
loop
Should the stream start over when it reaches the end?
class
Vector with the class/cluster labels (only used if x is not a DSD object).
description
Character string with a description.
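
As a rough illustration of the loop argument, the following sketch stores three points and then requests more points than are stored. The toy matrix and the variable names are made up for this example, and get_points is assumed to be the usual way to read points from a DSD object in this package version.

```r
library(stream)

# Hypothetical toy data: 3 points in 2 dimensions.
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("x", "y")))

# With loop = TRUE the stream is expected to start over at the end.
looping <- DSD_Memory(m, loop = TRUE)

# Request more points than are stored (7 > 3).
pts <- get_points(looping, n = 7)
nrow(pts)
```

With loop=FALSE (the default), the same request would run past the end of the stored data instead of wrapping around.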

Value

Returns a DSD_Memory object (subclass of DSD_R, DSD).
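
The class hierarchy named above can be checked with inherits. This is only a small sketch; the toy matrix is invented for illustration.

```r
library(stream)

# Hypothetical toy data: 4 points in 2 dimensions.
m <- matrix(c(0.1, 0.2, 0.3, 0.4,
              0.5, 0.6, 0.7, 0.8),
            ncol = 2, dimnames = list(NULL, c("x", "y")))

stream <- DSD_Memory(m)

# Check each class named in the Value section.
is_dsd <- sapply(c("DSD_Memory", "DSD_R", "DSD"),
                 function(cl) inherits(stream, cl))
is_dsd
```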

Details

In addition to regular data.frames, other matrix-like objects that provide subsetting with the bracket operator can be used. This includes ffdf (large data.frames stored on disk) from package ff and big.matrix from package bigmemory.
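
A minimal sketch of this with a plain numeric matrix, the simplest matrix-like object, is shown here. It also replays the first points after a reset. The data and variable names are hypothetical, and get_points is assumed to return the points as a data frame; ffdf and big.matrix inputs are not shown.

```r
library(stream)

# Hypothetical toy data: 10 random points in 2 dimensions.
set.seed(42)
m <- matrix(runif(20), ncol = 2, dimnames = list(NULL, c("x", "y")))

# A plain matrix supports subsetting with [ , ], so it can be used as x.
stream <- DSD_Memory(m)

# Read the first 5 points, rewind, and read them again.
first <- get_points(stream, n = 5)
reset_stream(stream)
again <- get_points(stream, n = 5)

# Both reads are expected to return the same points.
same_points <- isTRUE(all.equal(first, again))
same_points
```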

See Also

DSD, reset_stream

Examples

library(stream)

# store 1000 points from a stream
stream <- DSD_Gaussians(k=3, d=2)
replayer <- DSD_Memory(stream, k=3, n=1000)
replayer
plot(replayer)  
  
# creating 2 clusterers of different algorithms
dsc1 <- DSC_DBSTREAM(r=0.1)
dsc2 <- DSC_DStream(gridsize=0.1, Cm=1.5)
  
# clustering the same data in 2 DSC objects
reset_stream(replayer) # resetting the replayer to the first position
update(dsc1, replayer, 500)
reset_stream(replayer)
update(dsc2, replayer, 500)
  
# plot the resulting clusterings
reset_stream(replayer) 
plot(dsc1, replayer, main="DBSTREAM")
reset_stream(replayer) 
plot(dsc2, replayer, main="D-Stream")   
  
# create a stream from a data.frame (the 3rd column contains the class assignment)
df <- data.frame(x=runif(100), y=runif(100), 
  class=sample(1:3, 100, replace=TRUE))
head(df)  

stream <- DSD_Memory(df[,c("x", "y")], class=df[,"class"])  
stream
