
This class provides a data stream interface for data stored in memory as matrix-like objects (including data frames). All or a portion of the stored data can be replayed several times.
DSD_Memory(x, n, k = NA, loop = FALSE,
  class = NULL, description = NULL)
x: A matrix-like object containing the data. If x is a DSD object, then a data frame with n data points from this DSD is created.
n: Number of points used if x is a DSD object. If x is a matrix-like object, then n is ignored.
k: Optional: The known number of clusters in the data.
loop: Should the stream start over when it reaches the end?
class: Vector with the class/cluster label (only used if x is not a DSD object).
description: Character string with a description.
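The effect of the loop argument can be illustrated with a small base R sketch. This is not the stream package's implementation; make_replayer and its get_points and reset members are hypothetical names made up for illustration. They mimic a read position that either stops at the end of the data or wraps around to the first row.

```r
# Hypothetical helper (not part of the stream package): a closure that
# replays the rows of a matrix-like object x from a moving read position.
make_replayer <- function(x, loop = FALSE) {
  pos <- 1L                 # row index of the next point to return
  total <- nrow(x)
  list(
    get_points = function(n = 1L) {
      idx <- pos - 1L + seq_len(n)        # requested rows, may run past the end
      if (any(idx > total)) {
        if (!loop) stop("not enough data points left in the stream")
        idx <- (idx - 1L) %% total + 1L   # wrap around to the first row
      }
      pos <<- pos + n
      if (loop) pos <<- (pos - 1L) %% total + 1L
      x[idx, , drop = FALSE]
    },
    reset = function() pos <<- 1L         # go back to the first position
  )
}

df <- data.frame(id = 1:5)

looping <- make_replayer(df, loop = TRUE)
first <- looping$get_points(7)$id    # 1 2 3 4 5 1 2
second <- looping$get_points(2)$id   # 3 4

strict <- make_replayer(df)          # loop = FALSE
all_rows <- strict$get_points(5)     # uses up all stored points
outcome <- tryCatch(strict$get_points(1), error = function(e) "stopped")
```

With loop = FALSE the sketch signals an error once the stored points are used up, so a reset is needed before the data can be read again.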
Returns a DSD_Memory object (subclass of DSD_R, DSD).
In addition to regular data.frames, other matrix-like objects that provide subsetting with the bracket operator can be used. This includes ffdf (large data.frames stored on disk) from package ff and big.matrix from package bigmemory.
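To make the bracket-operator requirement concrete, here is a minimal base R sketch. The helper read_block is a hypothetical name and not a function of the stream package. It reads a block of rows with the same `[` call from a matrix and from a data.frame; the sketch only exercises these two base R types and assumes that other matrix-like objects accept the same row subsetting.

```r
# Hypothetical helper (not part of the stream package): return n rows of a
# matrix-like object x, starting at row `from`, using only the `[` operator.
read_block <- function(x, from, n) {
  x[seq(from, length.out = n), , drop = FALSE]
}

m <- matrix(1:20, ncol = 2, dimnames = list(NULL, c("x", "y")))
df <- as.data.frame(m)

block_m <- read_block(m, 4, 3)    # rows 4 to 6 of the matrix
block_df <- read_block(df, 4, 3)  # the same rows of the data.frame
```

Using drop = FALSE keeps the result two-dimensional even when a single row is requested.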
# NOT RUN {
library(stream)

# store 1000 points from a stream
stream <- DSD_Gaussians(k=3, d=2)
replayer <- DSD_Memory(stream, k=3, n=1000)
replayer
plot(replayer)
# creating 2 clusterers of different algorithms
dsc1 <- DSC_DBSTREAM(r=0.1)
dsc2 <- DSC_DStream(gridsize=0.1, Cm=1.5)
# clustering the same data in 2 DSC objects
reset_stream(replayer) # resetting the replayer to the first position
update(dsc1, replayer, 500)
reset_stream(replayer)
update(dsc2, replayer, 500)
# plot the resulting clusterings
reset_stream(replayer)
plot(dsc1, replayer, main="DBSTREAM")
reset_stream(replayer)
plot(dsc2, replayer, main="D-Stream")
### use a data.frame to create a stream (3rd col. contains the assignment)
df <- data.frame(x=runif(100), y=runif(100),
  class=sample(1:3, 100, replace=TRUE))
head(df)
stream <- DSD_Memory(df[,c("x", "y")], class=df[,"class"])
stream
# }