stream (version 1.1-1)

DSD_ReadCSV: Read a Data Stream from File

Description

A DSD class that reads a data stream from a file or any R connection.

Usage

DSD_ReadCSV(file, sep=",", k=NA, d=NA, take=NULL, 
  class=NULL, loop=FALSE)
close_stream(dsd)

Arguments

file
A file/URL or an open connection.
sep
The character string that separates dimensions in data points in the stream.
k
Number of true clusters, if known.
d
Number of dimensions (only used for printing).
take
Indices of the columns to extract from the file.
class
Column index of the class attribute/cluster label.
loop
If enabled, the object will loop through the stream when the end has been reached. If disabled, the object will warn the user upon reaching the end.
dsd
An object of class DSD_ReadCSV.

Value

  • An object of class DSD_ReadCSV (subclass of DSD_R, DSD).

Details

DSD_ReadCSV uses read.table() to read data from an R connection. The connection keeps track of the current read position in the stream. Connections typically refer to files stored on disk, but many other kinds are possible (see connection).
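The role of the connection can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: it assumes a temporary file and a small made-up matrix, and shows that successive read.table() calls on an already open connection continue from where the previous call stopped.

```r
# Base R only: an open connection remembers its read position between calls.
# The temporary file and the 10 x 2 integer matrix are illustrative assumptions.
tmp <- tempfile(fileext = ".csv")
write.table(matrix(1:20, ncol = 2), tmp, sep = ",",
            row.names = FALSE, col.names = FALSE)

con <- file(tmp, open = "r")                     # open once, keep it open
first  <- read.table(con, sep = ",", nrows = 3)  # reads rows 1 to 3
second <- read.table(con, sep = ",", nrows = 3)  # continues with rows 4 to 6
close(con)
unlink(tmp)
```

Passing a file name instead of an open connection would make read.table() open and close the file on every call, so each call would start again at the first row.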

The position in the file can be reset to the beginning using reset_stream(). The connection can be closed using close_stream().
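As a rough sketch of this life cycle, assuming the stream package is installed and using a temporary file written by write_stream() (the file and the sizes are made up for illustration), resetting the stream should cause the next call to get_points() to start again at the first data point:

```r
library(stream)

# Write a small synthetic stream to a temporary file (illustrative data only).
tmp <- tempfile()
write_stream(DSD_Gaussians(k = 3, d = 2), tmp, n = 10, sep = ",")

s  <- DSD_ReadCSV(tmp, sep = ",")
p1 <- get_points(s, 5)   # first five points in the file
reset_stream(s)          # move the read position back to the beginning
p2 <- get_points(s, 5)   # expected to be the same five points as p1

close_stream(s)          # release the underlying connection
unlink(tmp)
```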

See Also

DSD, reset_stream

Examples

library(stream)

# create data and write it to disk
stream <- DSD_Gaussians(k=3, d=5)
write_stream(stream, "data.txt", n=100, sep=",")

# read the same data back (looping when the end of the file is reached)
stream2 <- DSD_ReadCSV("data.txt", sep=",", loop=TRUE)
stream2

# clean up
close_stream(stream2)
file.remove("data.txt")

# example with a part of the kddcup1999 data (take only continuous variables)
file <- system.file("examples", "kddcup10000.data.gz", package="stream")
stream <- DSD_ReadCSV(gzfile(file),
        take=c(1, 5, 6, 8:11, 13:20, 23:41), class=42, k=7)
stream

get_points(stream, 5)

# plot 100 points (projected on the first two principal components)
plot(stream, n=100, method="pc")

close_stream(stream)
