Instantiate a distributed data frame ('ddf')
ddf(conn, transFn = NULL, update = FALSE, reset = FALSE, control = NULL,
verbose = TRUE)
conn an object pointing to where data is or will be stored for the 'ddf' object - can be a 'kvConnection' object created from localDiskConn or hdfsConn, or a data frame or list of key-value pairs
transFn a function to be applied to the key-value pairs of this data prior to doing any processing, transforming the data into a data frame if it is not stored as such (a sketch follows this list)
update should the attributes of this object be updated? See updateAttributes for more details.
reset should all persistent metadata about this object be removed and the object created from scratch? This setting does not affect data stored in the connection location.
control parameters specifying how the backend should handle things if attributes are updated (most likely parameters to rhwatch in RHIPE) - see rhipeControl and localDiskControl
verbose logical - should messages be printed about what is being done?
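When the backing data are not already stored as data frames, transFn supplies the conversion. A minimal sketch (the matrix-valued data and the choice of as.data.frame here are illustrative assumptions, not part of the usage above):

# values are stored as matrices rather than data frames, so
# as.data.frame is applied to each value before any processing
mconn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(mconn, list(list("1", as.matrix(iris[1:50, 1:4]))))
dm <- ddf(mconn, transFn = as.data.frame)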
# in-memory ddf
d <- ddf(iris)
d
# local disk ddf
conn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
dl <- ddf(conn)
dl
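# a hedged sketch (not part of the original example): update = TRUE
# computes and persists summary attributes at creation time, the same
# work done by updateAttributes on an existing object
dlu <- ddf(conn, update = TRUE)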
# hdfs ddf (requires RHIPE / Hadoop)
# connect to empty HDFS directory
conn <- hdfsConn("/tmp/irisSplit")
# add some data
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
# represent it as a distributed data frame
hdd <- ddf(conn)
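# a hedged sketch, assuming a working RHIPE setup: attribute updates
# for HDFS data run as MapReduce jobs, and backend parameters (e.g.
# arguments to rhwatch) can be supplied through control via rhipeControl()
hdd <- updateAttributes(hdd, control = rhipeControl())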