datadr (version 0.8.4)

ddf: Instantiate a Distributed Data Frame ('ddf')

Description

Instantiate a distributed data frame ('ddf')

Usage

ddf(conn, transFn = NULL, update = FALSE, reset = FALSE, control = NULL,
  verbose = TRUE)

Arguments

conn
an object pointing to where data is or will be stored for the 'ddf' object - can be a 'kvConnection' object created from localDiskConn or hdfsConn, or a data frame or list of key-value pairs (a plain data frame is used in the first example below)
transFn
transFn a function to be applied to the key-value pairs of this data prior to doing any processing, that transform the data into a data frame if it is not stored as such
update
should the attributes of this object be updated? See updateAttributes for more details.
reset
should all persistent metadata about this object be removed and the object created from scratch? This setting does not affect data stored in the connection location.
control
parameters specifying how the backend should handle things if attributes are updated (most likely parameters to rhwatch in RHIPE) - see rhipeControl and localDiskControl
verbose
logical - print messages about what is being done
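
When the stored key-value pairs do not already hold data frames, transFn can coerce them. The following is a minimal sketch, not taken from the original page; it assumes transFn receives the value of each key-value pair, and the matrix-valued data is purely illustrative:

library(datadr)
# hypothetical connection whose values are matrices rather than data frames
conn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(conn, list(list("1", as.matrix(iris[1:50, 1:4]))))
# coerce each value to a data frame; update = TRUE computes summary
# attributes up front (see updateAttributes)
d <- ddf(conn, transFn = as.data.frame, update = TRUE)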

Examples

# in-memory ddf
d <- ddf(iris)
d
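
# (illustrative addition, not from the original page) a ddf built from an
# in-memory data frame already carries summary attributes, which can be
# inspected or recomputed:
summary(d)
d <- updateAttributes(d)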

# local disk ddf
conn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
dl <- ddf(conn)
dl
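
# (illustrative addition, not from the original page) individual subsets can
# be extracted as key-value pairs, and the ddf can be re-divided, e.g. by a
# variable:
dl[["1"]]
bySpecies <- divide(dl, by = "Species", update = TRUE)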

# hdfs ddf (requires RHIPE / Hadoop)
# connect to empty HDFS directory
conn <- hdfsConn("/tmp/irisSplit")
# add some data
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
# represent it as a distributed data frame
hdd <- ddf(conn)
