datadr (version 0.8.6.1)

ddf: Instantiate a Distributed Data Frame ('ddf')

Description

Instantiate a distributed data frame ('ddf')

Usage

ddf(conn, transFn = NULL, update = FALSE, reset = FALSE, control = NULL,
  verbose = TRUE)

Arguments

conn

an object pointing to where data is or will be stored for the 'ddf' object - can be a 'kvConnection' object created from localDiskConn or hdfsConn, or a data frame or list of key-value pairs

transFn

a function to be applied to the key-value pairs of this data prior to any processing, transforming the data into a data frame if it is not already stored as such (see the sketch following these arguments)

update

should the attributes of this object be updated? See updateAttributes for more details.

reset

should all persistent metadata about this object be removed and the object created from scratch? This setting does not affect data stored in the connection location.

control

parameters specifying how the backend should handle the computation if attributes are updated (most likely parameters passed to rhwatch in RHIPE) - see rhipeControl and localDiskControl

verbose

logical - print messages about what is being done
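
As an illustration of how 'transFn' and 'update' fit together, the following is a minimal sketch (not taken from the package examples) in which the values are stored on local disk as matrices and coerced to data frames when the 'ddf' is created. It assumes that 'transFn' is given each value and should return a data frame.

library(datadr)

# local disk connection holding matrix values (an assumed scenario)
mconn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(mconn, list(list("1", as.matrix(iris[1:75, 1:4]))))
addData(mconn, list(list("2", as.matrix(iris[76:150, 1:4]))))

# coerce each value to a data frame and compute attributes up front
dm <- ddf(mconn,
  transFn = function(v) as.data.frame(v),
  update = TRUE)
dm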

Examples

# in-memory ddf
d <- ddf(iris)
d
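
# in-memory ddf from a list of key-value pairs
# (the 'conn' argument also accepts this form, as described above)
dkv <- ddf(list(list("1", iris[1:75, ]), list("2", iris[76:150, ])))
dkv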

# local disk ddf
conn <- localDiskConn(tempfile(), autoYes = TRUE)
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
dl <- ddf(conn)
dl
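
# update the summary attributes of an existing ddf
# (a sketch; the 'update' argument does this at creation - see updateAttributes)
dl <- updateAttributes(dl)
dl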

# hdfs ddf (requires RHIPE / Hadoop)
# connect to empty HDFS directory
conn <- hdfsConn("/tmp/irisSplit")
# add some data
addData(conn, list(list("1", iris[1:10,])))
addData(conn, list(list("2", iris[11:110,])))
addData(conn, list(list("3", iris[111:150,])))
# represent it as a distributed data frame
hdd <- ddf(conn)
