ddR (version 0.1.1)

dds: Distributed Data-structures in R

Description

dds simplifies large-scale data analysis. It adds language constructs for expressing distributed programs in R. Programs written in dds can run on multiple execution engines, such as parallel and distributedR, among others. dds provides distributed data structures, such as the distributed array darray, that partition and share data across multiple R instances. Parallel execution is expressed with dmapply.

Commands

dds contains the following commands. For more details, call help() on each command.

Session management

Distributed array, data.frame, and list

  • darray: create a distributed array
  • dframe: create a distributed data frame
  • dlist: create a distributed list
  • as.darray: create a darray object from a matrix object
  • is.darray: check whether an object is a distributed array
  • parts: obtain the partitions of an object
  • nparts: number of partitions, as a vector
  • totalParts: obtain the total number of partitions
  • psize: obtain the dimensions of the partitions
  • collect: fetch a darray, dframe, or dlist object at the master
  • repartition: repartition the input object
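
As a rough illustration of how these commands fit together, the sketch below builds a small distributed array and inspects its partitioning. It assumes the package is loaded under the name ddR with the parallel backend, and that as.darray accepts a psize argument giving the dimensions of each partition. The variable names are illustrative only.

```r
library(ddR)          # assumption: the package is loaded under the name ddR
useBackend(parallel)  # same backend as the page's own example

# An ordinary 4x4 matrix held at the master
m <- matrix(1:16, nrow = 4)

# Convert it to a distributed array split into 2x2 blocks
# (assumption: psize sets the dimensions of each partition)
da <- as.darray(m, psize = c(2, 2))

is.darray(da)    # is this a distributed array?
nparts(da)       # number of partitions, as a vector
totalParts(da)   # total number of partitions
psize(da)        # dimensions of the partitions

# parts() exposes the partitions so a function can run on each one
part_sums <- dmapply(function(p) sum(p), parts(da))
sum(unlist(collect(part_sums)))   # per-partition sums add up to sum(m)

# Gather the whole array back at the master
collect(da)
```

Passing parts(da) to dmapply runs the function once per partition rather than once per element, which is usually the cheaper choice when each partition is large.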

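Distributed execution

  • dmapply: apply a function over the elements of one or more inputs on the cluster (distributed counterpart of mapply)
  • dlapply: apply a function over the elements of a single input on the cluster (distributed counterpart of lapply)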
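
The difference between the two commands is easiest to see side by side. The following is a minimal sketch, assuming the package is loaded as ddR with the parallel backend; the inputs and variable names are made up for illustration.

```r
library(ddR)          # assumption: the package is loaded under the name ddR
useBackend(parallel)

# dlapply takes a single input, in the manner of lapply
squares <- dlapply(1:4, function(x) x^2)
unlist(collect(squares))

# dmapply walks several inputs in lockstep, in the manner of mapply;
# nparts asks for the result to be stored in 2 partitions
sums <- dmapply(function(x, y) x + y, 1:4, 5:8, nparts = 2)
unlist(collect(sums))
totalParts(sums)
```

Both calls return a distributed object, so nothing is copied back to the master until collect is called.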

References

  • Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. SIGMOD 2015, 1657-1668.
  • Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. EuroSys'13, 197-210.
  • Homepage: https://github.com/vertica/DistributedR

Examples

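library(ddR)  # the package is loaded as ddR (see the version header)
useBackend(parallel)  # run on the 'parallel' backend
# Add the two vectors element-wise; store the result in 3 partitions
a <- dmapply(function(x, y) x + y, 1:5, 2:6, nparts = 3)
collect(a)  # fetch the result at the master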