Note: a newer version (0.8.6.1) of this package is available.

datadr: Divide and Recombine in R

datadr is an R package that leverages RHIPE to provide a simple interface to division and recombination (D&R) methods for large complex data.
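
As a rough illustration of the D&R pattern, the following base R sketch divides a data frame into subsets, applies a function to each subset, and recombines the results by row-binding. It deliberately uses only base R (`split`, `lapply`, `rbind`) and the built-in `iris` data rather than datadr itself, so it is a conceptual stand-in and not the package's API; all variable names are illustrative. In datadr, the corresponding steps are handled by functions such as `divide`, `addTransform`, and `recombine` with `combRbind`, all listed in the function reference.

```r
# Conceptual divide-and-recombine sketch in base R (not datadr's API).

# Divide: split the data into one subset per species.
subsets <- split(iris, iris$Species)

# Apply: compute a per-subset summary (mean sepal length).
per_subset <- lapply(subsets, function(d) mean(d$Sepal.Length))

# Recombine: row-bind the per-subset results into one data frame.
result <- do.call(rbind, lapply(names(per_subset), function(key) {
  data.frame(Species = key, val = per_subset[[key]])
}))

print(result)
```

Each subset is processed independently of the others, which is what lets the same pattern scale out to a distributed backend.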

To get started, see the package documentation and function reference located here.

Visualization tools based on D&R can be found here.

Installation

# from CRAN:
install.packages("datadr")

# from github:
devtools::install_github("delta-rho/datadr")

License

This software is released under the BSD 3-clause license. Please read the license document.

Acknowledgement

datadr development is sponsored by:

  • U.S. Department of Defense Advanced Research Projects Agency, XDATA program
  • U.S. Department of Homeland Security, Science and Technology Directorate, Homeland Security Advanced Research Projects Agency (HSARPA)
  • Pacific Northwest National Laboratory, operated by Battelle for the U.S. Department of Energy, LDRD Program, Signature Discovery and Future Power Grid Initiatives

  • Monthly Downloads: 30
  • Version: 0.8.6
  • License: BSD_3_clause + file LICENSE
  • Maintainer: Ryan Hafen
  • Last Published: October 2nd, 2016

Functions in datadr (0.8.6)

  • adult: "Census Income" Dataset
  • drBLB: Bag of Little Bootstraps Transformation Method
  • drFilter: Filter a 'ddo' or 'ddf' Object
  • drLM: LM Transformation Method
  • drLapply: Apply a function to all key-value pairs of a ddo/ddf object
  • readHDFStextFile: Experimental HDFS text reader helper function
  • readTextFileByChunk: Experimental sequential text reader helper function
  • applyTransform: Apply transformation function(s)
  • combRbind: "rbind" Recombination
  • combMeanCoef: Mean Coefficient Recombination
  • addData: Add Key-Value Pairs to a Data Connection
  • addTransform: Add a Transformation Function to a Distributed Data Object
  • ddf: Instantiate a Distributed Data Frame ('ddf')
  • ddo-ddf-accessors: Accessor Functions
  • drHexbin: HexBin Aggregation for Distributed Data Frames
  • getCondCuts: Get names of the conditioning variable cuts
  • drJoin: Join Data Sources by Key
  • hdfsConn: Connect to Data Source on HDFS
  • combDdf: "DDF" Recombination
  • combCollect: "Collect" Recombination
  • combMean: Mean Recombination
  • combDdo: "DDO" Recombination
  • datadr-package: datadr
  • ddf-accessors: Accessor methods for 'ddf' objects
  • drQuantile: Sample Quantiles for 'ddf' Objects
  • drPersist: Persist a Transformed 'ddo' or 'ddf' Object
  • print.kvPair: Print a key-value pair
  • digestFileHash: Digest File Hash Function
  • divide-internals: Functions used in divide()
  • drRead.table: Data Input
  • drSample: Take a Sample of Key-Value Pairs
  • localDiskConn: Connect to Data Source on Local Disk
  • kvPairs: Specify a Collection of Key-Value Pairs
  • removeData: Remove Key-Value Pairs from a Data Connection
  • recombine: Recombine
  • print.kvValue: Print value of a key-value pair
  • to_ddf: Convert dplyr grouped_df to ddf
  • updateAttributes: Update Attributes of a 'ddo' or 'ddf' Object
  • divide: Divide a Distributed Data Object
  • drAggregate: Division-Agnostic Aggregation
  • as.list.ddo: Turn 'ddo' / 'ddf' Object into a list
  • as.data.frame.ddf: Turn 'ddf' Object into Data Frame
  • condDiv: Conditioning Variable Division
  • convert: Convert 'ddo' / 'ddf' Objects
  • drGetGlobals: Get Global Variables and Package Dependencies
  • drGLM: GLM Transformation Method
  • kvApply: Apply Function to Key-Value Pair
  • kvPair: Specify a Key-Value Pair
  • rhipeControl: Specify Control Parameters for RHIPE Job
  • rrDiv: Random Replicate Division
  • bsv: Construct Between Subset Variable (BSV)
  • charFileHash: Character File Hash Function
  • %>%: Pipe data
  • print.ddo: Print a "ddo" or "ddf" Object
  • setupTransformEnv: Set up transformation environment
  • getSplitVar: Extract "Split" Variable(s)
  • drSubset: Subsetting Distributed Data Frames
  • flatten: "Flatten" a ddf Subset
  • localDiskControl: Specify Control Parameters for MapReduce on a Local Disk Connection
  • makeExtractable: Take a ddo/ddf HDFS data object and turn it into a mapfile
  • ddo-ddf-attributes: Managing attributes of 'ddo' or 'ddf' objects
  • ddo: Instantiate a Distributed Data Object ('ddo')
  • mrExec: Execute a MapReduce Job
  • mr-summary-stats: Functions to Compute Summary Statistics in MapReduce
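
Several entries in this index pair a per-subset model fit with a recombination step, for example `drLM` with `combMeanCoef` ("Mean Coefficient Recombination"). The base R sketch below illustrates that general idea only: it fits a linear model on each subset and averages the coefficients, weighting by subset size. It does not call datadr, the weighting scheme is an assumption made for illustration, and it should not be read as a description of how `combMeanCoef` is implemented.

```r
# Conceptual sketch of mean-coefficient recombination in base R
# (not datadr's API; size-weighted averaging is an assumption).

# Divide: one subset per species.
subsets <- split(iris, iris$Species)

# Apply: fit the same linear model on every subset and keep the coefficients.
# The result is a 2 x 3 matrix: one row per coefficient, one column per subset.
coefs <- vapply(
  subsets,
  function(d) coef(lm(Sepal.Length ~ Sepal.Width, data = d)),
  numeric(2)
)

# Recombine: average the coefficients, weighting each subset by its row count.
n <- vapply(subsets, nrow, integer(1))
mean_coef <- drop(coefs %*% (n / sum(n)))

print(mean_coef)
```

Because every `iris` species has the same number of rows, the weighted and unweighted averages coincide here; with unequal subsets they would differ.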