Note: a newer version (0.8.6.1) of this package is available.

datadr (version 0.8.4)

Divide and Recombine for Large, Complex Data

Description

Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE).
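The divide, apply, and recombine workflow described above can be sketched as follows. This is a minimal in-memory example, not an excerpt from the package documentation: it uses ddf, divide, addTransform, recombine, and combRbind from the function list on this page, while the built-in iris data set and the per-species mean are assumptions chosen for illustration. Argument names should be checked against the help pages of the installed version.

```r
library(datadr)

# Wrap an ordinary data frame as an in-memory distributed data frame ('ddf').
irisDdf <- ddf(iris)

# Divide: one subset per level of the conditioning variable Species.
bySpecies <- divide(irisDdf, by = "Species")

# Apply: attach a transformation that is evaluated on each subset.
# (The statistic computed here is an illustrative choice.)
meanSL <- addTransform(bySpecies, function(x) mean(x$Sepal.Length))

# Recombine: row-bind the per-subset results into a single data frame.
result <- recombine(meanSL, combine = combRbind)
print(result)
```

The same steps should carry over to data stored on local disk or HDFS by building the initial object from a localDiskConn or hdfsConn connection rather than from an in-memory data frame.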

Install

install.packages('datadr')

Monthly Downloads

22

Version

0.8.4

License

BSD_3_clause + file LICENSE

Maintainer

Ryan Hafen

Last Published

March 14th, 2016

Functions in datadr (0.8.4)

ddf-accessors

Accessor methods for 'ddf' objects
localDiskConn

Connect to Data Source on Local Disk
getSplitVar

Extract "Split" Variable(s)
addData

Add Key-Value Pairs to a Data Connection
drQuantile

Sample Quantiles for 'ddf' Objects
combCollect

"Collect" Recombination
drAggregate

Division-Agnostic Aggregation
combRbind

"rbind" Recombination
readHDFStextFile

Experimental HDFS text reader helper function
drFilter

Filter a 'ddo' or 'ddf' Object
removeData

Remove Key-Value Pairs from a Data Connection
print.ddo

Print a "ddo" or "ddf" Object
divide

Divide a Distributed Data Object
addTransform

Add a Transformation Function to a Distributed Data Object
ddo-ddf-accessors

Accessor Functions
divide-internals

Functions used in divide()
setupTransformEnv

Set up transformation environment
makeExtractable

Take a ddo/ddf HDFS data object and turn it into a mapfile
mrExec

Execute a MapReduce Job
hdfsConn

Connect to Data Source on HDFS
ddo-ddf-attributes

Managing attributes of 'ddo' or 'ddf' objects
drGetGlobals

Get Global Variables and Package Dependencies
adult

"Census Income" Dataset
bsv

Construct Between Subset Variable (BSV)
ddf

Instantiate a Distributed Data Frame ('ddf')
datadr-package

datadr
recombine

Recombine
drLapply

Apply a function to all key-value pairs of a ddo/ddf object
drSubset

Subsetting Distributed Data Frames
condDiv

Conditioning Variable Division
applyTransform

Apply transformation function(s)
combDdo

"DDO" Recombination
combMeanCoef

Mean Coefficient Recombination
updateAttributes

Update Attributes of a 'ddo' or 'ddf' Object
print.kvValue

Print value of a key-value pair
as.data.frame.ddf

Turn 'ddf' Object into Data Frame
combDdf

"DDF" Recombination
print.kvPair

Print a key-value pair
drJoin

Join Data Sources by Key
getCondCuts

Get names of the conditioning variable cuts
charFileHash

Character File Hash Function
kvPair

Specify a Key-Value Pair
drGLM

GLM Transformation Method
ddo

Instantiate a Distributed Data Object ('ddo')
drSample

Take a Sample of Key-Value Pairs
drRead.table

Data Input
mr-summary-stats

Functions to Compute Summary Statistics in MapReduce
drLM

LM Transformation Method
kvApply

Apply Function to Key-Value Pair
drBLB

Bag of Little Bootstraps Transformation Method
kvPairs

Specify a Collection of Key-Value Pairs
drPersist

Persist a Transformed 'ddo' or 'ddf' Object
localDiskControl

Specify Control Parameters for MapReduce on a Local Disk Connection
readTextFileByChunk

Experimental sequential text reader helper function
rrDiv

Random Replicate Division
%>%

Pipe data
to_ddf

Convert dplyr grouped_df to ddf
rhipeControl

Specify Control Parameters for RHIPE Job
as.list.ddo

Turn 'ddo' / 'ddf' Object into a list
convert

Convert 'ddo' / 'ddf' Objects
combMean

Mean Recombination
digestFileHash

Digest File Hash Function
flatten

"Flatten" a ddf Subset
drHexbin

HexBin Aggregation for Distributed Data Frames