Attention
This repository is unstable and currently experimental. Please come back later when we have a new version to correspond with dat 1.0. Keep up to date in #dat on freenode or @dat_project on Twitter.
rdat
This software is in alpha stage and is not yet ready for use with real-world data.
The rdat package provides an R wrapper to the Dat project. Dat (git for data) is a framework for data versioning, replication and synchronisation; see dat-data.com.
Installation instructions
Prerequisites: Instructions below require R, git and nodejs (npm).
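Before installing, it can help to confirm that the command-line prerequisites are visible from R. The following is a minimal sketch using base R's Sys.which(); the executable names git, node and npm are assumptions (some distributions install Node.js as nodejs), so adjust them for your system.

```r
# Look up the command-line prerequisites on the PATH.
# The executable names are assumptions; adjust them for your system.
tools <- c("git", "node", "npm")
found <- Sys.which(tools)

# Sys.which() returns an empty string for anything it cannot find.
missing <- tools[!nzchar(found)]
if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
} else {
  message("All prerequisites found.")
}
```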
Installing dat (stable)
Install the latest stable version from npm:
sudo npm install -g dat
See instructions for more details.
Installing dat (development version)
If you have not already installed dat, grab it from GitHub:
git clone https://github.com/maxogden/dat ~/dat
cd ~/dat
npm install .
sudo npm link
To update an existing copy of dat:
cd ~/dat
git pull
rm -Rf node_modules
npm install .
Installing rdat
Then install the R package:
library(devtools)
install_github("ropensci/rdat")
Run through the examples to verify that everything works:
library(rdat)
example(dat)
API
This API is experimental and has not been finalized or implemented. Stay tuned for updates.
init
When no remote is specified, dat() initializes a new repository:
repo <- dat("cars", path = getwd())
insert
Insert data from a data frame and get the dat version key:
# insert some data
repo$insert(cars[1:20,])
v1 <- repo$status()$version
v1
Insert more data and get a new version key:
# insert more data
repo$insert(cars[21:25,])
v2 <- repo$status()$version
v2
get
Retrieve particular versions of the dataset by key:
data1 <- repo$get(v1)
data2 <- repo$get(v2)
diff
List the changes between two versions:
diff <- repo$diff(v1, v2)
diff$key
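Independently of rdat, a row-level comparison between two snapshots can be sketched in base R. This is not how repo$diff() is implemented, and the source does not describe its return value beyond diff$key. The sketch assumes that each version holds every row inserted up to that point, and uses slices of cars as stand-ins for the results of repo$get(v1) and repo$get(v2).

```r
# Stand-ins for the data frames returned by repo$get(v1) and repo$get(v2),
# assuming each version holds every row inserted up to that point.
snap1 <- cars[1:20, ]
snap2 <- cars[1:25, ]

# Keep the rows whose row names appear in the newer snapshot only.
added <- snap2[!(rownames(snap2) %in% rownames(snap1)), ]
nrow(added)
```

Matching on row names works here because cars has unique row names; for data without stable row identifiers, a key column would be needed instead.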
branching
Fork a dataset from a particular version into a new branch.
# create fork
repo$checkout(v1)
repo$insert(cars[40:42,])
repo$forks()
v3 <- repo$status()$version
checkout
Check out the data at a particular version.
# go back to v2
repo$checkout(v2)
repo$get()
binary data
Save binary data (files) as attachments to the dataset.
# store binary attachments
repo$write(serialize(iris, NULL), "iris")
unserialize(repo$read("iris"))
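The example above relies on base R's serialize() and unserialize() to turn an object into bytes and back. The round trip can be checked without a repository; this sketch uses only base R, and the variable names payload and restored are illustrative.

```r
# serialize() with connection = NULL returns a raw vector.
payload <- serialize(iris, NULL)
is.raw(payload)

# unserialize() restores the original object from that raw vector.
restored <- unserialize(payload)
identical(restored, iris)
```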
clone
# Create another repo
dir.create(newdir <- tempfile())
repo2 <- dat("cars", path = newdir, remote = repo$path())
repo2$forks()
repo2$get()
Specify a remote (path or URL) to clone an existing repo. In this case we clone the previous repo into a new location.
push and pull
Let's make yet another clone of our original repository:
# Create a third repo
dir.create(newdir <- tempfile())
repo3 <- dat("cars", path = newdir, remote = repo$path())
Add data in repo2 and then push it back to the original repository (repo).
# Add some data and push to origin
repo2$insert(cars[31:40,])
repo2$push()
Then pull the data back into repo3.
# sync data with origin
repo3$pull()
# Verify that repositories are in sync
mydata2 <- repo2$get()
mydata3 <- repo3$get()
all.equal(mydata2, mydata3)
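Note that all.equal() returns TRUE when its arguments match but a character description of the differences when they do not, so it should not be used directly in an if() condition. A small sketch of a stricter check follows; in_sync is a hypothetical helper name, and slices of cars stand in for the results of repo2$get() and repo3$get().

```r
# Hypothetical helper: TRUE only when all.equal() reports no differences.
in_sync <- function(x, y) isTRUE(all.equal(x, y))

# Stand-ins for the results of repo2$get() and repo3$get().
a <- cars[1:10, ]
b <- cars[1:10, ]
in_sync(a, b)

# Change one value: all.equal() now returns a description, not FALSE.
b$speed[1] <- b$speed[1] + 1
all.equal(a, b)
in_sync(a, b)
```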