archivist (version 1.0)

saveToRepo: Save an Artifact into a Repository

Description

The saveToRepo function saves the desired artifact to a local Repository in the given directory. To learn more about artifacts, see archivist-package.

Usage

saveToRepo(artifact, repoDir, archiveData = TRUE, archiveTags = TRUE,
  archiveMiniature = TRUE, force = TRUE, rememberName = TRUE, ...)

Arguments

artifact
An arbitrary R artifact to be saved. For supported artifact classes, see Details.
...
Graphical parameters denoting the width and height of a miniature, or the firstRows argument for data. See Details.
archiveData
A logical value denoting whether to archive the data from the artifact.
archiveTags
A logical value denoting whether to archive tags from the artifact.
archiveMiniature
A logical value denoting whether to archive a miniature of the artifact.
repoDir
A character string denoting an existing directory in which the artifact will be saved.
force
A logical value denoting whether to archive the artifact even if it has already been archived in the Repository.
rememberName
A logical value. It is a technical parameter and should not be changed by the user.
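As a rough illustration of the force argument, the sketch below decides whether an artifact should be written, given the md5hashes already present in a Repository. The function should_archive() is hypothetical and not part of archivist; it only models the documented TRUE/FALSE meaning of force.

```r
# Hypothetical helper (not part of archivist): models the documented meaning
# of `force`, given the md5hashes already stored in a Repository
should_archive <- function(hash, existing_hashes, force = TRUE) {
  # force = TRUE archives unconditionally; force = FALSE skips known hashes
  force || !(hash %in% existing_hashes)
}

existing <- c("abc", "def")   # placeholder hashes
print(should_archive("abc", existing, force = TRUE))   # TRUE: archived again
print(should_archive("abc", existing, force = FALSE))  # FALSE: already archived
print(should_archive("xyz", existing, force = FALSE))  # TRUE: new artifact
```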

Value

  • The function returns a character string containing the md5hash of the saved artifact. If archiveData is TRUE, the result also carries an attribute named data, which holds the md5hash of the data needed to compute the artifact.
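The shape of the return value can be mimicked in base R. This sketch assumes a hypothetical helper md5_of_object() that hashes an object's serialized form with tools::md5sum(); archivist itself uses digest, so these strings will not match real archivist hashes. Only the structure (a 32-character string with a data attribute) is the point.

```r
# Hypothetical helper: MD5 of an object's serialized form. archivist uses
# digest() instead, so these strings will not match real archivist hashes.
md5_of_object <- function(x) {
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  saveRDS(x, tmp, compress = FALSE)
  unname(tools::md5sum(tmp))
}

model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# Mimic the documented result: the artifact's hash, plus the hash of
# its data stored in an attribute named "data"
result <- md5_of_object(model)
attr(result, "data") <- md5_of_object(iris)

print(nchar(result))          # 32
print(attr(result, "data"))   # hash of the data set
```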

Details

The saveToRepo function saves the desired artifact to the local Repository in the given directory. Artifacts are recorded in the Repository's SQLite database, named backpack. After every saveToRepo call the database is refreshed, so the artifact is immediately available to other collaborators. Every artifact is archived in a file named md5hash.rda, stored in a folder named gallery under the repoDir directory. For every artifact, md5hash is a unique 32-character string computed by the digest function with the MD5 cryptographic hash algorithm.
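A minimal base R sketch of the file layout described above, where each artifact ends up in gallery/md5hash.rda under repoDir. The helpers md5_of_object() and sketch_save_to_gallery() are hypothetical: they hash with tools::md5sum() rather than digest, store the object under its hash name, and do not touch the backpack database.

```r
# Hypothetical helper: MD5 of an object's serialized form (archivist uses digest())
md5_of_object <- function(x) {
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  saveRDS(x, tmp, compress = FALSE)
  unname(tools::md5sum(tmp))
}

# Hypothetical sketch of the gallery layout: <repoDir>/gallery/<md5hash>.rda.
# It does not create or update the backpack database.
sketch_save_to_gallery <- function(x, repoDir) {
  gallery <- file.path(repoDir, "gallery")
  dir.create(gallery, showWarnings = FALSE, recursive = TRUE)
  hash <- md5_of_object(x)
  # save() stores named objects, so keep the artifact under its hash
  env <- new.env()
  assign(hash, x, envir = env)
  save(list = hash, envir = env, file = file.path(gallery, paste0(hash, ".rda")))
  hash
}

repo <- file.path(tempdir(), "sketchRepo")
h <- sketch_save_to_gallery(iris, repo)
rda <- file.path(repo, "gallery", paste0(h, ".rda"))
print(file.exists(rda))   # TRUE

# Reload into a fresh environment and check the round trip
e <- new.env()
load(rda, envir = e)
print(identical(get(h, envir = e), iris))   # TRUE
```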

By default, a miniature of the artifact and (if possible) the data set needed to compute it are also extracted. Each of them is saved in a file named by its own md5hash in the gallery folder under the directory given in the repoDir argument. In addition, a Tag-relation linking the artifact to its data set is added to the backpack database, so that the artifact can later be loaded together with its related data set (see loadFromLocalRepo or loadFromGithubRepo). These defaults can be changed by setting the archiveData, archiveTags or archiveMiniature arguments to FALSE.
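The link between an artifact and its data set can be pictured as one row in a tag table. In this sketch an in-memory data.frame is a hypothetical stand-in for the backpack database, the hashes are placeholders, and the relationWithData: prefix is an assumption made for illustration only.

```r
# Hypothetical in-memory stand-in for the backpack's tag table
tags <- data.frame(artifact = character(0), tag = character(0),
                   stringsAsFactors = FALSE)

# Record that `artifact_hash` was computed from the data stored as `data_hash`
# (the "relationWithData:" prefix is an illustrative assumption)
add_data_relation <- function(tags, artifact_hash, data_hash) {
  rbind(tags, data.frame(artifact = artifact_hash,
                         tag = paste0("relationWithData:", data_hash),
                         stringsAsFactors = FALSE))
}

# Look up the data hash related to an artifact, as a loader would need to
related_data <- function(tags, artifact_hash) {
  rows <- tags$artifact == artifact_hash &
    startsWith(tags$tag, "relationWithData:")
  sub("^relationWithData:", "", tags$tag[rows])
}

tags <- add_data_relation(tags, "modelhash", "irishash")  # placeholder hashes
print(related_data(tags, "modelhash"))   # "irishash"
```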

Tags are attributes of an artifact; which Tags are extracted depends on the artifact's class. For more detailed information, see Tags.

Archived artifacts can be searched for in the backpack database using the searchInLocalRepo or searchInGithubRepo functions. Artifacts can be searched by their Tags, names, classes or archiving date.
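Conceptually, a search by Tag filters the tag table and returns matching md5hashes. The sketch below uses a hypothetical data.frame in place of the backpack database, with placeholder hashes; the name: and class: tag prefixes are assumptions for illustration. In practice, use searchInLocalRepo or searchInGithubRepo.

```r
# Hypothetical stand-in for the backpack's tag table (placeholder hashes;
# the "name:"/"class:" prefixes are illustrative assumptions)
tags <- data.frame(
  artifact = c("h1", "h1", "h2", "h2", "h3"),
  tag      = c("name:model", "class:lm", "name:iris",
               "class:data.frame", "class:lm"),
  stringsAsFactors = FALSE
)

# Return the hashes of artifacts carrying exactly the given tag
search_by_tag <- function(tags, pattern) {
  unique(tags$artifact[tags$tag == pattern])
}

print(search_by_tag(tags, "class:lm"))    # "h1" "h3"
print(search_by_tag(tags, "name:iris"))   # "h2"
```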

Additional parameters.

If the artifact is of class data.frame, or archiveData = TRUE, you can specify how many rows of that data should be archived by passing the argument firstRows = n, where n is the number of rows. Note that data can be extracted only from artifacts that are supported by the archivist package; see Tags.
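The effect of firstRows can be pictured with head(): only the first n rows of the data reach the archive. This is a sketch of the documented behaviour, not archivist's own code; the commented-out call shows how the argument is passed to saveToRepo, assuming a Repository already exists in exampleRepoDir.

```r
# Documented usage (requires archivist and an existing Repository):
# saveToRepo(iris, repoDir = exampleRepoDir, firstRows = 5)

# Sketch of what gets archived: only the first `firstRows` rows
firstRows <- 5
archived_data <- head(iris, firstRows)
print(nrow(archived_data))   # 5
```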

If the artifact is a lattice plot (class trellis) or of class ggplot, and archiveMiniature = TRUE, the miniature's width and height can be set. By default they are width = 800 and height = 600.

Supported artifact classes (so far) are:

  • lm,
  • data.frame,
  • ggplot,
  • htest,
  • trellis,
  • twins (result of agnes, diana or mona function),
  • partition (result of pam, clara or fanny function),
  • lda,
  • qda,
  • glmnet,
  • survfit.
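A quick way to check whether an object belongs to one of the classes listed above is inherits(), which is TRUE if the object's class vector contains any of them. The helper is_supported_artifact() is hypothetical; archivist performs its own class dispatch when extracting Tags.

```r
# Classes listed above as supported (so far); lattice plots have class "trellis"
supported_classes <- c("lm", "data.frame", "ggplot", "htest", "trellis",
                       "twins", "partition", "lda", "qda", "glmnet", "survfit")

# Hypothetical helper: does x inherit from one of the supported classes?
is_supported_artifact <- function(x) inherits(x, supported_classes)

model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
print(is_supported_artifact(model))                      # TRUE
print(is_supported_artifact(iris))                       # TRUE
print(is_supported_artifact(t.test(iris$Sepal.Length)))  # TRUE (class htest)
print(is_supported_artifact(list(a = 1)))                # FALSE
```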

To check what Tags will be extracted for various artifacts see Tags.

See Also

For more detailed information, see the archivist package vignette.

The list of supported artifacts and their Tags is available on the archivist wiki in the GitHub repository: https://github.com/pbiecek/archivist/wiki/archivist-package---Tags

Other archivist: Repository; Tags; archivist-package; copyGithubRepo, copyLocalRepo; createEmptyRepo; deleteRepo; loadFromGithubRepo, loadFromLocalRepo; md5hash; rmFromRepo; searchInGithubRepo, searchInLocalRepo; showGithubRepo, showLocalRepo; summaryGithubRepo, summaryLocalRepo

Examples

# objects preparation
# data.frame object
data(iris)

# ggplot/gg object
library(ggplot2)
df <- data.frame(gp = factor(rep(letters[1:3], each = 10)),y = rnorm(30))
library(plyr)
ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))
myplot123 <- ggplot(df, aes(x = gp, y = y)) +
  geom_point() +  geom_point(data = ds, aes(y = mean),
               colour = 'red', size = 3)

# lm object
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# agnes (twins) object
library(cluster)
data(votes.repub)
agn1 <- agnes(votes.repub, metric = "manhattan", stand = TRUE)

# fanny (partition) object
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3, 3.2, 0.5), rnorm( 3, 3.2, 0.5)))
fannyx <- fanny(x, 2)

# lda object
library(MASS)

Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                  Sp = rep(c("s","c","v"), rep(50,3)))
train <- c(8,83,115,118,146,82,76,9,70,139,85,59,78,143,68,
           134,148,12,141,101,144,114,41,95,61,128,2,42,37,
           29,77,20,44,98,74,32,27,11,49,52,111,55,48,33,38,
           113,126,24,104,3,66,81,31,39,26,123,18,108,73,50,
           56,54,65,135,84,112,131,60,102,14,120,117,53,138,5)
lda1 <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# qda object
tr <- c(7,38,47,43,20,37,44,22,46,49,50,19,4,32,12,29,27,34,2,1,17,13,3,35,36)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
qda1 <- qda(train, cl)

# glmnet object
library( glmnet )

zk <- matrix(rnorm(100 * 20), 100, 20)
bk <- rnorm(100)
glmnet1 <- glmnet(zk, bk)


# create an example Repository so that the examples below work

# save examples

exampleRepoDir <- tempdir()
createEmptyRepo( repoDir = exampleRepoDir )
saveToRepo( myplot123, repoDir=exampleRepoDir )
saveToRepo( iris, repoDir=exampleRepoDir )
saveToRepo( model, repoDir=exampleRepoDir )
saveToRepo( agn1, repoDir=exampleRepoDir )
saveToRepo( fannyx, repoDir=exampleRepoDir )
saveToRepo( lda1, repoDir=exampleRepoDir )
saveToRepo( glmnet1, repoDir=exampleRepoDir )

# let's see what the Repository looks like: show

showLocalRepo( method = "md5hashes", repoDir = exampleRepoDir )
showLocalRepo( method = "tags", repoDir = exampleRepoDir )

# let's see what the Repository looks like: summary

summaryLocalRepo( exampleRepoDir )

# the same artifact can be archived twice, but a message is shown

saveToRepo( model, repoDir=exampleRepoDir )

# to avoid archiving the same artifact twice, use force = FALSE

saveToRepo( lda1, repoDir=exampleRepoDir, force = FALSE )

# one can archive an artifact without its data and miniature

saveToRepo( qda1, repoDir=exampleRepoDir, archiveData = FALSE,
            archiveMiniature = FALSE)

# one can specify additional tags to be archived with an artifact

attr( model, "tags" ) <- c( "do not delete", "my favourite model" )
saveToRepo( model, repoDir=exampleRepoDir )
showLocalRepo( "tags", exampleRepoDir )

# removing an example Repository

deleteRepo( exampleRepoDir )

rm( exampleRepoDir )
