save(), so that objects can be manually
picked out of or added to the track database if needed. The
track database is a directory usually named rdatadir that
contains an RData file for each object and several housekeeping files
that are either plain text or RData files.
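The one-file-per-object layout can be imitated with base R alone. The sketch below is an illustration only, not the track package's own code: the directory name demo_rdatadir and the choice to name each file after its object are assumptions made for the example, and the housekeeping files are ignored entirely.

```r
# Hypothetical directory standing in for a tracking database.
db_dir <- file.path(tempdir(), "demo_rdatadir")
dir.create(db_dir, showWarnings = FALSE)

x <- 123
y <- matrix(1:6, ncol = 2)

# Write one RData file per object (file naming is an assumption here).
for (nm in c("x", "y")) {
  save(list = nm, file = file.path(db_dir, paste0(nm, ".rda")))
}

# Pick a single object out by hand, into a scratch environment so that
# nothing in the calling environment is overwritten.
scratch <- new.env()
loaded <- load(file.path(db_dir, "y.rda"), envir = scratch)
```

load() returns the names of the objects it restored, so loaded holds the name of the object that was read, and the value itself is available as scratch$y.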
Tracking works by replacing a tracked variable by an
activeBinding, which when accessed looks up information in an
associated 'tracking environment' and reads or writes the corresponding
RData file and/or gets or assigns the variable in the tracking
environment. In the default mode of operation, R variables that are
accessed are stored in memory for the duration of the top-level task
(i.e., one expression evaluated from the prompt). A callback that is
called each time a top-level task completes does three major things:
.Rincr_history at the end of each top-level task, along with a
time stamp that does not appear in the interactive history. The standard
history functionality (savehistory/loadhistory) in R writes the history
only at the end of the session. Thus, if the R session terminates
abnormally, history is lost.
track.start() need be made to start automatically tracking the global
environment. If it is desired to save untrackable variables at the end
of the session, track.stop() should be called before calling
save.image() or q('yes'), because track.stop() will
ensure that tracked variables are saved to disk and then remove them
from the global environment, leaving save.image() to save only
the untracked or untrackable variables. The basic functions used in
automatic tracking are as follows:
track.start(dir=...): start tracking the global environment, with files
  saved in dir (the default is rdatadir).
track.summary(): print a summary of the basic characteristics of tracked
  variables: name, class, extent, and creation, modification and access times.
track.info(): print a summary of which tracking databases are currently active.
track.stop(pos=, all=): stop tracking. Any unsaved tracked variables are
  saved to disk. Unless keepVars=TRUE is supplied, all tracked variables
  become unavailable until tracking starts again.
track.attach(dir=..., pos=): attach an existing tracking database to the
  search list at the specified position. The default when attaching at
  positions other than 1 is to use readonly mode, but in non-readonly mode,
  changes to variables in the attached environment will be automatically
  saved to the database.
track.rescan(pos=): rescan a tracking directory that was attached by
  track.attach() at a position other than 1, and that is preferably readonly.
track.start(dir=..., auto=TRUE/FALSE): start tracking the global
  environment, with files saved in dir
track(x): start tracking x; x in the global environment is replaced by an
  active binding and x is saved in its corresponding file in the tracking
  directory and, if caching is on, in the tracking environment
track(x <- value): start tracking x
track(list=c('x', 'y')): start tracking specified variables
track(all=TRUE): start tracking all untracked variables in the global
  environment
untrack(x): stop tracking variable x; the R object x is put back as an
  ordinary object in the global environment
untrack(all=TRUE): stop tracking all variables in the global environment
  (but tracking is still set up)
untrack(list=...): stop tracking specified variables
track.remove(x): completely remove all traces of x from the global
  environment, tracking environment and tracking directory. Note that if
  variable x in the global environment is tracked, remove(x) will make x an
  "orphaned" variable: remove(x) will just remove the active binding from
  the global environment, and leave x in the tracked environment and on
  file, and x will reappear after restarting tracking.
The track package provides many additional functions for
controlling how tracking is performed (e.g., whether or not tracked variables
are cached in memory), examining the state of tracking (show which
variables are tracked, untracked, orphaned, masked, etc.) and repairing
tracking environments and databases that have become inconsistent or incomplete
(this may result from resource limitations, e.g., being unable to
write a save file due to lack of disk space, or from manual tinkering,
e.g., dropping a new save file into a tracking directory).
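The way remove(x) can leave an "orphaned" variable behind follows from how active bindings behave in base R. The sketch below uses only base R and is not the track package's implementation: the environments store and user are hypothetical stand-ins for the tracking environment and the global environment, and no files are written.

```r
# Hypothetical stand-ins: 'store' plays the tracking environment,
# 'user' plays the global environment.
store <- new.env()
user  <- new.env()
assign("x", 123, envir = store)

# The binding function is called with no argument on a read and with
# one argument (the new value) on an assignment.
makeActiveBinding("x", function(v) {
  if (missing(v)) {
    get("x", envir = store)
  } else {
    assign("x", v, envir = store)
    invisible(v)
  }
}, env = user)

first_read <- get("x", envir = user)   # read goes through the binding
assign("x", 456, envir = user)         # write goes through the binding
stored_value <- get("x", envir = store)

# Removing the binding leaves the value in the backing store untouched.
rm("x", envir = user)
binding_left <- exists("x", envir = user, inherits = FALSE)
value_left   <- exists("x", envir = store, inherits = FALSE)
```

After rm(), binding_left is FALSE while value_left is TRUE: only the binding has gone, which mirrors the orphaned state described above.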
The functions that can be used to set up and take down tracking are:
track.start(dir=...): start tracking, using the supplied directory
track.stop(): stop tracking (any unsaved tracked variables are saved to
  disk and all tracked variables become unavailable until tracking starts
  again)
track.dir(): return the path of the tracking directory
track(x), track(var <- value), track(list=...), track(all=TRUE): start
  tracking variable(s)
track.load(file=...): load some objects from an RData file into the
  tracked environment
untrack(x, keep.in.db=FALSE), untrack(list=...), untrack(all=TRUE): stop
  tracking variable(s); the value is left in place, and optionally, it is
  also left in the database
track.summary(): return a data frame containing a summary of the basic
  characteristics of tracked variables: name, class, extent, and creation,
  modification and access times.
track.status(): return a data frame containing information about the
  tracking status of variables: whether they are saved to disk or not, etc.
track.info(): return a data frame containing information about which
  tracking dbs are currently active.
track.mem(): return a data frame containing information about number of
  objects and memory usage in tracking dbs.
env.is.tracked(): tell whether an environment is currently tracked
tracked(): return the names of tracked variables
untracked(): return the names of untracked variables
untrackable(): return the names of variables that cannot be tracked
track.unsaved(): return the names of variables whose copy on file is
  out-of-date
track.orphaned(): return the names of once-tracked variables that have
  lost their active binding (should not happen)
track.masked(): return the names of once-tracked variables whose active
  binding has been overwritten by an ordinary variable (should not happen)
track.options(): examine and set options to control tracking
track.load(): load variables from a saved RData file into the tracking
  session
track.copy() and track.move(): copy or move variables from one tracking db
  to another
track.rename(): rename variables in a tracking db
track.rescan(): reload variable values from disk (can forget all cached
  vars, remove no-longer existing tracked vars)
track.auto(): turn auto-tracking on or off
track.sync(): write unsaved variables to disk, and remove excess objects
  from memory. This function can be called by the user if they wish to
  remove excess objects from memory during a memory-intensive top-level
  command.
track.sync.callback(): calls track.sync(); this function is installed as a
  task callback (to be called each time a top-level task is completed,
  see taskCallback). This function is not exported by the track package.
track.auto.monitor(): an additional callback that monitors the existence
  of the callback to track.sync.callback and re-instates it if missing.
  This function is not exported by the track package.
track.remove(): completely remove all traces of a tracked variable
track.save(): write unsaved variables to disk
track.flush(): write unsaved variables to disk, and remove from memory
track.forget(): delete cached versions without saving to file (the object
  saved in the file will be retrieved next time the variable is accessed)
track.rebuild(): rebuild tracking information from objects in memory or
  on disk
track.design
track package:
save(), load(), and/or save.image().
track.options(cache=TRUE, cachePolicy="none"), or start tracking with
track.start(..., cache=TRUE, cachePolicy="none"). A possible future
improvement is to allow conditional and/or more intelligent caching of
objects. Some data that would be needed for this is already collected
in access counts and times that are recorded in the tracking summary.
Here is a brief example of tracking some variables in the global environment:
> library(track)
> # By default, track.start() uses/creates a db in the dir
> # 'rdatadir' in the current working directory; supply arg
> # dir= to change.
> track.start()
> x <- 123 # Variable 'x' is now tracked
> y <- matrix(1:6, ncol=2) # 'y' is assigned & tracked
> z1 <- list("a", "b", "c")
> z2 <- Sys.time()
> track.summary(size=F) # See a summary of tracked vars
            class    mode extent length            modified TA TW
x         numeric numeric    [1]      1 2007-09-07 08:50:58  0  1
y          matrix numeric  [3x2]      6 2007-09-07 08:50:58  0  1
z1           list    list  [[3]]      3 2007-09-07 08:50:58  0  1
z2 POSIXt,POSIXct numeric    [1]      1 2007-09-07 08:50:58  0  1
> # (TA="total accesses", TW="total writes")
> ls(all=TRUE)
[1] "x" "y" "z1" "z2"
> track.stop(pos=1) # Stop tracking
> ls(all=TRUE)
character(0)
>
> # Restart using the tracking dir -- the variables reappear
> track.start() # Start using the same tracking dir again ("rdatadir")
> ls(all=TRUE)
[1] "x" "y" "z1" "z2"
> track.summary(size=F)
            class    mode extent length            modified TA TW
x         numeric numeric    [1]      1 2007-09-07 08:50:58  0  1
y          matrix numeric  [3x2]      6 2007-09-07 08:50:58  0  1
z1           list    list  [[3]]      3 2007-09-07 08:50:58  0  1
z2 POSIXt,POSIXct numeric    [1]      1 2007-09-07 08:50:58  0  1
> track.stop(pos=1)
>
> # the files in the tracking directory:
> list.files("rdatadir", all=TRUE)
[1] "."                    ".."
[3] "filemap.txt"          ".trackingSummary.rda"
[5] "x.rda"                "y.rda"
[7] "z1.rda"               "z2.rda"
>
There are several points to note:
auto=FALSE to track.start(), or by
calling track.auto(FALSE).
save()/load() (RData files).
track package.
Potential future features of the track package.
Documentation for save and load (in 'base' package).
Documentation for makeActiveBinding and related
functions (in 'base' package).
Inspiration from the packages g.data and
filehash.
Description of the facility
(addTaskCallback) for adding a
callback function that is called at the end of each top-level task (each
time R returns to the prompt after completing a command):
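As a rough illustration of that facility, the base R sketch below registers a task callback that appends each completed top-level expression, with a time stamp, to a log file. It is only a sketch and not the track package's history code: the callback name demo_logger, the function log_task and the temporary log file are all hypothetical, and the callback is also invoked by hand because callbacks are only triggered by top-level tasks.

```r
# Hypothetical log file; no history file used by R or track is touched.
log_file <- tempfile("incr_history_")

# A task callback receives the expression, its value, a success flag
# and a visibility flag. It must return TRUE to stay registered.
log_task <- function(expr, value, ok, visible) {
  line <- paste(format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                paste(deparse(expr), collapse = " "))
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
  TRUE
}

# Register the callback under a name so it can be removed later.
id <- addTaskCallback(log_task, name = "demo_logger")

# Calling the function directly shows what one log entry looks like.
log_task(quote(x <- 1), 1, TRUE, FALSE)

# Remove the callback again; the result reports whether it was found.
removed <- removeTaskCallback("demo_logger")
log_lines <- readLines(log_file)
```

In an interactive session, every command completed between addTaskCallback() and removeTaskCallback() would add one more line to the log.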
##############################################################
# Warning: running this example will cause variables currently
# in the R global environment to be written to .RData files
# in a tracking database on the filesystem under R's temporary
# directory, and will cause the variables to be removed from
# the R global environment.
# It is recommended to run this example with a fresh R session
# with no important variables in the global environment.
##############################################################
library(track)
# Start tracking the global environment using a tmp directory
# Default tracking db dir is 'rdatadir' in the current working
# directory; omit the dir= argument to use this.
if (!is.element('tmpenv', search())) attach(new.env(), name='tmpenv', pos=2)
assign('tmpdatadir', pos='tmpenv', value=file.path(tempdir(), 'rdatadir1'))
track.start(dir=tmpdatadir)
a <- 1
b <- 2
ls()
track.status()
track.summary()
track.info()
track.stop()
# Variables are now gone because default action of track.stop()
# is to not read all tracked variables into memory (this could
# exhaust memory and/or be very time consuming).
ls()
# bring them back
track.start(dir=tmpdatadir)
ls()
# It is possible to keep tracked vars after stopping tracking:
track.stop(keepVars=TRUE)
ls()