Function ffindexorder calculates, chunk by chunk, the order positions needed to sort all positions within a chunk ascending.
Function ffindexordersize calculates the chunk size for ffindexorder.
ffindexordersize(length, vmode, BATCHBYTES = getOption("ffmaxbytes"))
ffindexorder(index, BATCHSIZE, FF_RETURN = NULL, VERBOSE = FALSE)
Function ffindexorder returns an ff integer vector with an attribute BATCHSIZE (the chunk size finally used, not the one given with argument BATCHSIZE).
Function ffindexordersize returns a balanced batch size as returned from bbatch.
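The idea of a "balanced" batch size can be illustrated with a minimal base-R sketch. The helper balanced_batch() below is hypothetical and only shows one plausible way of balancing; bbatch itself may compute its result differently. The sketch assumes that the number of batches is chosen first from the upper limit, and that the elements are then spread evenly over those batches.

```r
# Hypothetical helper, not part of ff: spread n elements over as few batches
# as the limit max_batch allows, then make those batches about equally large.
balanced_batch <- function(n, max_batch) {
  stopifnot(n >= 0, max_batch >= 1)
  if (n == 0) return(list(b = 0L, nb = 0L))
  nb <- ceiling(n / max_batch)  # number of batches needed under the limit
  b  <- ceiling(n / nb)         # evened-out batch size, never above max_batch
  list(b = as.integer(b), nb = as.integer(nb))
}

# 100 elements with a limit of 30 give 4 batches of 25 rather than 30, 30, 30, 10.
s <- balanced_batch(100, 30)
s$b
s$nb
```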
index: An ff integer vector with integer subscripts.
BATCHSIZE: Limit for the chunk size (see details).
BATCHBYTES: Limit for the number of bytes per batch.
FF_RETURN: Optionally an ff integer vector in which the chunkwise order positions are stored.
VERBOSE: Logical scalar that activates verbose output.
length: Number of elements in the index.
vmode: The vmode of the ff vector to which the index shall be applied with ffindexget or ffindexset.
Jens Oehlschlägel
Accessing integer positions in an ff vector is a non-trivial task, because it can easily lead to random access to a disk file.
We avoid random access by loading batches of the subscript values into RAM, ordering them ascending, and only then accessing the ff values on disk.
Such an ordering can be done on the fly by ffindexget, or it can be created upfront with ffindexorder, stored, and re-used, similar to storing and using hybrid index information with as.hi.
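The chunkwise ordering described above can be sketched in base R without ff. This is only an illustration under stated assumptions: chunkwise_order() and chunkwise_get() are hypothetical helpers, an ordinary in-memory vector stands in for the on-disk ff vector, and the actual ff implementation may differ in its details.

```r
# Illustrative base-R sketch; chunkwise_order() and chunkwise_get() are
# hypothetical helpers, not functions of the ff package.

# For each chunk of `index`, store the positions that sort that chunk ascending.
chunkwise_order <- function(index, batchsize) {
  n <- length(index)
  o <- integer(n)
  if (n == 0L) return(o)
  for (from in seq(1L, n, by = batchsize)) {
    to <- min(from + batchsize - 1L, n)
    o[from:to] <- order(index[from:to])
  }
  o
}

# Read x[index] chunk by chunk, visiting the positions of each chunk in
# ascending order and writing the values back in the requested order.
chunkwise_get <- function(x, index, o, batchsize) {
  n <- length(index)
  out <- vector(typeof(x), n)
  if (n == 0L) return(out)
  for (from in seq(1L, n, by = batchsize)) {
    to <- min(from + batchsize - 1L, n)
    ord <- o[from:to]
    sorted_pos <- index[from:to][ord]      # ascending positions within the chunk
    out[from - 1L + ord] <- x[sorted_pos]  # undo the sort when storing the values
  }
  out
}

set.seed(1)
x <- sample(40)
i <- order(x)
o <- chunkwise_order(i, batchsize = 16L)
identical(chunkwise_get(x, i, o, batchsize = 16L), x[i])
```

Because the order positions in o depend only on the index and the batch size, they can be computed once and re-used for every later access with the same index, which is the motivation for storing the result of ffindexorder.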
ffindexget, as.hi, bbatch
library(ff)
x <- ff(sample(40))
message("fforder requires sorting")
i <- fforder(x)
message("applying this order i is done by ffindexget")
x[i]
message("applying this order i requires random access,
therefore ffindexget does chunkwise sorting")
ffindexget(x, i)
message("if we want to apply the order i multiple times,
we can do the chunkwise sorting once and store it")
s <- ffindexordersize(length(i), vmode(i), BATCHBYTES = 100)
o <- ffindexorder(i, s$b)
message("this is how the stored chunkwise sorting is used")
ffindexget(x, i, o)
message("")
rm(x,i,s,o)
gc()