refdata: subsettable reference to matrix or data.frame

Description

Function refdata creates objects of class refdata, which behave much like matrices or data.frames but allow for much more memory-efficient handling.

Usage

# -- usage for R CMD CHECK, see below for human readable version -----------
refdata(x)
derefdata(x)
derefdata(x) <- value
 ## S3 method for class 'refdata':
[(x, i = NULL, j = NULL, drop = FALSE, ref = FALSE)
 ## S3 method for class 'refdata':
[(x, i = NULL, j = NULL, ref = FALSE) <- value
 ## S3 method for class 'refdata':
dim(x)
 ## S3 method for class 'refdata':
dimnames(x)
 ## S3 method for class 'refdata':
row.names(x)
 ## S3 method for class 'refdata':
names(x)

# -- most important usage for human beings  --------------------------------
# rd <- refdata(x)                   # create reference
# derefdata(rd)                      # retrieve original data
# derefdata(rd) <- value             # modify original data
# rd[]                               # get all (current) data
# rd[i, j]                           # get part of data
# rd[i, j, ref=TRUE]                 # get new reference on part of data
# rd[i, j]           <- value        # modify part of data (now rd is reference on local copy of the data)
# rd[i, j, ref=TRUE] <- value        # modify part of original data (respecting subsetting history)
# dim(rd)                            # dim of (subsetted) data
# dimnames(rd)                       # dimnames of (subsetted) data

Arguments

x

a matrix or data.frame or any other 2-dimensional object that has operators "[" and "[<-" defined

i

row index

j

col index

ref

FALSE by default. In subsetting: FALSE returns data, TRUE returns new refdata object. In assignments: FALSE modifies a local copy and returns a refdata object embedding it, TRUE modifies the original.

drop

FALSE by default, i.e. returned data always have a dimension attribute. TRUE drops dimensions in some cases; the exact result depends on whether a matrix or a data.frame is referenced.

value

some value to be assigned
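
The dependence of drop on the underlying class can be seen with plain base R subsetting, without any refdata object involved. The sketch below assumes that refdata hands drop on to the "[" method of the wrapped object, which the description above suggests but which is not verified here.

```r
# Base R only: how drop behaves for a matrix versus a data.frame.
m  <- cbind(1:5, 5:1)
df <- data.frame(a = 1:5, b = 5:1)

m_keep  <- m[1, , drop = FALSE]    # stays a 1 x 2 matrix
m_drop  <- m[1, , drop = TRUE]     # plain vector, dim attribute is gone

df_keep <- df[, 1, drop = FALSE]   # stays a data.frame with one column
df_col  <- df[, 1, drop = TRUE]    # plain vector
df_row  <- df[1, , drop = TRUE]    # a list with one element per column
```

With drop=FALSE both classes keep their two-dimensional shape, which is why it is the safer default when the result is subsetted again.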

Value

an object of class refdata (appended to class attributes of data), which is an empty list with two attributes:

dat

the environment where the data x and its dimension dim are stored

ind

the environment where the indexes i, j and the effective subset sizes ni, nj are stored
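
An empty list carrying two environments can act as a reference because R does not copy environments on assignment. The following base R sketch mimics the layout described above; make_handle and the class name handle_sketch are hypothetical and are not the package's constructor, which may store its fields differently.

```r
# Hypothetical helper mimicking the described layout: an empty list whose
# attributes 'dat' and 'ind' are environments. Not part of the package.
make_handle <- function(x) {
  dat <- new.env()
  dat$x   <- x
  dat$dim <- dim(x)
  ind <- new.env()
  ind$i  <- NULL          # in this sketch NULL stands for "all rows"
  ind$j  <- NULL          # and "all columns"
  ind$ni <- nrow(x)
  ind$nj <- ncol(x)
  structure(list(), dat = dat, ind = ind, class = "handle_sketch")
}

h1 <- make_handle(cbind(1:5, 5:1))
h2 <- h1                                 # copies the list, not the environments

same_env <- identical(attr(h1, "dat"), attr(h2, "dat"))

e <- attr(h2, "dat")
e$x[1, 1] <- 99L                         # change made through h2 ...
seen_by_h1 <- attr(h1, "dat")$x[1, 1]    # ... is visible through h1
```

Both handles point at the same data environment, so passing such an object to a function or assigning it to a new name does not duplicate the stored matrix.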

Details

Refdata objects store 2D-data in one environment and index information in another environment. Derived refdata objects usually share the data environment but not the index environment. The index information is stored in a standardized and memory-efficient form generated by optimal.index. Thus refdata objects can be copied, subsetted and even modified without duplicating the data in memory.

Empty square bracket subsetting (rd[]) returns the data; square bracket subsetting (rd[i, j]) returns subsets of the data as expected. An additional argument (rd[i, j, ref=TRUE]) returns a reference that stores the subsetting indices. Such a reference behaves transparently, as if a smaller matrix/data.frame were stored, and can be subsetted again recursively. With ref=TRUE indices are always interpreted as row/col indices, i.e. x[i] and x[cbind(i, j)] are undefined (and raise stop errors).

Standard square bracket assignment (rd[i, j] <- value) creates a reference to a locally modified copy of the (potentially subsetted) data. An additional argument (rd[i, j, ref=TRUE] <- value) modifies the original data, properly respecting the subsetting history.

A method dim(refdata) returns the dim of the (indexed) data. A method dimnames(refdata) returns the dimnames of the (indexed) data.
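
The idea of sharing one data environment while keeping separate index information can be sketched in base R. This is a simplified illustration, not the package's implementation: new_view, sub_view, get_view and set_view are hypothetical names, and plain integer vectors stand in for the form produced by optimal.index.

```r
# A view holds a shared data environment plus its own row/col indices.
new_view <- function(dat, i = seq_len(nrow(dat$x)), j = seq_len(ncol(dat$x))) {
  list(dat = dat, i = i, j = j)
}

# Subsetting a view resolves i and j against the view's current indices,
# so only small integer vectors are created, never a copy of dat$x.
sub_view <- function(v, i = TRUE, j = TRUE) {
  new_view(v$dat, v$i[i], v$j[j])
}

# Materialise the data a view refers to.
get_view <- function(v) v$dat$x[v$i, v$j, drop = FALSE]

# Write through to the shared data, translating view indices to original ones.
set_view <- function(v, i = TRUE, j = TRUE, value) {
  dat <- v$dat
  dat$x[v$i[i], v$j[j]] <- value
  invisible(v)
}

dat <- new.env()
dat$x <- cbind(1:5, 5:1)

v  <- new_view(dat)
v2 <- sub_view(v, -1)             # refers to original rows 2..5
v3 <- sub_view(v2, -1)            # refers to original rows 3..5

get_view(v3)                      # rows 3..5 of the original matrix
set_view(v3, 1, 2, value = 0L)    # writes to row 3, column 2 of the original
dat$x[3, 2]
```

Composing indices this way corresponds to the subsetting history mentioned above: a write through v3 lands at the position in the original data that the chain of subsets points to.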

Examples

## Simple usage Example
  x <- cbind(1:5, 5:1)            # take a matrix or data frame
  rx <- refdata(x)                # wrap it into a refdata object
  rx                              # see the autoprinting
  rm(x)                           # delete original to save memory
  rx[]                            # extract all data
  rx[-1, ]                        # extract part of data
  rx2 <- rx[-1, , ref=TRUE]       # create refdata object referencing part of data (only index, no data is duplicated)
  rx2                             # compare autoprinting
  rx2[]                           # extract 'all' data
  rx2[-1, ]                       # extract part of (part of) data
  cat("for more examples look at the help pages\n")

 # Memory saving demos
  square.matrix.size <- 1000
  recursion.depth.limit <- 10
  non.referenced.matrix <- matrix(1:(square.matrix.size*square.matrix.size), nrow=square.matrix.size, ncol=square.matrix.size)
  rownames(non.referenced.matrix) <- paste("a", seq(length=square.matrix.size), sep="")
  colnames(non.referenced.matrix) <- paste("b", seq(length=square.matrix.size), sep="")
  referenced.matrix <- refdata(non.referenced.matrix)
  recurse.nonref <- function(m, depth.limit=10){
    x <- m[1,1]   # need read access here to create local copy
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=", memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, drop=FALSE], depth.limit=depth.limit-1)
    invisible()
  }
  recurse.ref <- function(m, depth.limit=10){
    x <- m[1,1]   # read access, otherwise nothing happens
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=",  memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, ref=TRUE], depth.limit=depth.limit-1)
    invisible()
  }
  gc()
  memsize.wrapper()
  recurse.ref(referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  recurse.nonref(non.referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  rm(recurse.nonref, recurse.ref, non.referenced.matrix, referenced.matrix, square.matrix.size, recursion.depth.limit)
  cat("for even more examples look at regression.test.refdata()\n")
  regression.test.refdata()  # testing correctness of refdata functionality
