ReferenceClasses: Objects With Fields Treated by Reference (OOP-style)

Description

The software described here supports reference classes whose objects have fields accessed by reference in the style of OOP languages such as Java and C++. Computations with these objects invoke methods on them and extract or set their fields. The field and method computations potentially modify the object. All computations referring to the objects see the modifications, in contrast to the usual functional programming model in R. Reference classes can be used to program in R directly or in combination with an interface to an OOP-style language, allowing R-written methods to extend the interface.

Usage

setRefClass(Class, fields = , contains = , methods =,
     where =, inheritPackage =, ...)
getRefClass(Class, where =)

Arguments

Class

character string name for the class.

In the call to getRefClass() this argument can also be any object from the relevant class; note also the corresponding reference class methods documented in the section on Writing Reference Methods.

fields

either a character vector of field names or a named list of the fields. The resulting fields will be accessed with reference semantics (see the section on Reference Objects). If the argument is a list, the elements of the list can be the character string name of a class, in which case the field must be from that class or a subclass.

The element in the list can alternatively be an accessor function, a function of one argument that returns the field if called with no argument or sets it to the value of the argument otherwise. Accessor functions are used internally and for inter-system interface applications. Their definition follows the rules for writing methods for the class: they can refer to other fields and can call other methods for this class or its superclasses. See the section on Implementation for the internal mechanism used by accessor functions.

Note that fields are distinct from the slots, if any, in the object. Slots are, as always, handled by standard R

Value

setRefClass() returns a generator function suitable for creating objects from the class, invisibly. A call to this function takes any number of arguments, which will be passed on to the initialize method. If no initialize method is defined for the class or one of its superclasses, the default method expects named arguments with the name of one of the fields and unnamed arguments, if any, that are objects from one of the superclasses of this class (but only superclasses that are themselves reference classes have any effect).
The generator function is similar to the S4 generator function returned by setClass. In addition to being a generator function, however, it is also a reference class generator object, with reference class methods for various utilities. See the section on reference class generator objects below.
If the class has a method defined for $initialize(), this method will be called once the reference object has been created. You should write such a method for a class that needs to do some special initialization. In particular, a reference method is recommended rather than a method for the S4 generic function initialize(), because some special initialization is required for reference objects before the initialization of fields. As with S4 classes, methods are written for $initialize() and not for $new(), both for the previous reason and also because $new() is invoked on the generator object and would be a method for that class.
The default method for $initialize() is equivalent to invoking the method $initFields(...). Named arguments assign initial values to the corresponding fields. Unnamed arguments must be objects from this class or a reference superclass of this class. Fields will be initialized to the contents of the fields in such objects, but named arguments override the corresponding inherited fields. Note that fields are simply assigned: if the field is itself a reference object, that object is not copied, and the new and previous objects will share the reference. Also, a field assigned from an unnamed argument counts as an assignment for locked fields. To override an inherited value for a locked field, the new value must be one of the named arguments in the initializing call. A later assignment of the field will result in an error.
Initialization methods need some care in design. The generator for a reference class will be called with no arguments, for example when copying the object. To ensure that these calls do not fail, the method must have defaults for all arguments or check for missing(). The method should include ... as an argument and pass this on via $callSuper() (or $initFields() if you know that your superclasses have no initialization methods). This allows future class definitions that subclass this class, with additional fields.
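The design advice above can be sketched as follows. The class Person and its fields are hypothetical names chosen for illustration; the point is only that the $initialize() method has a default for its own argument, accepts ..., and passes ... on through callSuper().

```r
## Hypothetical class, for illustration only.
Person <- setRefClass("Person",
  fields = list(name = "character", greeting = "character"),
  methods = list(
    initialize = function(..., greeting = "Hello") {
      ## Pass ... on, so that named field values (and the fields of
      ## future subclasses) are still handled by the default method.
      callSuper(..., greeting = greeting)
    }
  ))

p0 <- Person()               # a call with no arguments must not fail
p1 <- Person(name = "Ada")   # name reaches the default method via ...
p2 <- p1$copy()              # copying calls the generator with no arguments
```

Because greeting has a default and everything else travels through ..., a subclass that adds fields should be able to reuse this method unchanged.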
getRefClass() also returns the generator function for the class. Note that the package slot in the value is the correct package from the class definition, regardless of the where argument, which is used only to find the class if necessary.

Reference Objects

Normal objects in R are passed as arguments in function calls consistently with functional programming semantics; that is, changes made to an object passed as an argument are local to the function call. The object that supplied the argument is unchanged.

The functional model (sometimes called pass-by-value, although this is inaccurate for R) is suitable for many statistical computations and is implicit, for example, in the basic R software for fitting statistical models. In some other situations, one would like all the code dealing with an object to see the exact same content, so that changes made in any computation would be reflected everywhere. This is often suitable if the object has some objective reality, such as a window in a user interface.

In addition, commonly used languages, including Java, C++ and many others, support a version of classes and methods assuming reference semantics. The corresponding programming mechanism is to invoke a method on an object. In the R syntax we use "$" for this operation; one invokes a method, m1 say, on an object x by the expression x$m1(...). Methods in this paradigm are associated with the object, or more precisely with the class of the object, as opposed to methods in a function-based class/method system, which are fundamentally associated with the function (in R, for example, a generic function in an R session has a table of all its currently known methods). In this document, the phrases "methods for a class" and "methods for a function" are used to make the distinction.

Objects in this paradigm usually have named fields on which the methods operate. In the R implementation, the fields are defined when the class is created. The field itself can optionally have a specified class, meaning that only objects from this class or one of its subclasses can be assigned to the field. By default, fields have class "ANY". Fields may also be defined by supplying an accessor function which will be called to get or set the field. Accessor functions are most likely to appear when reference classes are part of an inter-system interface. The interface will usually supply the accessor functions automatically based on the definition of the corresponding class in the other language.
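As a small sketch of the three kinds of field just described, the hypothetical class Temp below has a field with a specified class, a field declared with class "ANY", and a field defined by an accessor function. All names are invented for the illustration.

```r
## Hypothetical class, for illustration only.
Temp <- setRefClass("Temp",
  fields = list(
    celsius = "numeric",   # only numeric objects (or subclasses) allowed
    note    = "ANY",       # the default class, written out explicitly
    fahrenheit = function(value) {
      ## accessor: called with no argument to get, with one to set
      if (missing(value)) celsius * 9 / 5 + 32
      else celsius <<- (value - 32) * 5 / 9
    }
  ))

tt <- Temp(celsius = 100)
tt$fahrenheit          # computed from celsius by the accessor
tt$fahrenheit <- 32    # calls the accessor with value = 32
tt$note <- "any object is accepted here"

## Assigning an object of the wrong class to celsius is an error.
res <- tryCatch({ tt$celsius <- "hot"; "assigned" },
                error = function(e) "rejected")
```

The accessor follows the same rules as a method for the class, which is why it can refer to celsius by name and set it with the non-local assignment operator.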

Fields are accessed by reference. In particular, invoking a method may modify the content of the fields.
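The contrast with ordinary R objects can be seen in a short sketch. The class Counter and the helper functions are hypothetical; $copy() is the standard reference method for making an independent copy.

```r
## Hypothetical class, for illustration only.
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      invisible(.self)
    }
  ))

bump <- function(obj) obj$increment()   # modifies the caller's object

a <- Counter(count = 0)
b <- a             # b refers to the same object; nothing is copied
d <- a$copy()      # d is an independent copy
bump(a)
b$increment()
a$count            # 2: both changes are visible through a
d$count            # 0: the copy is unaffected

## An ordinary list, by contrast, is changed only inside the function.
bumpList <- function(lst) { lst$count <- lst$count + 1; lst }
l <- list(count = 0)
invisible(bumpList(l))
l$count            # 0
```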

Programming for such classes involves writing new methods for a particular class. In the R implementation, these methods are R functions, with zero or more formal arguments. The object on which the methods are invoked is not an explicit argument to the method. Instead, fields and methods for the class can be referred to by name in the method definition. The implementation uses R environments to make fields and methods available by name. Additional special fields allow reference to the complete object and to the definition of the class. See the section on Writing Reference Methods.

The goal of the software described here is to provide a uniform programming style in R for software dealing with reference classes, whether implemented directly in R or through an interface to one of the OOP languages.

Writing Reference Methods

Reference methods are functions supplied as elements of a named list, either when invoking g$methods() on a generator object g or as the argument methods in a call to setRefClass. They are written as ordinary R functions but have some special features and restrictions. The body of the function can contain calls to any other reference method, including those inherited from other reference classes, and may refer to fields in the object by name.

Alternatively, a method may be an external method, a function whose first argument is .self. The body of such methods works like any ordinary function. The methods are called like other methods (without the .self argument, which is supplied internally and always refers to the object itself). External methods exist so that reference classes can inherit the package environment of superclasses in other packages; see the Section Inter-Package Superclasses and External Methods.

Fields may be modified in a method by using the non-local assignment operator, <<-, as in the $edit and $undo methods in the example below. Note that non-local assignment is required: a local assignment with the <- operator just creates a local object in the function call, as it would in any R function. When methods are installed, a heuristic check is made for local assignments to field names and a warning issued if any are detected.
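A minimal sketch of the difference between the two assignment operators inside a method follows. The class Tally is hypothetical, and suppressWarnings() is used here only to silence the installation-time warning about the deliberately incorrect method.

```r
## Hypothetical class, for illustration only.
Tally <- suppressWarnings(setRefClass("Tally",
  fields = list(total = "numeric"),
  methods = list(
    add = function(x) {
      total <<- total + x    # non-local: updates the field
      invisible(.self)
    },
    addLocally = function(x) {
      total <- total + x     # local: creates a local object only
      invisible(.self)
    }
  )))

tl <- Tally(total = 0)
tl$add(5)
tl$addLocally(5)
tl$total                     # 5: only add() changed the field
```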

Reference methods should be kept simple; if they need to do some specialized R computation, that computation should use a separate R function that is called from the reference method. Specifically, methods cannot use special features of the enclosing environment mechanism, since the method's environment is used to access fields and other methods. In particular, methods should not use non-exported entries in the package's namespace, because the methods may be inherited by a reference class in another package.

Methods for $initialize() have special requirements. See the comments in the Value section.

Reference methods cannot themselves be generic functions; if you want additional function-based method dispatch, write a separate generic function and call that from the method.

The entire object can be referred to in a method by the reserved name .self, as shown in the $save method of the example. The special object .refClassDef contains the definition of the class of the object. These fields are read-only (it makes no sense to modify these references), with one exception. In principle, the .self field can be modified in the $initialize method, because the object is still being created at this stage. This is definitely not recommended, unless to set some non-reference properties of the object defined for this class, which is itself not recommended if it mixes slots and fields.
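One common use of .self is to return the object from a method so that calls can be chained. The sketch below uses a hypothetical class Stack; it reads .self and .refClassDef but never modifies them.

```r
## Hypothetical class, for illustration only.
Stack <- setRefClass("Stack",
  fields = list(items = "list"),
  methods = list(
    push = function(x) {
      items[[length(items) + 1L]] <<- x
      invisible(.self)       # return the whole object to allow chaining
    },
    size = function() length(items),
    describe = function() {
      ## .refClassDef holds the class definition of the object
      paste("Object of class", .refClassDef@className,
            "with", size(), "items")
    }
  ))

s <- Stack()
s$push(1)$push("two")        # chained calls on the same object
s$size()
s$describe()
```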

The methods available include methods inherited from superclasses, as discussed in the next section.

Only methods actually used will be included in the environment corresponding to an individual object. To declare that a method requires a particular other method, the first method should include a call to $usingMethods() with the name of the other method as an argument. Declaring the methods this way is essential if the other method is used indirectly (e.g., via sapply() or do.call()). If it is called directly, code analysis will find it. Declaring the method is harmless in any case, however, and may aid readability of the source code.

Documentation for the methods can be obtained by the $help method for the generator object. Methods for classes are not documented in the Rd format used for R functions. Instead, the $help method prints the calling sequence of the method, followed by self-documentation from the method definition, in the style of Python. If the first element of the body of the method is a literal character string (possibly multi-line), that string is interpreted as documentation. See the method definitions in the example.

Inter-Package Superclasses and External Methods

The environment of a method in a reference class is the object itself, as an environment. This allows the method to refer directly to fields and other methods, without using the whole object and the "$" operator. The parent of that environment is the namespace of the package in which the reference class is defined. Computations in the method have access to all the objects in the package's namespace, exported or not.

When defining a class that contains a reference superclass in another package, there is an ambiguity about which package namespace should have that role. The argument inheritPackage to setRefClass() controls whether the environment of new objects should inherit from an inherited class in another package or continue to inherit from the current package's namespace.

If the superclass is lean, with few methods, or exists primarily to support a family of subclasses, then it may be better to continue to use the new package's environment. On the other hand, if the superclass was originally written as a standalone, this choice may invalidate existing superclass methods. For the superclass methods to continue to work, they must use only exported functions in their package and the new package must import these.

Either way, some methods may need to be written that do not assume the standard model for reference class methods, but behave essentially as ordinary functions would in dealing with reference class objects.

The mechanism is to recognize external methods. An external method is written as a function in which the first argument, named .self, stands for the reference class object. This function is supplied as the definition for a reference class method. The method will be called, automatically, with the first argument being the current object and the other arguments, if any, passed along from the actual call.

Since an external method is an ordinary function in the source code for its package, it has access to all the objects in the namespace. Fields and methods in the reference class must be referred to in the form .self$name. If for some reason you do not want to use .self as the first argument, a function f() can be converted explicitly as externalRefMethod(f), which returns an object of class "externalRefMethod" that can be supplied as a method for the class. The first argument will still correspond to the whole object.
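A minimal sketch of an external method, as described above, follows. The class Account and the function depositExternal are hypothetical names; the function is recognized as an external method because its first argument is named .self.

```r
## An ordinary function; the first argument .self marks it as an
## external method. Fields must be referred to as .self$name.
depositExternal <- function(.self, amount) {
  .self$balance <- .self$balance + amount
  invisible(.self)
}

## Hypothetical class, for illustration only.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(deposit = depositExternal))

acct <- Account(balance = 10)
acct$deposit(5)      # called without .self, which is supplied internally
acct$balance         # 15
```

Compared with a standard reference method, which would write balance <<- balance + amount, the external form has to spell out .self$balance each time.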

External methods can be supplied for any reference class, but there is no obvious advantage unless they are needed. They are more work to write, harder to read and (slightly) slower to execute.

NOTE: If you are the author of a package whose reference classes are likely to be subclassed in other packages, you can avoid these questions entirely by writing methods that only use exported functions from your package, so that all the methods will work from another package that imports yours.

Implementation

Reference classes are implemented as S4 classes with a data part of type "environment". Fields correspond to named objects in the environment. A field associated with a function is implemented as an active binding. In particular, fields with a specified class are implemented as a special form of active binding to enforce valid assignment to the field.

A field, say data, can be accessed generally by an expression of the form x$data for any object x from the relevant class. In a method for this class, the field can be accessed by the name data. A field that is not locked can be set by an expression of the form x$data <- value. Inside a method, a field can be assigned by an expression of the form data <<- value. Note the non-local assignment operator. The standard R interpretation of this operator works to assign it in the environment of the object. If the field has an accessor function defined, getting and setting will call that function.

When a method is invoked on an object, the function defining the method is installed in the object's environment, with that same environment as the environment of the function.

Because of the implementation, new reference classes can inherit from non-reference S4 classes as well as reference classes. This is usually a bad idea, if the slots from the non-reference class are thought of as alternatives to fields. Unless there is some special argument in favor, mixing the functional and reference paradigms for properties of the same object is conceptually unclear. In addition, the initialization method for the class will have to sort out fields from slots, with a good chance of creating anomalous behavior for subclasses of this class. Better in general to define fields analogous to the slots in the S4 class, and to initialize those from an S4 object of that class.
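
The environment-based storage of fields described at the start of this section can be observed directly. This is a sketch for exploration only, using a hypothetical class Box; ordinary code should use x$field rather than reaching into the environment.

```r
## Hypothetical class, for illustration only.
Box <- setRefClass("Box", fields = list(value = "numeric", label = "ANY"))

bx <- Box(value = 1, label = "first")
env <- as.environment(bx)        # the environment data part of the object

get("value", envir = env)        # same as bx$value
bx$value <- 2                    # set the field from outside a method
get("value", envir = env)        # the environment reflects the change
is(bx, "envRefClass")            # reference objects extend this class
```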

Inter-System Interfaces

A number of languages use a similar reference-based programming model with classes and class-based methods. Aside from differences in choice of terminology and other details, many of these languages are compatible with the programming style described here. R interfaces to the languages exist in a number of packages.

The reference class definitions here provide a hook for classes in the foreign language to be exposed in R. Access to fields and/or methods in the class can be implemented by defining an R reference class corresponding to classes made available through the interface. Typically, the inter-system interface will take care of the details of creating the R class, given a description of the foreign class (what fields and methods it has, the classes for the fields, whether any are read-only, etc.). The specifics for the fields and methods can be implemented via reference methods for the R class. In particular, the use of active bindings allows field access for getting and setting, with actual access handled by the inter-system interface.

R methods and/or fields can be included in the class definition as for any reference class. The methods can use or set fields and can call other methods transparently whether the field or method comes from the interface or is defined directly in R.

For an inter-system interface using this approach, see the code for package Rcpp, version 0.8.7 or later.

Debugging

The standard R

Examples

## a simple editor for matrix objects.  Method  $edit() changes some
## range of values; method $undo() undoes the last edit.
mEdit <- setRefClass("mEdit",
      fields = list( data = "matrix",
        edits = "list"),
      methods = list(
     edit = function(i, j, value) {
       ## the following string documents the edit method
       'Replaces the range [i, j] of the
        object by value.
        '
         backup <-
             list(i, j, data[i,j])
         data[i,j] <<- value
         edits <<- c(edits, list(backup))
         invisible(value)
     },
     undo = function() {
       'Undoes the last edit() operation
        and update the edits field accordingly.
        '
         prev <- edits
         if(length(prev)) prev <- prev[[length(prev)]]
         else stop("No more edits to undo")
         edit(prev[[1]], prev[[2]], prev[[3]])
         ## trim the edits list
         length(edits) <<- length(edits) - 2
         invisible(prev)
     },
     show = function() {
       'Method for automatically printing matrix editors'
       cat("Reference matrix editor object of class",
          classLabel(class(.self)), "\n")
       cat("Data: \n")
       methods::show(data)
       cat("Undo list is of length", length(edits), "\n")
     }
     ))

xMat <- matrix(1:12,4,3)
xx <- mEdit(data = xMat)
xx$edit(2, 2, 0)
xx
xx$undo()
mEdit$help("undo")
stopifnot(all.equal(xx$data, xMat))

utils::str(xx) # show fields and names of non-trivial methods

## add a method to save the object
mEdit$methods(
     save = function(file) {
       'Save the current object on the file
        in R external object format.
       '
         base::save(.self, file = file)
     }
)

tf <- tempfile()
xx$save(tf)
load(tf)
unlink(tf)
stopifnot(identical(xx$data, .self$data))

## Inheriting a reference class:  a matrix viewer
mv <- setRefClass("matrixViewer",
    fields = c("viewerDevice", "viewerFile"),
    contains = "mEdit",
    methods = list( view = function() {
        dd <- dev.cur(); dev.set(viewerDevice)
        devAskNewPage(FALSE)
        matplot(data, main = paste("After",length(edits),"edits"))
        dev.set(dd)},
        edit = # invoke previous method, then replot
          function(i, j, value) {
            callSuper(i, j, value)
            view()
          }))

## initialize and finalize methods
mv$methods( initialize =
  function(file = "./matrixView.pdf", ...) {
    viewerFile <<- file
    pdf(viewerFile)
    viewerDevice <<- dev.cur()
    dev.set(dev.prev())
    callSuper(...)
  },
  finalize = function() {
    dev.off(viewerDevice)
  })

## debugging an object: call browser() in method $edit()
xx$trace(edit, browser)

## debugging all objects from class mEdit in method $undo()
mEdit$trace(undo, browser)
removeClass("mEdit")
resetGeneric("$")
resetGeneric("initialize")
