Currently, the hash is obtained by means of serialisation. To ensure
that semantically identical values have identical hashes across a wide
range of R versions, the following steps are taken:
  - When computing the hash of the serialized data (only XDR format
    versions 2 and 3 are supported), the first 14 bytes containing
    the header (including the version of R that serialized the data)
    are ignored.
  - Every function is “rebuilt” from its body before hashing, forcing
    R to discard the bytecode and the source references from the copy
    of the function before it's hashed.
  - Strings are converted to UTF-8 before hashing.
These fix-ups are applied recursively to nested values.
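The steps above can be illustrated with a minimal sketch. It is not
the package's actual implementation: the helper names payload() and
fixup() are hypothetical, and the sketch only descends into plain
lists, ignoring attributes and language objects. It assumes that
utils::removeSource() is an acceptable way to drop source references
and that rebuilding a closure from its formals and body leaves it
without bytecode.

```r
# Hypothetical helper: serialize in XDR format and drop the 14-byte header
# (format marker plus format version, writer version and minimal reader
# version), so the R version that wrote the data does not affect the bytes.
payload <- function(x, version = 2) {
  bytes <- serialize(x, NULL, xdr = TRUE, version = version)
  bytes[-(1:14)]
}

# Hypothetical helper: normalise a value before serialising it.
fixup <- function(x) {
  if (is.function(x) && !is.primitive(x)) {
    # Strip source references, then rebuild the closure from its formals
    # and body so that the copy carries no bytecode.
    x <- utils::removeSource(x)
    x <- as.function(c(formals(x), list(body(x))), envir = environment(x))
  } else if (is.character(x)) {
    # Re-encode every string as UTF-8.
    x <- enc2utf8(x)
  } else if (is.list(x)) {
    # Apply the same fix-ups to every element of the list.
    x[] <- lapply(x, fixup)
  }
  x
}

# Usage: a function and its byte-compiled copy.
f <- function(x) x + 1
g <- compiler::cmpfun(f)
identical(payload(fixup(f)), payload(fixup(g)))
```

For functions whose environment is the global environment, the final
comparison is expected to return TRUE, because the rebuilt copies no
longer differ in bytecode. The resulting bytes could then be passed to
any hash function.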
The exact algorithm used and the way the hash is obtained are
implementation details and may eventually change, though not without a
good reason.
Other aspects of R data structures are currently not handled:
  - Nothing is done about environments. Because they are reference
    objects, any fix-up would have to re-create them from scratch,
    taking potentially recursive dependencies into account, which is
    likely to be expensive.
  - Some S4 classes (like reference class implementations) have
    different representations in different versions of R and
    third-party packages. They may mean the same thing, but they
    serialize to different byte sequences.
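As an illustration of the environment limitation, the following sketch
stores the same function in two environments, once as is and once
byte-compiled. It uses plain serialize() rather than the package's own
code, and it assumes, as stated above, that no fix-up descends into
environments.

```r
# Two environments holding "the same" function; only the second copy
# has been byte-compiled.
e1 <- new.env(parent = emptyenv())
e1$f <- function(x) x + 1
e2 <- new.env(parent = emptyenv())
e2$f <- compiler::cmpfun(e1$f)

# Serialize both and drop the 14-byte header, as described above.
b1 <- serialize(e1, NULL, xdr = TRUE, version = 2)[-(1:14)]
b2 <- serialize(e2, NULL, xdr = TRUE, version = 2)[-(1:14)]

# The bytes are expected to differ: the function inside e2 still
# carries its bytecode because nothing rebuilt it.
identical(b1, b2)
```

Two such environments would therefore hash differently even though
their contents behave identically.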