mdb_dbi: Use mdb database handles

Description

Database handles are fairly opaque objects that indicate which database within an mdb_env an operation applies to. The object therefore has very few methods, all of which are purely informative. Most commonly, an mdb_dbi object is passed to the mdb_env's $begin() method to begin a transaction on a particular database.

Methods

path

Return the absolute path to the LMDB store (on disk)

Usage: path()

Value: A string

Note: In lmdb.h this is mdb_env_get_path()

flags

Return flags as used in construction of the LMDB environment

Usage: flags()

Value: A named logical vector. Names correspond to arguments to the constructor.

Note: In lmdb.h this is mdb_env_get_flags()

info

Brief information about the LMDB environment

Usage: info()

Value: An integer vector with elements mapsize, last_pgno, last_txnid, maxreaders and numreaders.

Note: In lmdb.h this is mdb_env_info()

stat

Brief statistics about the LMDB environment.

Usage: stat()

Value: An integer vector with elements psize (the size of a database page), depth (depth of the B-tree), branch_pages (number of internal (non-leaf) pages), leaf_pages (number of leaf pages), overflow_pages (number of overflow pages) and entries (number of data items).

Note: In lmdb.h this is mdb_env_stat()

maxkeysize

The maximum size of a key (the value can be bigger than this)

Usage: maxkeysize()

Value: A single integer

Note: In lmdb.h this is mdb_env_get_maxkeysize()

maxreaders

The maximum number of readers

Usage: maxreaders()

Value: A single integer

Note: In lmdb.h this is mdb_env_get_maxreaders()
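
As a rough sketch of how the informational methods above might be used together (assuming the thor package is installed; the keys and values are made up for illustration):

```r
# Create a throwaway environment in a temporary directory
env <- thor::mdb_env(tempfile())
env$put("a", "apple")
env$put("b", "banana")

env_path <- env$path()                 # where the store lives on disk
env_flags <- env$flags()               # named logical vector of flags
n_entries <- env$stat()[["entries"]]   # number of data items
max_key <- env$maxkeysize()            # largest permitted key size

# Remove the temporary store again
env$destroy()
```

The results are captured in variables before calling destroy(), since the environment cannot be queried once it has been removed.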

begin

Begin a transaction

Usage: begin(db = NULL, write = FALSE, sync = NULL, metasync = NULL)

Arguments:

db: A database handle, as returned by open_database. If NULL (the default) then the default database will be used.
write: Scalar logical, indicating if this should be a write transaction. There can be only one write transaction per database (see mdb_txn for more details) - it is an error to try to open more than one.
sync: Scalar logical, indicating if the data should be synchronised (flushed to disk) after writes; see main parameter list.
metasync: Scalar logical, indicating if the metadata should be synchronised (flushed to disk) after writes; see main parameter list.

Details: Transactions are the key objects for interacting with an LMDB database (aside from the convenience interface below). They are described in more detail in mdb_txn.

Value: An mdb_txn object

Note: In lmdb.h this is mdb_txn_begin()
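
A minimal sketch of the explicit transaction workflow, assuming the thor package is installed. It uses only the default database, and the key names are illustrative. One write transaction is committed and a second is aborted, to show which changes persist:

```r
env <- thor::mdb_env(tempfile())

# A write transaction that is committed: the change is kept
txn <- env$begin(write = TRUE)
txn$put("kept", "yes")
txn$commit()

# A write transaction that is aborted: the change is discarded
txn <- env$begin(write = TRUE)
txn$put("discarded", "no")
txn$abort()

# Check which keys survived
found <- env$exists(c("kept", "discarded"))

env$destroy()
```

Only one write transaction is open at any time here: the first is committed before the second begins.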

with_transaction

Evaluate some code within a transaction

Usage: with_transaction(fun, db = NULL, write = FALSE)

Arguments:

fun: A function of one argument that does the work of the transaction. with_transaction will pass the transaction to this function. This is most easily explained with an example, so see the bottom of the help.
db: A database handle, as returned by open_database. If NULL (the default) then the default database will be used.
write: Scalar logical, indicating if this should be a write transaction. There can be only one write transaction per database (see mdb_txn for more details) - it is an error to try to open more than one.

Details: This exists to simplify a pattern where one wants to open a transaction, evaluate some code with that transaction, abort if anything goes wrong, and otherwise commit. It is most useful with read-write transactions, but can be used with both kinds (the default, as for begin(), is a read-only transaction).
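
The commit-or-abort pattern described above might look like the following sketch (assuming the thor package is installed; the keys, values and error message are illustrative only). The first call completes normally and so commits. The second raises an error part-way through, so its write should be rolled back:

```r
env <- thor::mdb_env(tempfile())

# Both puts succeed, so the transaction is committed
env$with_transaction(function(txn) {
  txn$put("a", "apple")
  txn$put("b", "banana")
}, write = TRUE)

# The function fails after the put, so the transaction is aborted
msg <- tryCatch(
  env$with_transaction(function(txn) {
    txn$put("c", "cherry")
    stop("something went wrong")
  }, write = TRUE),
  error = function(e) conditionMessage(e))

# "a" and "b" were committed; "c" was not
found <- env$exists(c("a", "b", "c"))

env$destroy()
```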

open_database

Open a named database, or return one if already opened.

Usage: open_database(key = NULL, reversekey = FALSE, create = TRUE)

Arguments:

key: Name of the database; if NULL this returns the default database (always open).
reversekey: Compare strings in reverse order? See reversekey documentation above
create: Create database if it does not exist already?

Details: LMDB environments can hold multiple databases, provided they have been opened with maxdbs greater than one. There is always a "default" database, which is unnamed and cannot be dropped. Other databases have a key (i.e., a name) and can be dropped. These database objects are passed through to other methods, notably drop_database and begin.

Note: In lmdb.h this is mdb_open()

drop_database

Drop a database

Usage: drop_database(db, delete = TRUE)

Arguments:

db: A database object, as returned by open_database
delete: Scalar logical, indicating if the database should be deleted too. If FALSE, the values are deleted from the database (i.e., it is emptied). If TRUE then the actual database is deleted too.

Value: No return value, called for side effects only

Note: In lmdb.h this is mdb_drop()

sync

Flush the data buffers to disk.

Usage: sync(force = FALSE)

Arguments:

force: Scalar logical; force a synchronous flush. Otherwise if the environment was constructed with sync = FALSE the flushes will be omitted, and with mapasync = TRUE they will be asynchronous.

Details: Data is always written to disk when a transaction is committed, but the operating system may keep it buffered. LMDB always flushes the OS buffers upon commit as well, unless the environment was opened with sync = FALSE or, in part, with metasync = FALSE. This call is not valid if the environment was opened with readonly = TRUE.

Note: In lmdb.h this is mdb_env_sync()

copy

Copy the entire environment state to a new path. This can be used to make a backup of the database.

Usage: copy(path, compact = FALSE)

Arguments:

path: Scalar character; the new path
compact: Scalar logical; perform compaction while copying? This omits free pages and sequentially renumbers all pages in the output. This can take longer than the default but produces a smaller database.

Value: Invisibly, the new path (allowing use of $copy(tempfile()))

Note: In lmdb.h this is mdb_env_copy() & mdb_env_copy2()
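
A sketch of using copy() for a backup, assuming the thor package is installed and that copy() accepts a path that does not exist yet, as the Value entry above suggests. The copy is opened as a second environment to check that the data came across:

```r
env <- thor::mdb_env(tempfile())
env$put("a", "apple")

# Copy to a fresh location; the new path is returned invisibly
backup_path <- env$copy(tempfile())

# Open the copy as its own environment and read the value back
backup <- thor::mdb_env(backup_path)
copied <- backup$get("a")

# Remove both temporary stores
backup$destroy()
env$destroy()
```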

close

Close the environment. This closes all cursors and transactions (active write transactions are aborted).

Usage: close()

Value: No return value, called for side effects only

Note: In lmdb.h this is mdb_env_close()

destroy

Totally destroy an LMDB environment. This closes the database and removes the files. Use with care!

Usage: destroy()

Value: No return value, called for side effects only

reader_list

List information about database readers

Usage: reader_list()

Value: A character matrix with columns pid (process ID), thread (a pointer address), and txnid (a small integer)

Note: In lmdb.h this is mdb_reader_list()

reader_check

Check for, and remove, stale entries in the reader lock table.

Usage: reader_check()

Value: An integer, being the number of stale readers discarded. However, this function is primarily called for its side effect.

Note: In lmdb.h this is mdb_reader_check()

get

Retrieve a value from the database

Usage: get(key, missing_is_error = TRUE, as_raw = NULL, db = NULL)

Arguments:

key: A string (or raw vector) - the key to get
missing_is_error: Logical, indicating if a missing value is an error (by default it is). Alternatively, with missing_is_error = FALSE, a missing value will return NULL. Because no value can be NULL (all values must have nonzero length) a NULL is unambiguously missing.
as_raw: Either NULL, or a logical, to indicate the result type required. With as_raw = NULL, the default, the value will be returned as a string if possible. If not possible it will return a raw vector. With as_raw = TRUE, get() will always return a raw vector, even when it is possible to represent the value as a string. If as_raw = FALSE, get will return a string, but throw an error if this is not possible. This is discussed in more detail in the thor vignette (vignette("thor")).
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.

Note: In lmdb.h this is mdb_get()

put

Put values into the database. In other systems, this might be called "set".

Usage: put(key, value, overwrite = TRUE, append = FALSE, db = NULL)

Arguments:

key: The name of the key (string or raw vector)
value: The value to save (string or raw vector)
overwrite: Logical - when TRUE it will overwrite existing data; when FALSE throw an error
append: Logical - when TRUE, append the given key/value to the end of the database. This option allows fast bulk loading when keys are already known to be in the correct order. But if you load unsorted keys with append = TRUE an error will be thrown
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.

Note: In lmdb.h this is mdb_put()
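
The get and put helpers can be sketched as follows, assuming the thor package is installed (the keys and values are illustrative). The example covers the as_raw and missing_is_error options of get, and the error raised by put when overwrite = FALSE meets an existing key:

```r
env <- thor::mdb_env(tempfile())
env$put("a", "apple")

as_string <- env$get("a")                           # string where possible
as_bytes <- env$get("a", as_raw = TRUE)             # always a raw vector
absent <- env$get("zzz", missing_is_error = FALSE)  # NULL for a missing key

# With overwrite = FALSE, writing to an existing key is an error
clash <- tryCatch(
  env$put("a", "avocado", overwrite = FALSE),
  error = function(e) "error")

# The original value is untouched
unchanged <- env$get("a")

env$destroy()
```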

del

Remove a key/value pair from the database

Usage: del(key, db = NULL)

Arguments:

key: The name of the key (string or raw vector)
db: A database handle that would be passed through to create the transaction (see the $begin method).

Value: A scalar logical, indicating if the value was deleted

Note: In lmdb.h this is mdb_del()

exists

Test if a key exists in the database.

Usage: exists(key, db = NULL)

Arguments:

key: The name of the key to test (string or raw vector). Unlike get, put and del (but like mget, mput and mdel), exists is vectorised. So the input here can be: a character vector of any length (returning a logical vector of the same length), a raw vector (representing one key, returning a scalar logical), or a list with each element being either a scalar character or a raw vector (returning a logical vector the same length as the list).
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: This is an extension of the raw LMDB API and works by using mdb_get for each key (which for lmdb need not copy data) and then testing whether the return value is MDB_SUCCESS or MDB_NOTFOUND.

This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.

Value: A logical vector
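
The three accepted input forms for key can be illustrated with a short sketch (assuming the thor package is installed; the keys are made up):

```r
env <- thor::mdb_env(tempfile())
env$put("a", "apple")
env$put("b", "banana")

# Character vector: one result per key
chr_keys <- env$exists(c("a", "b", "c"))

# Raw vector: treated as a single key
raw_key <- env$exists(charToRaw("a"))

# List mixing string and raw keys: one result per element
mixed <- env$exists(list("a", charToRaw("zzz")))

env$destroy()
```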

list

List keys in the database

Usage: list(starts_with = NULL, as_raw = FALSE, size = NULL, db = NULL)

Arguments:

starts_with: Optionally, a prefix for all strings. Note that this is not a regular expression or a filename glob. Using foo will match foo, foo:bar and foobar but not fo or FOO. Because LMDB stores keys in a sorted tree, using a prefix can greatly reduce the number of keys that need to be tested.
as_raw: Same interpretation as as_raw in $get() but with a different default. It is expected that most of the time keys will be strings, so by default we try to return a character vector (as_raw = FALSE). Change the default if your database contains raw keys.
size: For use with starts_with, optionally a guess at the number of keys that would be returned. With starts_with = NULL the number of keys can be looked up directly, so this is ignored.
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.
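
The prefix matching described for starts_with can be checked with a small sketch using the same keys as in the argument description (assuming the thor package is installed; the stored values are placeholders):

```r
env <- thor::mdb_env(tempfile())
env$mput(c("foo", "foo:bar", "foobar", "fo", "FOO"), rep("x", 5))

# All keys in the default database
all_keys <- env$list()

# Only keys that begin with the literal prefix "foo"
foo_keys <- env$list(starts_with = "foo")

env$destroy()
```

Here foo_keys should contain foo, foo:bar and foobar, but neither fo nor FOO.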

mget

Get values for multiple keys at once (like $get but vectorised over key)

Usage: mget(key, as_raw = NULL, db = NULL)

Arguments:

key: The keys to get values for. Zero, one or more keys are allowed.
as_raw: As for $get(), logical (or NULL) indicating if raw or string output is expected or desired.
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.

mput

Put multiple values into the database (like $put but vectorised over key/value).

Usage: mput(key, value, overwrite = TRUE, append = FALSE, db = NULL)

Arguments:

key: The keys to set
value: The values to set against these keys. Must be the same length as key.
overwrite: As for $put
append: As for $put
db: A database handle that would be passed through to create the transaction (see the $begin method).

Details: The implementation simply calls mdb_put repeatedly (but with a single round of error checking) so duplicate key entries will result in the last key winning.

This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.

mdel

Delete multiple values from the database (like $del but vectorised over key).

Usage: mdel(key, db = NULL)

Arguments:

key: The keys to delete
db: A database handle that would be passed through to create the transaction (see the $begin method).

Value: A logical vector, the same length as key, indicating if each key was deleted.
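
A sketch of the vectorised helpers mput, mget and mdel used together (assuming the thor package is installed; the keys and values are illustrative):

```r
env <- thor::mdb_env(tempfile())

# Write three key/value pairs in a single transaction
env$mput(c("a", "b", "c"), c("apple", "banana", "cherry"))

# Read two of them back, in the order requested
values <- env$mget(c("a", "c"))

# Delete one existing and one non-existent key; one result per key
deleted <- env$mdel(c("a", "zzz"))

# Keys left in the database
remaining <- env$list()

env$destroy()
```

Each helper opens and closes its own temporary transaction, so none of these calls can be made while another write transaction is open on the environment.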

Examples

# NOT RUN {
# As always, start with the environment.  Because we're going to
# use more than one database, we must set `maxdbs` to more than 1:
env <- thor::mdb_env(tempfile(), maxdbs = 10)

# The default database, which every environment has
db <- env$open_database()
# The default database will always have id 1 and no name
db$id()
db$name()

# A different database
foo <- env$open_database("foo")
foo$id()
foo$name()

# Opening a database multiple times has no effect - it returns the
# same database on every call.
identical(env$open_database("foo"), foo) # TRUE

# Then we can put some data into the new database:
txn <- env$begin(foo, write = TRUE)
txn$put("hello", "world")
txn$commit()

# Now we have values in the "foo" database, but not the default one:
env$get("hello", db = NULL, missing_is_error = FALSE) # NULL
env$get("hello", db = foo,  missing_is_error = FALSE) # "world"

# Cleanup
env$destroy()
# }