dbi.table-package: DBI Table

Description

A dbi.table is a data structure that describes a SQL query (called the dbi.table's underlying SQL query). This query can be manipulated using data.table's [i, j, by] syntax.

Usage

dbi.table(conn, id, check.names = FALSE, key = NULL, stringsAsFactors = FALSE)
# S3 method for dbi.table
[(x, i, j, by, keyby, nomatch = NA, on = NULL)

Value

A dbi.table.

Arguments

conn

A DBIConnection object, as returned by dbConnect. Alternatively, a dbi.catalog or a dbi.table, in which case the new dbi.table will use the connection embedded in the provided object.

id

An Id, a character string (which will be converted to an Id by Id), or a SQL object (advanced) identifying a database object (e.g., table or view) on conn.

check.names

Just as check.names in data.table and data.frame.

key

A character vector of one or more column names to set as the resulting dbi.table's key.

stringsAsFactors

A logical value (default is FALSE). Convert all character columns to factors when executing the dbi.table's underlying SQL query and retrieving the result set.

x

A dbi.table.

i

A logical expression of the columns of x, a dbi.table, or a data.frame. Use i to select a subset of the rows of x. Note: unlike data.table, i cannot be a vector.

When i is a logical expression, the rows where the expression is TRUE are returned. If the expression contains a symbol foo that is not a column name of x but that is present in the calling scope, then the value of foo will be substituted into the expression if foo is a scalar, or if foo is a vector and is the right-hand-side argument to %in% or %chin% (substitution occurs when the extract ([) method is evaluated).

When i inherits from data.frame, it is coerced to a dbi.table.

When i is a dbi.table, the rows of x that match (according to the condition specificed in on) the rows of i are returned. When nomatch == NA, all rows of i are returned (right outer join); when nomatch == NULL, only the rows of i that match a row of x are returned (inner join).

j

A list of expressions, a literal character vector of column names of x, an expression of the form start_name:end_name, or a literal numeric vector of integer values indexing the columns of x. Use j to select (and optionally, transform) the columns of x.

by

keyby

Same as by, but additionally sets the key of the resulting dbi.table to the columns provided in by. May also be TRUE or FALSE when by is provided as an alternative way to accomplish the same operation.

nomatch

Either NA or NULL.

on

An unnamed character vector, e.g., x[i, on = c("a", "b")], used when columns a and b are common to both x and i.
Foreign key joins: As a named character vector when the join columns have different names in x and i. For example, x[i, on = c(x1 = "i1", x2 = "i2")] joins x and i by matching columns x1 and x2 in x with columns i1 and i2 in i, respectively.
Foreign key joins can also use the binary operator ==, e.g., x[i, on = c("x1 == i1", "x2 == i2")].
It is also possible to use .() syntax as x[i, on = .(a, b)].
Non-equi joins using binary operators >=, >, <=, < are also possible, e.g., x[i, on = c("x >= a", "y <= b")], or x[i, on = .(x >= a, y <= b)].

Keys

A key marks a dbi.table as sorted with an attribute "sorted". The sorted columns are the key. The key can be any number of columns. Unlike data.table, the underlying data are not physically sorted, so there is no performance improvement. However, there remain benefits to using keys:

The key provides a default order for window queries so that functions like shift and cumsum give reproducible output.
dbi.table's merge method uses a dbi.table's key to determin the default columns to merge on in the same way that data.table's merge method does. Note: if a dbi.table has a foreign key relationship, that will be used to determin the default columns to merge on before the dbi.table's key is considered.

A table's primary key is used as the default key when it can be determined.

Differences vs. data.table Keys

There are a few key differences between dbi.table keys and data.table keys.

In data.table, NAs are always first. Some databases (e.g., PostgreSQL) sort NULLs last by default and some databases (e.g., SQLite) sort them first. as.data.frame does not change the order of the result set returned by the database. Note that as.data.table uses the dbi.table's key so that the resulting data.table is sorted in the usual data.table way.
The sort is not stable: the order of ties may change on subsequent evaluations of the dbi.table's underlying SQL query.

Strict Processing of Keys

By default, when previewing data (dbi.table's print method), the key is not included in the underlying SQL query's ORDER BY clause. However, the result set is sorted locally to resepct the key. This behavior is referred to as a non-strict evaluation of the key and the printed output labels the key (non-strict). To override the default behavior for a single preview, call print explicitly and provide the optional argument strict = TRUE. To change the default behavior, set the option dbitable.print.strict to TRUE.

Non-strict evaluation of keys reduces the time taken to retrieve the preview.

Examples

Run this code

  # open a connection to the Chinook example database using duckdb
  duck <- chinook.duckdb()

  # create a dbi.table corresponding to the Album table on duck
  Album <- dbi.table(duck, DBI::Id(table_name = "Album"))

  # the print method displays a 5 row preview
  # print(Album)
  Album

  # 'id' can also be 'SQL'; use the same DBI connection as Album
  Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre"))

  # use the extract ([...]) method to subset the dbi.table
  Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))]

  # use csql to see the underlying SQL query
  csql(Album[AlbumId < 5, #WHERE
             .(Title, #SELECT
               nchar = paste(nchar(Title), "characters"))])

  DBI::dbDisconnect(duck)