A dbi.table is a data structure that describes a SQL query (called the
dbi.table's underlying SQL query). This query can be manipulated
using data.table
's [i, j, by]
syntax.
dbi.table(conn, id, check.names = FALSE, key = NULL, stringsAsFactors = FALSE)# S3 method for dbi.table
[(x, i, j, by, keyby, nomatch = NA, on = NULL)
A dbi.table
.
A DBIConnection
object, as
returned by dbConnect
. Alternatively, a
dbi.catalog
or a dbi.table
, in which case the new
dbi.table
will use the connection embedded in the provided object.
An Id
, a character string (which will be converted to
an Id
by Id
), or a SQL
object
(advanced) identifying a database object (e.g., table or view) on
conn
.
Just as check.names
in data.table
and
data.frame
.
A character vector of one or more column names to set as the resulting
dbi.table
's key.
A logical value (default is FALSE
). Convert all character
columns to factor
s when executing the dbi.table
's underlying
SQL query and retrieving the result set.
A dbi.table
.
A logical expression of the columns of x
, a dbi.table
,
or a data.frame
. Use i
to select a subset of the rows of
x
. Note: unlike data.table
, i
cannot be a
vector.
When i
is a logical expression, the rows where the expression is
TRUE
are returned. If the expression contains a symbol foo
that is not a column name of x
but that is present in the calling
scope, then the value of foo
will be substituted into the expression
if foo
is a scalar, or if foo
is a vector and is the
right-hand-side argument to %in%
or %chin%
(substitution
occurs when the extract ([
) method is evaluated).
When i
inherits from data.frame
, it is coerced to a
dbi.table
.
When i
is a dbi.table
, the rows of x
that match
(according to the condition specificed in on
) the rows
of i
are returned. When nomatch == NA
, all rows of i
are returned (right outer join); when nomatch == NULL
, only the rows
of i
that match a row of x
are returned (inner join).
A list of expressions, a literal character vector of column names of
x
, an expression of the form start_name:end_name
, or a
literal numeric vector of integer values indexing the columns of x
.
Use j
to select (and optionally, transform) the columns of x
.
A list of expressions, a literal character vector of column names of
x
, an expression of the form start_name:end_name
, or a
literal numeric vector of integer values indexing the columns of x
.
Use by
to control grouping when evaluating j
.
Same as by
, but additionally sets the key of the resulting
dbi.table
to the columns provided in by
. May also be
TRUE
or FALSE
when by
is provided as an alternative
way to accomplish the same operation.
Either NA
or NULL
.
An unnamed character vector, e.g., x[i, on = c("a", "b")]
,
used when columns a
and b
are common to both x
and i
.
Foreign key joins: As a named character vector when the join
columns have different names in x
and i
. For example,
x[i, on = c(x1 = "i1", x2 = "i2")]
joins x
and
i
by matching columns x1
and x2
in x
with columns i1
and i2
in i
, respectively.
Foreign key joins can also use the binary operator ==
, e.g.,
x[i, on = c("x1 == i1", "x2 == i2")]
.
It is also possible to use .()
syntax as
x[i, on = .(a, b)]
.
Non-equi joins using binary operators >=
, >
,
<=
, <
are also possible, e.g.,
x[i, on = c("x >= a", "y <= b")]
, or
x[i, on = .(x >= a, y <= b)]
.
A key marks a dbi.table
as sorted with an attribute "sorted"
.
The sorted columns are the key. The key can be any number of columns.
Unlike data.table
, the underlying data are not physically sorted, so
there is no performance improvement. However, there remain benefits to
using keys:
The key provides a default order for window queries so that
functions like shift
and
cumsum
give reproducible output.
dbi.table
's merge
method uses a dbi.table
's
key to determin the default columns to merge on in the same way
that data.table
's merge method does. Note: if a
dbi.table
has a foreign key relationship, that will be used
to determin the default columns to merge on before the
dbi.table
's key is considered.
A table's primary key is used as the default key
when it can be
determined.
Differences vs. data.table
Keys
There are a few key differences between dbi.table
keys and
data.table
keys.
In data.table
, NA
s are always first. Some databases
(e.g., PostgreSQL) sort NULL
s last by default and some
databases (e.g., SQLite) sort them first. as.data.frame
does
not change the order of the result set returned by the database.
Note that as.data.table
uses the dbi.table
's key so
that the resulting data.table
is sorted in the usual
data.table
way.
The sort is not stable: the order of ties may change on
subsequent evaluations of the dbi.table
's underlying SQL
query.
Strict Processing of Keys
By default, when previewing data (dbi.table
's print
method), the key is not included in the underlying SQL query's ORDER BY
clause. However, the result set is sorted locally to resepct the key. This
behavior is referred to as a non-strict evaluation of the key and
the printed output labels the key (non-strict)
. To override the
default behavior for a single preview, call print
explicitly and
provide the optional argument strict = TRUE
. To change the default
behavior, set the option dbitable.print.strict
to TRUE
.
Non-strict evaluation of keys reduces the time taken to retrieve the preview.
as.data.frame
to retrieve the
results set as a data.frame
,
csql
to see the underlying SQL query.
# open a connection to the Chinook example database using duckdb
duck <- chinook.duckdb()
# create a dbi.table corresponding to the Album table on duck
Album <- dbi.table(duck, DBI::Id(table_name = "Album"))
# the print method displays a 5 row preview
# print(Album)
Album
# 'id' can also be 'SQL'; use the same DBI connection as Album
Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre"))
# use the extract ([...]) method to subset the dbi.table
Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))]
# use csql to see the underlying SQL query
csql(Album[AlbumId < 5, #WHERE
.(Title, #SELECT
nchar = paste(nchar(Title), "characters"))])
DBI::dbDisconnect(duck)
Run the code above in your browser using DataLab