Learn R Programming

⚠️There's a newer version (2.5.0) of this package.Take me there.

dbplyr

Overview

dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.

To learn more about why you might use dbplyr instead of writing SQL, see vignette("sql"). To learn more about the details of the SQL translation, see vignette("translation-verb") and vignette("translation-function").

Installation

# The easiest way to get dbplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dbplyr:
install.packages("dbplyr")

# Or the the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dbplyr")

Usage

dbplyr is designed to work with database tables as if they were local data frames. To demonstrate this I’ll first create an in-memory SQLite database and copy over a dataset:

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)

Note that you don’t actually need to load dbplyr with library(dbplyr); dplyr automatically loads it for you when it sees you working with a database. Database connections are coordinated by the DBI package. Learn more at https://dbi.r-dbi.org/

Now you can retrieve a table using tbl() (see ?tbl_dbi for more details). Printing it just retrieves the first few rows:

mtcars2 <- tbl(con, "mtcars")
mtcars2
#> # Source:   table<mtcars> [?? x 11]
#> # Database: sqlite 3.39.4 [:memory:]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with more rows

All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.

# lazily generates query
summary <- mtcars2 %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>% 
  arrange(desc(mpg))

# see query
summary %>% show_query()
#> <SQL>
#> SELECT `cyl`, AVG(`mpg`) AS `mpg`
#> FROM `mtcars`
#> GROUP BY `cyl`
#> ORDER BY `mpg` DESC

# execute query and retrieve results
summary %>% collect()
#> # A tibble: 3 × 2
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1

Code of Conduct

Please note that the dbplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Copy Link

Version

Install

install.packages('dbplyr')

Monthly Downloads

666,858

Version

2.3.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Hadley Wickham

Last Published

January 16th, 2023

Functions in dbplyr (2.3.0)

backend-mssql

Backend: SQL server
args_by

Helper for consistent documentation of .by
arrange.tbl_lazy

Arrange rows by column values
backend-impala

Backend: Impala
backend-hive

Backend: Hive
copy_inline

Use a local data frame in a dbplyr query
backend-odbc

Backend: ODBC
backend-oracle

Backend: Oracle
collapse.tbl_sql

Compute results of a query
copy_to.src_sql

Copy a local data frame to a remote database
backend-postgres

Backend: PostgreSQL
backend-teradata

Backend: Teradata
complete.tbl_lazy

Complete a SQL table with missing combinations of data
backend-redshift

Backend: Redshift
lazy_ops

Lazy operations
in_schema

Refer to a table in a schema or a database catalog
do.tbl_sql

Perform arbitrary computation on remote backend
distinct.tbl_lazy

Subset distinct/unique rows
ident_q

Declare a identifer as being pre-quoted.
lahman

Cache and retrieve an src_sqlite of the Lahman baseball database.
backend-sqlite

Backend: SQLite
backend-hana

Backend: SAP HANA
backend-snowflake

Backend: Snowflake
backend-access

Backend: MS Access
dbplyr-slice

Subset rows using their positions
dbplyr_uncount

"Uncount" a database table
count.tbl_lazy

Count observations by group
db-io

Database I/O generics
simulate_vars

Simulate variables to use in tidyselect
build_sql

Build a SQL string.
db-misc

Miscellaneous database generics
head.tbl_lazy

Subset the first rows
db-quote

SQL escaping/quoting generics
escape

Escape/quote a string.
intersect.tbl_lazy

SQL set operations
named_commas

Provides comma-separated string out of the parameters
ident

Flag a character vector as SQL identifiers
reexports

Objects exported from other packages
nycflights13

Database versions of the nycflights13 data
remote_name

Metadata about a remote table
join.tbl_sql

Join SQL tables
memdb_frame

Create a database table in temporary in-memory database.
mutate.tbl_lazy

Create, modify, and delete columns
select.tbl_lazy

Subset, rename, and reorder columns using their names
sql

SQL escaping.
sql_substr

Create an sql translator
expand.tbl_lazy

Expand SQL tables to include all possible combinations of values
summarise.tbl_lazy

Summarise each group to one row
src_sql

Create a "sql src" object
src_dbi

Database src
lazy_multi_join_query

Build and render SQL from a sequence of lazy operations
simulate_hive

Simulate database connections
db-sql

SQL generation generics
fill.tbl_lazy

Fill in missing values with previous or next value
tbl_sql

Create an SQL tbl (abstract)
testing

Infrastructure for testing dplyr
sql_expr

Generate SQL from R expressions
window_order

Override window order and frame
filter.tbl_lazy

Subset rows using column values
replace_na.tbl_lazy

Replace NAs with specified values
partial_eval

Partially evaluate an expression.
pivot_longer.tbl_lazy

Pivot data from wide to long
get_returned_rows

Extract and check the RETURNING rows
dbplyr-package

dbplyr: A 'dplyr' Back End for Databases
group_by.tbl_lazy

Group by one or more variables
rows_insert.tbl_lazy

Edit individual rows in the underlying database table
tbl_lazy

Create a local lazy tibble
tbl.src_dbi

Use dplyr verbs with a remote database table
pivot_wider.tbl_lazy

Pivot data from long to wide
win_over

Generate SQL expression for window functions
sql_query_insert

Generate SQL for Insert, Update, Upsert, and Delete
pull.tbl_sql

Extract a single column
translate_sql

Translate an expression to SQL
sql_quote

Helper function for quoting sql elements.
backend-mysql

Backend: MySQL/MariaDB