Learn R Programming

pool (version 0.1.0)

src_pool: Wrap DBI Database Connection Pool for dplyr use.

Description

This functions opens a dplyr-compatible connection. If you're used to using dplyr to query databases, then you're familiar with the src_* functions (e.g. src_mysql). In those functions, you typically have to specify the drv and all the authorization arguments required to connect to the database. If you're using pool, however, you've already did all of this when you created the Pool object with dbPool. So for src_pool, you only need to pass in that same Pool object. Then, you can use the resulting object just like in dplyr (see the examples below).

Usage

src_pool(pool)

Arguments

pool

A pool object

Comparison to <code>dplyr</code> code

First, let's show how you'd connect and query a MySQL database using only dplyr:

my_db <- src_mysql(
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  user = "guest",
  password = "guest"
)
# get the first 5 rows:
my_db %>% tbl("City") %>% head(5)

Now, let's do the same thing using a Pool object:

pool <- dbPool(
  drv = RMySQL::MySQL(),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)
# get the first 5 rows:
src_pool(pool) %>% tbl("City") %>% head(5)

Performance

What's the advantage of using pool with dplyr, rather than just using dplyr to query a database? As usual with pool, the answer is performance (less emphasis on connection management this time since dplyr already did a good job on that). However, this only applies for some situations. Basically, when you use any of the src_* functions (src_pool included), you are creating connections that will only be closed on garbage collection. If you are querying a database a lot and using src_* each time, you are always fetching and closing connections. The difference is that, without pool, you're always fetching connections from the database itself (potentially pretty computationally expensive) and closing them for good. But with pool, you're just fetching and returning connections to and from the pool, which is essentially free.

(You might not need to use get a new connection with src_* for each query, but if you have a potentially unbounded number of users, as in a hosted Shiny app, you certainly should -- or you'll risk runnign out of connections. See this article to learn more about using dplyr together with pool in a Shiny app.)

Examples

Run this code

pool <- dbPool(
  drv = RMySQL::MySQL(),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)

# equivalent to: dbGetQuery(pool, "SELECT * from City LIMIT 5;")
src_pool(pool) %>% tbl("City") %>% head(5)
#>   ID           Name CountryCode      District Population
#> 1  1          Kabul         AFG         Kabol    1780000
#> 2  2       Qandahar         AFG      Qandahar     237500
#> 3  3          Herat         AFG         Herat     186800
#> 4  4 Mazar-e-Sharif         AFG         Balkh     127800
#> 5  5      Amsterdam         NLD Noord-Holland     731200

poolClose(pool)

Run the code above in your browser using DataLab