Learn R Programming

modeldb

Fit models inside the database! modeldb works with most database back-ends because it leverages dplyr and dbplyr for the final SQL translation of the algorithm. It currently supports:

  • K-means clustering

  • Linear regression

Installation

Install the CRAN version with:

install.packages("modeldb")

The development version is available from GitHub using remotes:

# install.packages("remotes")
remotes::install_github("tidymodels/modeldb")

Linear regression

An easy way to try out the package is by creating a temporary SQLite database, and loading mtcars to it.

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
RSQLite::initExtension(con)
dplyr::copy_to(con, mtcars)
library(dplyr)

tbl(con, "mtcars") %>%
  select(wt, mpg, qsec) %>%
  linear_regression_db(wt)
## # A tibble: 1 × 3
##   `(Intercept)`    mpg  qsec
##           <dbl>  <dbl> <dbl>
## 1          4.12 -0.156 0.125

The model output can be parsed by tidypredict to run the predictions in the database. Please see the “Linear Regression” article to learn more about how to use linear_regression_db()

K Means clustering

To use the simple_kmeans_db() function, simply pipe the database back end table to the function. This returns a list object that contains two items:

  • A sql query table with the final center assignment
  • A local table with the information about the centers
km <- tbl(con, "mtcars") %>%
  simple_kmeans_db(mpg, wt)

colnames(km)
##  [1] "k_center" "k_mpg"    "k_wt"     "mpg"      "cyl"      "disp"    
##  [7] "hp"       "drat"     "wt"       "qsec"     "vs"       "am"      
## [13] "gear"     "carb"

The SQL statement from tbl can be extracted using dbplyr’s remote_query()

dbplyr::remote_query(km)
## <SQL> SELECT `k_center`, `k_mpg`, `k_wt`, `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, `LHS`.`k_center` AS `k_center`, `k_mpg`, `k_wt`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, `center` AS `k_center`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, `center_1`, `center_2`, `center_3`, CASE
## WHEN (`center_1` >= `center_1` AND `center_1` < `center_2` AND `center_1` < `center_3`) THEN ('center_1')
## WHEN (`center_2` < `center_1` AND `center_2` >= `center_2` AND `center_2` < `center_3`) THEN ('center_2')
## WHEN (`center_3` < `center_1` AND `center_3` < `center_2` AND `center_3` >= `center_3`) THEN ('center_3')
## END AS `center`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, SQRT(((20.6428571428571 - `mpg`) * (20.6428571428571 - `mpg`)) + ((3.07214285714286 - `wt`) * (3.07214285714286 - `wt`))) AS `center_1`, SQRT(((14.4583333333333 - `mpg`) * (14.4583333333333 - `mpg`)) + ((4.05866666666667 - `wt`) * (4.05866666666667 - `wt`))) AS `center_2`, SQRT(((30.0666666666667 - `mpg`) * (30.0666666666667 - `mpg`)) + ((1.873 - `wt`) * (1.873 - `wt`))) AS `center_3`
## FROM `mtcars`))
## WHERE (NOT(((`center`) IS NULL)))) AS `LHS`
## LEFT JOIN (SELECT `center` AS `k_center`, `mpg` AS `k_mpg`, `wt` AS `k_wt`
## FROM (SELECT `center`, AVG(`mpg`) AS `mpg`, AVG(`wt`) AS `wt`
## FROM (SELECT `mpg`, `wt`, `center`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, `center_1`, `center_2`, `center_3`, CASE
## WHEN (`center_1` >= `center_1` AND `center_1` < `center_2` AND `center_1` < `center_3`) THEN ('center_1')
## WHEN (`center_2` < `center_1` AND `center_2` >= `center_2` AND `center_2` < `center_3`) THEN ('center_2')
## WHEN (`center_3` < `center_1` AND `center_3` < `center_2` AND `center_3` >= `center_3`) THEN ('center_3')
## END AS `center`
## FROM (SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`, SQRT(((20.6428571428571 - `mpg`) * (20.6428571428571 - `mpg`)) + ((3.07214285714286 - `wt`) * (3.07214285714286 - `wt`))) AS `center_1`, SQRT(((14.4583333333333 - `mpg`) * (14.4583333333333 - `mpg`)) + ((4.05866666666667 - `wt`) * (4.05866666666667 - `wt`))) AS `center_2`, SQRT(((30.0666666666667 - `mpg`) * (30.0666666666667 - `mpg`)) + ((1.873 - `wt`) * (1.873 - `wt`))) AS `center_3`
## FROM `mtcars`))
## WHERE (NOT(((`center`) IS NULL))))
## GROUP BY `center`)) AS `RHS`
## ON (`LHS`.`k_center` = `RHS`.`k_center`)
## )

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Copy Link

Version

Install

install.packages('modeldb')

Monthly Downloads

233

Version

0.3.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Max Kuhn

Last Published

November 1st, 2023

Functions in modeldb (0.3.0)

linear_regression_db

Fits a Linear Regression model
modeldb-package

modeldb: Fits Models Inside the Database
plot_kmeans

Visualize a KMeans Cluster with lots of data
simple_kmeans_db

Simple kmeans routine that works in-database
reexports

Objects exported from other packages
add_dummy_variables

Creates dummy variables
as_parsed_model.modeldb_lm

Prepares parsed model object