Learn R Programming

modelc

modelc is an R model object to SQL compiler. It generates SQL select statements from linear and generalized linear models.

Its interface currently consists of a single function, modelc, which takes a single input, namely an lm or glm model object.

It currently supports Gaussian and gamma family distributions using log or identity link functions.

To import linear models directly to your SQL Server database, consider using Castpack, which depends on modelc.

Usage

Supposing the following data

a <- 1:10
b <- 2*1:10 + runif(1) * 1.5
c <- as.factor(1:10)
df <- data.frame(a,b,c)
formula = b ~ a + c

A vanilla linear model

linear_model <- lm(formula, data=df)
modelc(linear_model)

generates the following SQL

  0.231808555545287 + 2 * `a` + (
    CASE
      WHEN c = 2 THEN -0.00000000000000193216758587821 * c
      WHEN c = 3 THEN -0.000000000000000776180314897008 * c
      WHEN c = 4 THEN -0.000000000000000665297412768863 * c
      WHEN c = 5 THEN -0.00000000000000055441451064072 * c
      WHEN c = 6 THEN -0.000000000000000887620818362638 * c
      WHEN c = 7 THEN -0.000000000000000332648706384432 * c
      WHEN c = 8 THEN -0.00000000000000110994422395641 * c
      WHEN c = 9 THEN -0.00000000000000188723974152839 * c
      WHEN c = 10 THEN 0 * c
    END
  )

GLMs are also supported with log or identity link functions

glm_model <- glm(formula, data=df, family=Gamma(link="log"))
modelc(glm_model)
  EXP(
    0.557874070609732 + 0.244938197625494 * `a` + (
      CASE
        WHEN c = 2 THEN 0.394878990324516 * c
        WHEN c = 3 THEN 0.536977925025217 * c
        WHEN c = 4 THEN 0.570378881020516 * c
        WHEN c = 5 THEN 0.542936294999294 * c
        WHEN c = 6 THEN 0.476536561025273 * c
        WHEN c = 7 THEN 0.383038044594683 * c
        WHEN c = 8 THEN 0.269593156578649 * c
        WHEN c = 9 THEN 0.140849942185343 * c
        WHEN c = 10 THEN 0 * c
      END
    )
  )
glm_model_idlink <- glm(formula, data=df, family=Gamma(link="identity"))
modelc(glm_model_idlink)
  0.231808555545287 + 2 * `a` + (
    CASE
      WHEN c = 2 THEN 0.00000000000000139594865689472 * c
      WHEN c = 3 THEN -0.000000000000000581567338978993 * c
      WHEN c = 4 THEN -0.00000000000000111588502938831 * c
      WHEN c = 5 THEN 0.000000000000000967650035758108 * c
      WHEN c = 6 THEN -0.00000000000000149265067586469 * c
      WHEN c = 7 THEN -0.000000000000000100985345060517 * c
      WHEN c = 8 THEN -0.0000000000000000673235633736781 * c
      WHEN c = 9 THEN 0.00000000000000199047558220559 * c
      WHEN c = 10 THEN 0 * c
    END
  )

In order to avoid generating invalid SQL, modelc temporarily sets your scipen option to 999.

Installing

Using devtools:

install.packages("devtools")
install.packages("remotes")
remotes::install_github("sparkfish/modelc")

Precision

Note that you may encounter minor differences between the output of your R and generated SQL models depending on the precision with which your numeric types are represented in the database. To ensure parity between the two models, numeric types should have a precision of at least 17.

Tests

Tests are written using testthat. To run them, simply do

devtools::test()

Copy Link

Version

Install

install.packages('modelc')

Monthly Downloads

149

Version

1.0.0.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Hugo Saavedra

Last Published

June 28th, 2020

Functions in modelc (1.0.0.0)

modelc

Compile an R model to a valid TSQL formula
is_factor

Detect if the given model term is a factor
has_parameter

Check if an R model contains a coefficient
build_factor_case_statements

Build SQL CASE statements representing the factors in the model
get_factor_name

Extract the factor name from an R model
extract_parameters

Extract parameters from a linear model
is_intercept

Check if the given parameter is the intercept
is_interaction

Detect if the given model term is an interaction
build_interaction_term

Build a SQL interaction term
build_intercept

Get SQL representing the intercept term given the R model and parameter name
extract_level

Extract the level from the factor name
extract_parameter_coefficient

Extract the coefficient of a model parameter
build_product

Build a SQL product
apply_linkinverse

Wrap the model SQL in the appropriate link function inverse to return scaled predictions
build_additive_term

Get SQL representing a continuous term in the model with no interactions