twidlr: consistent data.frame and formula API for models

Overview

twidlr is an R package that exposes a consistent API for model functions and their corresponding predict methods such that they are specified as:

fit <- model(data, formula, ...)
predict(fit, data, ...)

Here, "data" is a required data.frame (or an object that can be coerced to one) and "formula" is a formula (or a string that can be coerced to one) describing the model to be fitted.
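The calling convention above can be illustrated with a minimal base R sketch. The helpers lm_df() and predict_df() are hypothetical names made up for this example; they are not part of twidlr and are not necessarily how twidlr implements its wrappers. They only show the idea of coercing "data" and "formula" before handing them to an existing model function.

```r
# Hypothetical helper (not part of twidlr): a data-first, formula-second
# wrapper around stats::lm().
lm_df <- function(data, formula, ...) {
  data    <- as.data.frame(data)         # accept anything coercible to a data.frame
  formula <- stats::as.formula(formula)  # accept a formula or a string
  stats::lm(formula = formula, data = data, ...)
}

# Hypothetical helper: prediction where 'data' must always be supplied.
predict_df <- function(fit, data, ...) {
  if (missing(data)) stop("'data' is required")
  stats::predict(fit, newdata = as.data.frame(data), ...)
}

fit   <- lm_df(mtcars, "hp ~ mpg * wt")  # formula given as a string
preds <- predict_df(fit, mtcars)
head(preds)
```

Making "data" required in the predict step means the caller always states which rows are being predicted, rather than silently falling back to the training data.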

twidlr gets its name from the "twiddle" used in R formulas.

Installation

twidlr can be installed from GitHub by running:

# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")

Usage

Loading twidlr with library(twidlr) exposes model functions that you're already familiar with, except that they accept a data.frame first, a formula second, and then any additional arguments. A robust predict method is also exposed.

For example, a typical linear model would be lm(hp ~ mpg * wt, mtcars, ...). Once twidlr is loaded, the same model would be run via lm(mtcars, hp ~ mpg * wt, ...).
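As a rough side-by-side sketch, the two call styles look like this. The twidlr call is taken from the example above and is guarded with requireNamespace() so the snippet still runs when twidlr is not installed; the object names fit_stats and fit_twidlr are chosen only for this illustration.

```r
# Standard stats interface: formula first, data second.
fit_stats <- stats::lm(hp ~ mpg * wt, data = mtcars)

# twidlr-style call: data first, formula second.
# Guarded so the snippet still runs if twidlr is not installed.
if (requireNamespace("twidlr", quietly = TRUE)) {
  fit_twidlr <- twidlr::lm(mtcars, hp ~ mpg * wt)
  # 'data' is passed explicitly when predicting
  print(head(predict(fit_twidlr, mtcars)))
}
```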

Motivation

Modelling in R is messy! Some models take formulas and data frames while others require matrices and vectors. The same can be said of corresponding predict() methods, which can also be impure, returning unexpected or inconsistent results.

twidlr seeks to overcome these problems by providing:

  • Consistent API for model functions and their corresponding predict methods (helping to improve the generality of tidy modelling packages like pipelearner)
  • Pure and available predictions: predict is available for all models (including unsupervised algorithms like kmeans), and "data" is a required argument
  • Tidyverse philosophy: functions work with data frames and are pipeable, such as mtcars %>% lm(hp ~ wt)
  • Formula operators where they are valid but not originally available. For example, glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species) selects specific variables and adds terms such as interactions and dummy-coded variables. Formulas created as strings can always be used too!
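To see what formula operators can offer a matrix-based model, the sketch below uses only base R to expand the formula from the glmnet example into a numeric predictor matrix and an outcome vector. This is an illustration of the general idea, under the assumption that a matrix-based learner wants "x" and "y" inputs; it is not necessarily the conversion twidlr performs internally.

```r
# The formula from the glmnet example: an interaction plus a factor.
f  <- Sepal.Width ~ Petal.Width * Petal.Length + Species
mf <- stats::model.frame(f, data = iris)

# Expand the formula into a numeric design matrix. The interaction term and
# the dummy-coded Species levels become columns of their own.
x <- stats::model.matrix(f, data = mf)[, -1, drop = FALSE]  # drop the intercept column
y <- stats::model.response(mf)                              # outcome vector

colnames(x)
dim(x)
```

The same expansion works when the formula starts life as a string, since stats::as.formula() converts it first.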

twidlr models

Model functions exposed by twidlr:

Package        Functions
e1071          naiveBayes, svm
gamlss         gamlss
glmnet         cv.glmnet, glmnet
lme4           glmer, lmer
quantreg       crq, nlrq, rq, rqss
randomForest   randomForest
rpart          rpart
stats          aov, factanal, glm, kmeans, lm, prcomp, t.test (now 'ttest')
xgboost        xgboost

Contributing

For conventions and best-practices when contributing to twidlr, please see CONTRIBUTING.md

Version

0.0.0.9000

License

MIT + file LICENSE

Last Published

May 22nd, 2017

Functions in twidlr (0.0.0.9000)

Function                       Description
lm                             data.frame-first, formula-second method
lmer                           data.frame-first, formula-second method
glm                            data.frame-first, formula-second method
glmer                          data.frame-first, formula-second method
glmnet                         data.frame-first, formula-second method
kmeans                         data.frame-first, formula-second method
crq                            data.frame-first, formula-second method
%>%                            Pipe operator
prcomp                         data.frame-first, formula-second method
xgboost                        data.frame-first, formula-second method
rpart                          data.frame-first, formula-second method
rq                             data.frame-first, formula-second method
check_pkg                      Check for required package
predict_checks                 Run checks for twidlr predict functions and invisibly return 'data' coerced
randomForest                   data.frame-first, formula-second method
model_as_xy                    Convert data frame and model
nlrq                           data.frame-first, formula-second method
aov                            data.frame-first, formula-second method
check_alt_args                 Check if argument(s) are given as alternatives to another
t.test                         data.frame-first, formula-second method
twidlr_defaults                Default parameters used by twidlr functions
cv.glmnet                      data.frame-first, formula-second method
ttest                          data.frame-first, formula-second method
factanal                       data.frame-first, formula-second method
unsupervised_twidlr_defaults   Default parameters used by twidlr functions that do not specify an outcome
coerce_args                    Coerce arguments to right object class
svm                            data.frame-first, formula-second method
naiveBayes                     data.frame-first, formula-second method
gamlss                         data.frame-first, formula-second method