butcher

Overview

Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits:

  1. Heavy usage of formulas and closures that capture the enclosing environment in model training
  2. Lack of selectivity in the construction of the model object itself

As a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. The butcher package provides tooling to “axe” parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object.

Installation

Install the released version from CRAN:

install.packages("butcher")

Or install the development version from GitHub:

# install.packages("pak")
pak::pak("tidymodels/butcher")

Butchering

As an example, let’s wrap an lm model so it contains a lot of unnecessary stuff:

library(butcher)
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6) # we didn't know about
  lm(mpg ~ ., data = mtcars) 
}

This object is unnecessarily large:

library(lobstr)
obj_size(our_model())
#> 8.02 MB

When, in fact, it should only be:

small_lm <- lm(mpg ~ ., data = mtcars) 
obj_size(small_lm)
#> 22.22 kB

To understand which part of our original model object is taking up the most memory, we leverage the weigh() function:

big_lm <- our_model()
weigh(big_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         8.05    
#>  2 qr.qr         0.00666 
#>  3 residuals     0.00286 
#>  4 fitted.values 0.00286 
#>  5 effects       0.0014  
#>  6 coefficients  0.00109 
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows
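
One way to confirm where the extra weight comes from is to inspect the environment attached to the terms component directly. The sketch below uses only base R and recreates big_lm from the example above; captured_env is a name introduced here for illustration and is not part of butcher.

```r
# Recreate the example model; the junk vector lives in the function's environment
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()

# The terms component keeps a reference to the environment
# in which the model formula was created
captured_env <- attr(big_lm$terms, ".Environment")

# List what that environment holds
ls(captured_env)

# Approximate size of the stray vector that travels with the model
junk <- get("some_junk_in_the_environment", envir = captured_env)
format(object.size(junk), units = "MB")
```

Anything listed by ls() here is kept alive for as long as the model object exists, even though it plays no role in prediction.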

The problem here is in the terms component of our big_lm. Because of how lm() is implemented in the stats package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use butcher():

cleaned_lm <- butcher(big_lm, verbose = TRUE)
#> ✔ Memory released: 8.03 MB
#> ✖ Disabled: `print()`, `summary()`, and `fitted()`
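
A quick way to check that prediction still works after butchering is to compare predictions from the original and butchered objects on the same data. This is a minimal sketch that assumes butcher is installed and rebuilds the objects from the example above; comparing with all.equal() is our choice here, not something the package prescribes.

```r
library(butcher)

# Rebuild the oversized model and its butchered counterpart
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()
cleaned_lm <- butcher(big_lm)

# Predict on the same data with both objects
preds_big <- predict(big_lm, newdata = mtcars)
preds_cleaned <- predict(cleaned_lm, newdata = mtcars)

# Should report TRUE if butchering left prediction intact
all.equal(preds_big, preds_cleaned)
```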

Comparing it against our small_lm, we find:

weigh(cleaned_lm)
#> # A tibble: 25 × 2
#>    object           size
#>    <chr>           <dbl>
#>  1 terms        0.00771 
#>  2 qr.qr        0.00666 
#>  3 residuals    0.00286 
#>  4 effects      0.0014  
#>  5 coefficients 0.00109 
#>  6 model.mpg    0.000304
#>  7 model.cyl    0.000304
#>  8 model.disp   0.000304
#>  9 model.hp     0.000304
#> 10 model.drat   0.000304
#> # ℹ 15 more rows

And now it will take up about the same memory on disk as small_lm:

weigh(small_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         8.06    
#>  2 qr.qr         0.00666 
#>  3 residuals     0.00286 
#>  4 fitted.values 0.00286 
#>  5 effects       0.0014  
#>  6 coefficients  0.00109 
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows

To make the most of your available memory, this package provides five S3 generics that each remove a specific part of a model object:

  • axe_call(): To remove the call object.
  • axe_ctrl(): To remove controls associated with training.
  • axe_data(): To remove the original training data.
  • axe_env(): To remove environments.
  • axe_fitted(): To remove fitted values.

When you run butcher(), you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named butcher_disabled that lists any post-fit estimation functions that are disabled as a result.
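
When only some components need to go, the axe functions can be applied one at a time. The following is a sketch under the assumption that butcher is installed; fit, fit_no_env, and fit_small are illustrative names.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Apply a single axe function instead of the full butcher()
fit_no_env <- axe_env(fit)

# The class vector gains a butchered class alongside the original class(es)
class(fit_no_env)

# Attribute listing post-fit functions disabled by the axing (may be empty)
attr(fit_no_env, "butcher_disabled")

# Axe functions can be combined to remove several components selectively
fit_small <- axe_fitted(axe_call(fit))
class(fit_small)
```

Inspecting the butcher_disabled attribute after each step is a reasonable way to decide how much to axe before a function you still need stops working.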

Model Object Coverage

Check out the vignette("available-axe-methods") to see butcher’s current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the vignette("adding-models-to-butcher") for more guidelines, but in short, to contribute a set of axe methods:

  1. Run new_model_butcher(model_class = "your_object", package_name = "your_package")
  2. Use butcher helper functions weigh() and locate() to decide what to axe
  3. Finalize edits to R/your_object.R and tests/testthat/test-your_object.R
  4. Make a pull request!
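
Step 2 above can be sketched with an ordinary lm fit standing in for a new model class. This assumes butcher is installed; fit is an illustrative name, and the name values passed to locate() are just examples of components one might search for.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Rank components by size to see which are worth axing
sizes <- weigh(fit)
sizes

# Find where components with a matching name sit inside the object
locate(fit, name = "env")
locate(fit, name = "call")
```

The largest entries reported by weigh() are the natural first candidates for an axe method, and locate() helps pin down the exact element to modify.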

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 6,994
Version: 0.3.4
License: MIT + file LICENSE

Maintainer: Julia Silge
Last Published: April 11th, 2024

Functions in butcher (0.3.4)

  • axe-mda: Axing a mda.
  • axe-multnet: Axing a multnet.
  • axe-nested_model_fit: Axing a nested_model_fit.
  • axe-rpart: Axing a rpart.
  • axe-sclass: Axing a sclass object.
  • axe-train: Axing a train object.
  • axe-train.recipe: Axing a train.recipe object.
  • axe-model_fit: Axing a model_fit.
  • axe-nnet: Axing a nnet.
  • axe-xrf: Axing a xrf.
  • axe-pls: Axing mixOmics models.
  • axe_call: Axe a call.
  • axe_ctrl: Axe controls.
  • axe-xgb.Booster: Axing a xgb.Booster.
  • axe-randomForest: Axing a randomForest.
  • axe_data: Axe data.
  • axe-ranger: Axing a ranger.
  • axe_fitted: Axe fitted values.
  • axe_env: Axe an environment.
  • axe-terms: Axing for terms inputs.
  • new_model_butcher: New axe functions for a modeling object.
  • axe-kknn: Axing a kknn.
  • ui: Console Messages
  • axe-ipred: Axing a bagged tree.
  • locate: Locate part of an object.
  • axe-survreg.penal: Axing a survreg.penal.
  • axe-rda: Axing an rda.
  • weigh: Weigh the object.
  • butcher-package: Reduce the Size of Modeling Objects
  • axe-spark: Axing a spark object.
  • axe-recipe: Axing a recipe object.
  • axe-survreg: Axing a survreg.
  • butcher: Butcher an object.
  • butcher_example: Get path to model object example.
  • axe-C5.0: Axing a C5.0.
  • axe-KMeansCluster: Axing a KMeansCluster.
  • axe-coxph: Axing a coxph.
  • axe-formula: Axing formulas.
  • axe-earth: Axing an earth object.
  • axe-elnet: Axing an elnet.
  • axe-NaiveBayes: Axing a NaiveBayes.
  • axe-flexsurvreg: Axing a flexsurvreg.
  • axe-function: Axing functions.
  • axe-bart: Axing a bart model.
  • axe-mass: Axing a MASS discriminant analysis object.
  • axe-gausspr: Axing a gausspr.
  • axe-gam: Axing a gam.
  • axe-lm: Axing an lm.
  • axe-glm: Axing a glm.
  • axe-kproto: Axing a kproto.
  • axe-glmnet: Axing a glmnet.
  • axe-ksvm: Axing a ksvm object.