butcher

Overview

Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits:

  1. Heavy usage of formulas and closures that capture the enclosing environment in model training
  2. Lack of selectivity in the construction of the model object itself

As a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. The butcher package provides tooling to “axe” parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object.
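
The first culprit can be reproduced without any modeling package. The following is a minimal base R sketch (the names make_formula and big are hypothetical, chosen only for illustration): a formula created inside a function keeps a reference to that function's environment, so everything defined there stays reachable.

```r
# Hypothetical helper: builds a formula next to a large, unrelated object.
make_formula <- function() {
  big <- runif(1e6)  # never used by the formula itself
  y ~ x
}

f <- make_formula()
captured <- environment(f)  # the environment the formula was created in

# The large vector is still alive because the formula points at its environment.
has_big <- exists("big", envir = captured, inherits = FALSE)
big_length <- length(get("big", envir = captured))
print(has_big)
print(big_length)
```

Any fitted model that stores such a formula (or a closure) stores the environment, and its contents, along with it.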

Installation

Install the released version from CRAN:

install.packages("butcher")

Or install the development version from GitHub:

# install.packages("pak")
pak::pak("tidymodels/butcher")

Butchering

As an example, let’s wrap an lm model so it contains a lot of unnecessary stuff:

library(butcher)
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6) # a large object we didn't know about
  lm(mpg ~ ., data = mtcars)
}

This object is unnecessarily large:

library(lobstr)
obj_size(our_model())
#> 8.02 MB

When, in fact, it should only be:

small_lm <- lm(mpg ~ ., data = mtcars)
obj_size(small_lm)
#> 22.22 kB

To understand which part of our original model object is taking up the most memory, we use the weigh() function, which reports the size of each component in MB by default:

big_lm <- our_model()
weigh(big_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         8.01    
#>  2 qr.qr         0.00666 
#>  3 residuals     0.00286 
#>  4 fitted.values 0.00286 
#>  5 effects       0.0014  
#>  6 coefficients  0.00109 
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows

The problem here is in the terms component of our big_lm. Because of how lm() is implemented in the stats package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use butcher():

cleaned_lm <- butcher(big_lm, verbose = TRUE)
#> ✔ Memory released: 8.00 MB
#> ✖ Disabled: `print()`, `summary()`, and `fitted()`
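
To see what the terms component was carrying before butchering, one can inspect the environment attached to it. This is a small base R sketch that rebuilds the big model from the example; the variable names captured and captured_names are illustrative only.

```r
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()

# The terms object stores the formula's environment as an attribute.
captured <- attr(big_lm$terms, ".Environment")

# Listing that environment reveals the object that was carried along.
captured_names <- ls(captured)
print(captured_names)
```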

Comparing it against our small_lm, we find:

weigh(cleaned_lm)
#> # A tibble: 25 × 2
#>    object           size
#>    <chr>           <dbl>
#>  1 terms        0.00771 
#>  2 qr.qr        0.00666 
#>  3 residuals    0.00286 
#>  4 effects      0.0014  
#>  5 coefficients 0.00109 
#>  6 model.mpg    0.000304
#>  7 model.cyl    0.000304
#>  8 model.disp   0.000304
#>  9 model.hp     0.000304
#> 10 model.drat   0.000304
#> # ℹ 15 more rows

And now it takes up about the same amount of memory as small_lm:

weigh(small_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         0.00763 
#>  2 qr.qr         0.00666 
#>  3 residuals     0.00286 
#>  4 fitted.values 0.00286 
#>  5 effects       0.0014  
#>  6 coefficients  0.00109 
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows
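
A quick way to check that prediction still works after butchering is to compare predictions from both objects. This is a sketch that assumes butcher is installed; same_predictions is an illustrative name.

```r
library(butcher)

our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()
cleaned_lm <- butcher(big_lm)

# Predictions from the butchered object should match the original.
same_predictions <- all.equal(
  predict(cleaned_lm, newdata = mtcars),
  predict(big_lm, newdata = mtcars)
)
print(same_predictions)
```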

To make the most of your available memory, this package provides five S3 generics for removing parts of a model object:

  • axe_call(): To remove the call object.
  • axe_ctrl(): To remove controls associated with training.
  • axe_data(): To remove the original training data.
  • axe_env(): To remove environments.
  • axe_fitted(): To remove fitted values.

When you run butcher(), you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named butcher_disabled that lists any post-fit estimation functions that are disabled as a result.
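
The axing functions can also be applied one at a time. The sketch below applies only axe_call() to an lm fit and then inspects the class vector and the butcher_disabled attribute described above; the exact values printed depend on the installed butcher version, so none are shown here.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Apply a single axe instead of the full butcher() pass.
no_call <- axe_call(fit)

# Inspect what changed: the class vector and the list of disabled functions.
print(class(no_call))
print(attr(no_call, "butcher_disabled"))

# The original class is kept, so existing S3 methods still dispatch.
still_lm <- inherits(no_call, "lm")
print(still_lm)
```

Axing selectively is useful when only one component is large and you want to keep functions such as fitted() working.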

Model Object Coverage

Check out the vignette("available-axe-methods") to see butcher’s current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the vignette("adding-models-to-butcher") for more guidelines, but in short, to contribute a set of axe methods:

  1. Run new_model_butcher(model_class = "your_object", package_name = "your_package")
  2. Use butcher helper functions weigh() and locate() to decide what to axe
  3. Finalize edits to R/your_object.R and tests/testthat/test-your_object.R
  4. Make a pull request!
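
Step 2 might look roughly like the following for an lm fit. This is a sketch rather than a prescribed workflow: it assumes weigh() accepts a units argument and locate() accepts a name argument, as in the package's reference pages, and it uses lm only as a stand-in for your own model class.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# List components by size to find the heaviest parts of the object.
sizes <- weigh(fit, units = "KB")
print(head(sizes, 5))

# Search the object for components whose name matches "env".
print(locate(fit, name = "env"))
```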

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

6,344

Version

0.3.5

License

MIT + file LICENSE

Maintainer

Julia Silge

Last Published

March 18th, 2025

Functions in butcher (0.3.5)

  • axe-ipred: Axing a bagged tree.
  • axe-kproto: Axing a kproto.
  • axe-lm: Axing an lm.
  • axe-glm: Axing a glm.
  • axe-gam: Axing a gam.
  • axe-glmnet: Axing a glmnet.
  • axe-kknn: Axing a kknn.
  • axe-gausspr: Axing a gausspr.
  • axe-mass: Axing a MASS discriminant analysis object.
  • axe-nnet: Axing a nnet.
  • axe-mda: Axing a mda.
  • axe-ksvm: Axing a ksvm object.
  • axe-multnet: Axing a multnet.
  • axe-rpart: Axing a rpart.
  • axe-recipe: Axing a recipe object.
  • axe-pls: Axing mixOmics models.
  • axe-randomForest: Axing a randomForest.
  • axe-rda: Axing an rda.
  • axe-xrf: Axing a xrf.
  • axe-sclass: Axing a sclass object.
  • axe-ranger: Axing a ranger.
  • axe-model_fit: Axing a model_fit.
  • axe-survreg: Axing a survreg.
  • axe-train.recipe: Axing a train.recipe object.
  • axe-spark: Axing a spark object.
  • new_model_butcher: New axe functions for a modeling object.
  • ui: Console messages.
  • axe-survreg.penal: Axing a survreg.penal.
  • butcher: Butcher an object.
  • axe-terms: Axing for terms inputs.
  • axe-train: Axing a train object.
  • butcher-package: Reduce the Size of Modeling Objects.
  • axe-xgb.Booster: Axing a xgb.Booster.
  • axe_ctrl: Axe controls.
  • axe_env: Axe an environment.
  • axe_data: Axe data.
  • butcher_example: Get path to model object example.
  • locate: Locate part of an object.
  • axe_call: Axe a call.
  • weigh: Weigh the object.
  • axe_fitted: Axe fitted values.
  • axe-flexsurvreg: Axing a flexsurvreg.
  • axe-formula: Axing formulas.
  • axe-function: Axing functions.
  • axe-C5.0: Axing a C5.0.
  • axe-coxph: Axing a coxph.
  • axe-elnet: Axing an elnet.
  • axe-KMeansCluster: Axing a KMeansCluster.
  • axe-NaiveBayes: Axing a NaiveBayes.
  • axe-bart: Axing a bart model.
  • axe-earth: Axing an earth object.