santoku

santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Installation

Install from r-universe:

install.packages("santoku", repos = c("https://hughjonesd.r-universe.dev", 
                                      "https://cloud.r-project.org"))

Or from CRAN:

install.packages("santoku")

Or get the development version from GitHub:

# install.packages("remotes")
remotes::install_github("hughjonesd/santoku")

Advantages

Here are some advantages of santoku:

  • By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.

  • chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.

  • chop() can handle many kinds of data, including numbers, dates and times, and units.

  • chop_* functions create intervals in many ways, using quantiles of the data, standard deviations, fixed-width intervals, equal-sized groups, or pretty intervals for use in graphs.

  • It’s easy to label intervals: use names for your breaks vector, or use a lbl_* function to create interval notation like [1, 2), dash notation like 1-2, or arbitrary styles using glue::glue().

  • tab_* functions quickly chop data, then tabulate it.

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
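
The first advantage can be checked with a small comparison against base::cut(). This is only a sketch: the input vector and the variable names base_result and chop_result are made up for illustration.

```r
library(santoku)

x <- 1:5

# base::cut() with breaks 2 and 4 only covers the interval (2, 4],
# so values outside that interval become NA
base_result <- cut(x, c(2, 4))

# chop() extends the breaks to cover the whole range of x
chop_result <- chop(x, c(2, 4))

sum(is.na(base_result))  # counts 1, 2 and 5, which fall outside (2, 4]
sum(is.na(chop_result))  # no values are left unclassified
```

Both calls return a factor, so chop() can usually be dropped in wherever cut() was used, subject to the different default labels.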

Examples

library(santoku)

chop() returns a factor:

chop(1:5, c(2, 4))
#> [1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) [2, 4) [4, 5]

Include a number twice to match it exactly:

chop(1:5, c(2, 2, 4))
#> [1] [1, 2) {2}    (2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) {2} (2, 4) [4, 5]

Use names in breaks for labels:

chop(1:5, c(Low = 1, Mid = 2, High = 4))
#> [1] Low  Mid  Mid  High High
#> Levels: Low Mid High

Or use lbl_* functions:

chop(1:5, c(2, 4), labels = lbl_dash())
#> [1] 1—2 2—4 2—4 4—5 4—5
#> Levels: 1—2 2—4 4—5
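
Other lbl_* functions work the same way. The sketch below assumes that lbl_glue() exposes the interval endpoints to the template as {l} and {r}, and that lbl_seq("A") produces the labels A, B, C; the template text itself is an arbitrary choice.

```r
library(santoku)

x <- 1:5

# lbl_glue() builds each label from a glue template; {l} and {r} stand for
# the left and right endpoints of the interval
glued <- chop(x, c(2, 4), labels = lbl_glue("{l} to {r}"))

# lbl_seq() ignores the endpoints and labels the intervals in sequence
lettered <- chop(x, c(2, 4), labels = lbl_seq("A"))

levels(glued)
levels(lettered)
```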

Chop into fixed-width intervals:

chop_width(runif(10), 0.1)
#>  [1] [0.1399, 0.2399) [0.5399, 0.6399) [0.5399, 0.6399) [0.5399, 0.6399)
#>  [5] [0.6399, 0.7399) [0.3399, 0.4399) [0.8399, 0.9399] [0.8399, 0.9399]
#>  [9] [0.5399, 0.6399) [0.1399, 0.2399)
#> 5 Levels: [0.1399, 0.2399) [0.3399, 0.4399) ... [0.8399, 0.9399]

Or into fixed-size groups:

chop_n(1:10, 5)
#>  [1] [1, 6)  [1, 6)  [1, 6)  [1, 6)  [1, 6)  [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]
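
The remaining chop_* functions listed under Advantages follow the same pattern. The following is a minimal sketch on simulated data; the seed, the sample size and the variable names are arbitrary, and the exact labels depend on the package version, so no output is shown.

```r
library(santoku)

set.seed(1)        # arbitrary seed, for reproducibility only
x <- rnorm(100)

# Break at the 25th and 75th percentiles of x
by_quantile <- chop_quantiles(x, c(0.25, 0.75))

# Three intervals of equal width spanning the range of x
by_width <- chop_evenly(x, intervals = 3)

# Four groups holding (approximately) equal numbers of elements
by_count <- chop_equally(x, groups = 4)

# Intervals measured in standard deviations from the mean
by_sd <- chop_mean_sd(x)

# Tabulate two of the results to compare group sizes
table(by_quantile)
table(by_count)
```

chop_evenly() fixes the width of each interval, while chop_equally() fixes the number of elements in each group, so the two tables will generally differ on skewed data.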

Chop dates by calendar month, then tabulate:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

dates <- as.Date("2021-12-31") + 1:90

tab_width(dates, months(1), labels = lbl_discrete(fmt = "%d %b"))
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar 
#>            31            28            31

For more information, see the vignette.

Monthly Downloads

1,063

Version

0.10.0

License

MIT + file LICENSE

Maintainer

David Hugh-Jones

Last Published

October 12th, 2023

Functions in santoku (0.10.0)

  • lbl_discrete: Label discrete data
  • non-standard-types: Tips for chopping non-standard types
  • percent: Simple percentage formatter
  • lbl_seq: Label chopped intervals in sequence
  • lbl_midpoints: Label chopped intervals by their midpoints
  • santoku-package: A versatile cutting tool for R
  • lbl_manual: Label chopped intervals in a user-defined sequence
  • santoku-cast: Internal functions
  • lbl_intervals: Label chopped intervals using set notation
  • chop_equally: Chop equal-sized groups
  • chop_n: Chop into fixed-sized groups
  • chop_mean_sd: Chop by standard deviations
  • brk_width-for-datetime: Equal-width intervals for dates or datetimes
  • chop_evenly: Chop into equal-width intervals
  • chop: Cut data into intervals
  • breaks-class: Class representing a set of intervals
  • chop_fn: Chop using an existing function
  • brk_manual: Create a breaks object manually
  • fillet: Chop data precisely (for programmers)
  • chop_width: Chop into fixed-width intervals
  • brk_default: Create a standard set of breaks
  • chop_quantiles: Chop by quantiles
  • chop_pretty: Chop using pretty breakpoints
  • exactly: Define singleton intervals explicitly
  • lbl_endpoints: Label chopped intervals by their left or right endpoints
  • chop_proportions: Chop into proportions of the range of x
  • lbl_glue: Label chopped intervals using the glue package
  • lbl_dash: Label chopped intervals like 1-4, 4-5, ...