# cut

##### Convert Numeric to Factor

cut divides the range of x into intervals and codes the values in x according to the interval into which they fall. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.
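
A minimal illustration of the level numbering, using made-up values and breaks chosen only for this sketch: three intervals are formed, and each value is coded by the interval it lands in, counting from the left.

```r
# Breaks 0, 3, 6, 9 give three intervals: (0,3], (3,6], (6,9]
x <- c(1, 4, 9, 2)
f <- cut(x, breaks = c(0, 3, 6, 9))
levels(f)      # "(0,3]" "(3,6]" "(6,9]"
as.integer(f)  # 1 2 3 1: the leftmost interval is level 1
```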

Keywords
category
##### Usage
```r
cut(x, ...)

## Default S3 method:
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
    dig.lab = 3, ordered_result = FALSE, ...)
```
##### Arguments
x
a numeric vector which is to be converted to a factor by cutting.
breaks
either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
labels
labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
include.lowest
logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for right = FALSE) ‘breaks’ value should be included.
right
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
dig.lab
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.
ordered_result
logical: should the result be an ordered factor?
...
further arguments passed to or from other methods.
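
The interplay of right, include.lowest and ordered_result is easiest to see on values that sit exactly on the breaks. The sketch below uses toy values that coincide with the cut points (an illustrative choice, not taken from the original page) and returns integer codes via labels = FALSE so that NA results stand out.

```r
x  <- c(0, 5, 10)
br <- c(0, 5, 10)

# right = TRUE (default): (0,5], (5,10]; 0 sits on an open end, so it is NA
cut(x, br, labels = FALSE)                                         # NA 1 2
# include.lowest = TRUE closes the lowest interval: [0,5], (5,10]
cut(x, br, include.lowest = TRUE, labels = FALSE)                  # 1 1 2
levels(cut(x, br, include.lowest = TRUE))                          # "[0,5]" "(5,10]"
# right = FALSE flips the closed side: [0,5), [5,10); now 10 is NA
cut(x, br, right = FALSE, labels = FALSE)                          # 1 2 NA
# with right = FALSE, include.lowest closes the highest interval instead
cut(x, br, right = FALSE, include.lowest = TRUE, labels = FALSE)   # 1 2 2
# ordered_result = TRUE yields an ordered factor
is.ordered(cut(x, br, include.lowest = TRUE, ordered_result = TRUE))  # TRUE
```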
##### Details

When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)
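
The equal-width rule for a single-number breaks can be written out by hand. The sketch below uses a hypothetical helper, equal_width_breaks(), which is not part of base R; it assumes the rule described above (split the range into n equal pieces, then push only the two outer limits out by 0.1% of the range) and deliberately does not handle a constant x. If that assumption holds, passing its breaks back into cut() gives the same codes as cut(x, n).

```r
# equal_width_breaks() is a hypothetical helper for illustration only;
# it mirrors the documented rule for a single-number `breaks`.
equal_width_breaks <- function(x, n) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)
  if (dx == 0) stop("this sketch does not handle a constant x")
  # n equal-length pieces need n + 1 cut points
  br <- seq.int(rx[1], rx[2], length.out = n + 1)
  # move only the outer limits away by 0.1% of the range
  br[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
  br
}

aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
equal_width_breaks(aaa, 3)  # c(0.994, 3, 5, 7.006)
identical(cut(aaa, 3, labels = FALSE),
          cut(aaa, equal_width_breaks(aaa, 3), labels = FALSE))
```

The shifted upper limit 7.006 is also why cut(aaa, 3) shows an upper label of 7.01 with the default dig.lab = 3.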

If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints; if this fails, labels such as "Range3" will be used. Formatting is done by formatC.
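
A short sketch of default versus user-supplied labels, with illustrative values and break points: the default labels follow the interval notation described above (note there is no space after the comma), while a labels vector must supply exactly one name per interval.

```r
x  <- c(1, 4, 9)
br <- c(0, 5, 10)
levels(cut(x, br))                 # "(0,5]" "(5,10]"
levels(cut(x, br, right = FALSE))  # "[0,5)" "[5,10)"

# two intervals, so two labels
f <- cut(x, br, labels = c("low", "high"))
as.character(f)                    # "low" "low" "high"
```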

The default method sorts a numeric vector of breaks (other methods are not required to do so), and the labels correspond to the intervals after sorting. As from R 3.2.0, getOption("OutDec") is consulted when labels are constructed for labels = NULL.
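
Because the default method sorts the breaks first, the order in which they are supplied does not matter. A one-line check with illustrative values:

```r
x <- c(1, 4, 9)
# unsorted and sorted breaks produce the same factor
identical(cut(x, c(10, 0, 5)), cut(x, c(0, 5, 10)))  # TRUE
```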

##### Value

A factor is returned, unless labels = FALSE, which results in an integer vector of level codes. Values which fall outside the range of breaks are coded as NA, as are NaN and NA values.
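
The NA coding can be checked directly. In this sketch (made-up input), -1 lies below the first break, while NA and NaN stay missing; labels = FALSE returns the integer codes.

```r
codes <- cut(c(-1, 0.5, 2, NA, NaN), breaks = c(0, 1, 2), labels = FALSE)
codes              # NA 1 2 NA NA
is.integer(codes)  # TRUE
```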

##### Note

Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
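
The alternatives in the note agree with cut() for values strictly inside the break range, as this sketch with illustrative data suggests. Two caveats: hist() defaults to include.lowest = TRUE, and findInterval() returns 0 or length(br) for out-of-range values rather than NA. left.open = TRUE is used so that findInterval() matches the right-closed intervals of cut().

```r
x  <- c(0.5, 1.5, 1.5, 2.5)
br <- 0:3

# counts per interval
as.vector(table(cut(x, br)))                 # 1 2 1
graphics::hist(x, br, plot = FALSE)$counts   # 1 2 1

# integer codes
cut(x, br, labels = FALSE)                   # 1 2 2 3
findInterval(x, br, left.open = TRUE)        # 1 2 2 3
```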

##### References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

##### See Also

split for splitting a variable according to a group factor; factor, tabulate, table, findInterval.
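
A common pairing is to use the factor from cut() as the grouping for split(). A minimal sketch with illustrative values; the list names are the interval labels.

```r
x <- c(1, 4, 9, 2, 7)
groups <- split(x, cut(x, breaks = c(0, 5, 10)))
groups[["(0,5]"]]   # 1 4 2
groups[["(5,10]"]]  # 9 7
```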

quantile for ways of choosing breaks of roughly equal content (rather than length).
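
A sketch of quantile-based breaks, assuming evenly spread data (1:100) so that each quartile bin holds the same count; include.lowest = TRUE keeps the minimum, which would otherwise fall on the open left end. With heavily tied data the quantiles can coincide, and cut() then stops because the breaks are not unique.

```r
x <- 1:100
q <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
         include.lowest = TRUE)
table(q)  # 25 values in each of the four intervals
```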

.bincode for a bare-bones version.
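
.bincode() returns only the integer codes, with the same right = TRUE, include.lowest = FALSE defaults as cut(). It is assumed here to need breaks already sorted in increasing order, since it does not sort them itself. A quick comparison on illustrative values:

```r
x  <- c(0.5, 1.5, 2.5, 3.5)
br <- 0:3
.bincode(x, br)  # 1 2 3 NA: 3.5 is above the last break
identical(.bincode(x, br), cut(x, br, labels = FALSE))  # TRUE
```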

##### Aliases

• cut
• cut.default
##### Examples
```r
library(base)
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts)

cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, b = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks :
table(cx  <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx));  x[is.na(cx)]  #-- the first 9 values  0
which(is.na(cxl)); x[is.na(cxl)] #-- the last  5 values  8

## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4))

table(cut(y, breaks =  1*(-3:3), dig.lab = 4)) # extra digits don't "harm" here
table(cut(y, breaks =  1*(-3:3), right = FALSE)) #- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
```
Documentation reproduced from package base, version 3.3, License: Part of R @VERSION@

### Community examples

mark@niemannross.com at Feb 6, 2019 base v3.5.2

[Example file for linkedin learning](https://linkedin-learning.pxf.io/rweekly_cut)

```r
# Description: cut to set intervals
numericVector <- runif(100, min = 1, max = 256)
cut(numericVector, 3)
cut(numericVector, 3, labels = c("low", "med", "high"))
cut(numericVector, 3, labels = FALSE)
cut(numericVector, breaks = c(1, 100, 200, 256))
```

vezy.remi@gmail.com at Oct 14, 2016 base v3.3.1

## Cut with custom labels

cut builds its labels with [formatC](https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/formatC?) (e.g. "[b1, b2)"). That is not always convenient, so you can pass the labels argument to give your own levels. Unfortunately, the base documentation provides no example of custom labels. As Josh O'Brien says in his [answer](http://stackoverflow.com/a/13061832/6947799) on Stack Overflow, 11 breaks delimit 10 levels, which therefore require exactly 10 labels.

Setting our own levels using the Z variable from the base example, with three cut points:

- the minimum
- the mean
- the maximum

The variable will be cut into two levels:

- any value below or equal to the mean
- any value above the mean

```r
Z <- stats::rnorm(10000)
a <- cut(Z, breaks = c(min(Z), mean(Z), max(Z)),
         labels = c("Mean_or_Below", "Above"),
         include.lowest = TRUE)  # without this, min(Z) is outside (min, mean] and becomes NA
print(head(a))
```

We made a new factor a that is easier to use afterwards.