cut
Convert Numeric to Factor
cut divides the range of x into intervals and codes the values in x according to the interval in which they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
 Keywords
 category
Usage
cut(x, ...)
## Default S3 method:
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)
Arguments
 x
 a numeric vector which is to be converted to a factor by cutting.
 breaks
 either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
 labels
 labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
 include.lowest
 logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for right = FALSE) ‘breaks’ value should be included.
 right
 logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
 dig.lab
 integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.
 ordered_result
 logical: should the result be an ordered factor?
 ...
 further arguments passed to or from other methods.
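The interaction of right and include.lowest is easiest to see with values that lie exactly on the break points. The following is a minimal sketch; the vectors x and br are made up for illustration.

```r
x  <- c(0, 5, 10)   # illustrative data lying exactly on the breaks
br <- c(0, 5, 10)

# Default: intervals are (0,5] and (5,10], so x == 0 is not covered
default_codes <- cut(x, breaks = br, labels = FALSE)

# include.lowest = TRUE closes the first interval on the left: [0,5]
lowest_codes <- cut(x, breaks = br, labels = FALSE, include.lowest = TRUE)

# right = FALSE flips the closed side: [0,5) and [5,10), so x == 10 drops out
left_codes <- cut(x, breaks = br, labels = FALSE, right = FALSE)

print(default_codes)
print(lowest_codes)
print(left_codes)
```

With right = FALSE, include.lowest = TRUE would instead close the last interval on the right, so that the highest break value is kept.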
Details
When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)
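The effect of moving the outer limits can be sketched by comparing a single-number breaks with explicit break points that span exactly the range of the data. The vector x below is illustrative only.

```r
x <- c(0, 1, 4, 6, 9, 10)   # illustrative data with range [0, 10]

# A single number: four equal-length intervals, outer limits nudged outwards,
# so both the minimum and the maximum fall inside an interval
f <- cut(x, breaks = 4)
print(table(f, useNA = "ifany"))

# Explicit breaks spanning exactly the range: the minimum is coded as NA,
# because the first interval (0,2.5] is open on the left
g <- cut(x, breaks = seq(0, 10, length.out = 5))
print(table(g, useNA = "ifany"))
```

When explicit break points are needed, include.lowest = TRUE is one way to keep the extreme value.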
If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints: if this fails, labels such as "Range3" will be used. Formatting is done by formatC.
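A short sketch of the label rules described above, using made-up data and break points. Note that the generated labels contain no space after the comma.

```r
x  <- c(1, 5, 10)   # illustrative data
br <- c(0, 5, 10)

lev_right <- levels(cut(x, breaks = br))                 # right-closed labels
lev_left  <- levels(cut(x, breaks = br, right = FALSE))  # left-closed labels
lev_named <- levels(cut(x, breaks = br, labels = c("low", "high")))

# dig.lab sets the minimum number of significant digits in generated labels
lev_3 <- levels(cut(x, breaks = c(0, 1.23456, 10)))
lev_5 <- levels(cut(x, breaks = c(0, 1.23456, 10), dig.lab = 5))

print(lev_right)
print(lev_left)
print(lev_named)
print(lev_3)
print(lev_5)
```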
The default method will sort a numeric vector of breaks, but other methods are not required to, and labels will correspond to the intervals after sorting.
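As a quick check of the sorting behaviour of the default method, the sketch below passes the same (illustrative) break points in two different orders; the labels attach to the sorted intervals in both cases.

```r
x <- c(1, 4, 6, 9)   # illustrative data

# Same break points, given unsorted and sorted
unsorted <- cut(x, breaks = c(10, 0, 5), labels = c("low", "high"))
sorted   <- cut(x, breaks = c(0, 5, 10), labels = c("low", "high"))

print(unsorted)
print(identical(unsorted, sorted))
```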
As from R 3.2.0, getOption("OutDec") is consulted when labels are constructed for labels = NULL.
Value

A factor is returned, unless labels = FALSE, which results in an integer vector of level codes.
Values which fall outside the range of breaks are coded as NA, as are NaN and NA values.
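The two return types, and the NA coding of out-of-range and missing values, can be seen in a small sketch (x and br are illustrative):

```r
x  <- c(-1, 0.5, 1.5, 3, NA, NaN)   # two values outside the breaks, plus NA and NaN
br <- c(0, 1, 2)

f     <- cut(x, breaks = br)                  # factor with levels (0,1] and (1,2]
codes <- cut(x, breaks = br, labels = FALSE)  # integer level codes

print(f)
print(codes)
```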
Note
Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
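A sketch of the two substitutions, on simulated data with an arbitrary seed. One caveat to keep in mind: findInterval() uses left-closed intervals by default, so it corresponds to cut(*, labels = FALSE, right = FALSE) rather than to the default right = TRUE, and it does not return NA for values outside the breaks.

```r
set.seed(1)                      # arbitrary seed so the sketch is repeatable
x  <- stats::runif(200, 0, 10)   # illustrative data, strictly inside (0, 10)
br <- 0:10

# Bin counts: table(cut()) versus hist(plot = FALSE)
counts_cut  <- as.vector(table(cut(x, br)))
counts_hist <- graphics::hist(x, br, plot = FALSE)$counts

# Interval codes: cut(labels = FALSE, right = FALSE) versus findInterval()
codes_cut <- cut(x, br, labels = FALSE, right = FALSE)
codes_fi  <- findInterval(x, br)

print(all(counts_cut == counts_hist))
print(all(codes_cut == codes_fi))
```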
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
split for splitting a variable according to a group factor; factor, tabulate, table, findInterval.
quantile for ways of choosing breaks of roughly equal content (rather than length).
.bincode for a bare-bones version.
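To illustrate the difference between intervals of equal length and intervals of roughly equal content, the sketch below cuts a skewed, made-up vector both ways. Passing include.lowest = TRUE is needed here so that the minimum, which coincides with the first quantile break, is not coded as NA.

```r
x <- (1:100)^2   # illustrative, strongly skewed data

# Equal-length intervals: most observations land in the first interval
by_length <- cut(x, breaks = 4)

# Equal-content intervals: break points taken from the quartiles
by_content <- cut(x, breaks = stats::quantile(x, probs = 0:4 / 4),
                  include.lowest = TRUE)

print(table(by_length))
print(table(by_content))
```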
Examples
library(base)
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts)
cut(rep(1,5), 4) # dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)
table( cut(x, b = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))
## some values OUTSIDE the breaks :
table(cx <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx)); x[is.na(cx)] # the first 9 values 0
which(is.na(cxl)); x[is.na(cxl)] # the last 5 values 8
## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4))
table(cut(y, breaks = 1*(-3:3), dig.lab = 4))
# extra digits don't "harm" here
table(cut(y, breaks = 1*(-3:3), right = FALSE))
# the same, since no exact INT!
## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered = TRUE)
## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
Community examples
[Example file for linkedin learning](https://linkedinlearning.pxf.io/rweekly_cut)
```r
# Description: cut to set intervals
numericVector <- runif(100, min = 1, max = 256)
cut(numericVector, 3)
cut(numericVector, 3, labels = c("low", "med", "high"))
cut(numericVector, 3, labels = FALSE)
cut(numericVector, breaks = c(1, 100, 200, 256))
```
## Cut with custom labels
By default, `cut` builds `labels` formatted with [formatC](https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/formatC?) (e.g. "[b1, b2)"). That is not always convenient, so you can pass the `labels` argument to give your own levels. Unfortunately, no examples are provided in the base documentation.

As Josh O'Brien says in his [answer](http://stackoverflow.com/a/13061832/6947799) on stackoverflow, 11 `breaks` delimit 10 levels, which require only 10 `labels`.

Setting our own levels using the Z variable from the base example, with three cuts:
- the minimum
- the mean
- the maximum

The variable will be cut into two levels:
- any value below or equal to the mean
- any value above the mean

```r
Z <- stats::rnorm(10000)
# include.lowest = TRUE keeps min(Z) in the first level; without it the
# minimum falls on the open end of the first interval and is coded as NA
a <- cut(Z, breaks = c(min(Z), mean(Z), max(Z)),
         labels = c("Mean_or_Below", "Above"), include.lowest = TRUE)
print(head(a))
```
We made a new factor `a` that is easier to use afterwards.