cut divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
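As a minimal sketch of this coding, the following cuts a small vector at explicit break points. The vector ages and the break values are made up purely for illustration.

```r
# Hypothetical data, chosen only for illustration
ages <- c(3, 17, 25, 40, 64)

# Three intervals: (0,18], (18,40], (40,65]
grp <- cut(ages, breaks = c(0, 18, 40, 65))

levels(grp)      # interval labels, leftmost interval first
as.integer(grp)  # level codes: 1 1 2 2 3
```

Note that 40 lands in the second interval because intervals are closed on the right by default.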
# S3 method for default
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)
x is to be cut.
"(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
right = FALSE) ‘breaks’ value should be included.
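The interaction of include.lowest and right is easiest to see on values that coincide with the break points. The sketch below uses labels = FALSE so that plain integer codes are returned; the vectors x and brk are invented for the illustration.

```r
x   <- c(0, 1, 5, 10)   # made-up values on and between the breaks
brk <- c(0, 5, 10)

# Default: intervals (0,5] and (5,10]; 0 equals the lowest break and is
# not included, so it is coded NA
r1 <- cut(x, brk, labels = FALSE)                          # NA 1 1 2

# include.lowest = TRUE closes the first interval on the left: [0,5]
r2 <- cut(x, brk, labels = FALSE, include.lowest = TRUE)   # 1 1 1 2

# right = FALSE gives [0,5) and [5,10); now the highest break is excluded
r3 <- cut(x, brk, labels = FALSE, right = FALSE)           # 1 1 2 NA

# With right = FALSE, include.lowest applies to the highest break instead
r4 <- cut(x, brk, labels = FALSE, right = FALSE,
          include.lowest = TRUE)                           # 1 1 2 2
```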
A factor is returned, unless labels = FALSE which results in an integer vector of level codes. Values which fall outside the range of breaks are coded as NA, as are NaN and NA values.
When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)

If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", … if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, …. A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints: if this fails, labels such as "Range3" will be used. Formatting is done by formatC.

The default method will sort a numeric vector of breaks, but other methods are not required to, and labels will correspond to the intervals after sorting.

As from R 3.2.0, getOption("OutDec") is consulted when labels are constructed for labels = NULL.
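The single-number case can be traced by hand. The sketch below rebuilds the break points from the description above (equal-length pieces, outer limits moved out by 0.1% of the range) and compares them with what cut produces. It is an illustration of the documented behaviour, not a copy of the internal code, and the names rx, dx and brk are chosen here.

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Rebuild the break points described for cut(aaa, 3)
rx  <- range(aaa)                                # 1 and 7
dx  <- diff(rx)                                  # 6
brk <- seq(rx[1], rx[2], length.out = 3 + 1)     # 1 3 5 7: three equal pieces
brk[c(1, 4)] <- c(rx[1] - dx / 1000,             # move the outer limits out
                  rx[2] + dx / 1000)             # by 0.1% of the range
brk                                              # 0.994 3.000 5.000 7.006

# Same level codes as letting cut() choose the breaks itself
same <- identical(cut(aaa, 3,   labels = FALSE),
                  cut(aaa, brk, labels = FALSE))

# dig.lab = 3 formats the upper limit as 7.01; dig.lab = 4 shows 7.006
lev3 <- levels(cut(aaa, 3))
lev4 <- levels(cut(aaa, 3, dig.lab = 4))
```

With the default dig.lab the last label reads "(5,7.01]", which could be mistaken for a break at 7.01; raising dig.lab shows the actual endpoint.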
split for splitting a variable according to a group factor; quantile for ways of choosing breaks of roughly equal content (rather than length).
.bincode for a bare-bones version.
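As a rough sketch of how these relate to cut, the following uses quantile to pick breaks of roughly equal content and then checks that .bincode returns the same integer codes as cut with labels = FALSE. The sample z and the seed are arbitrary choices made for the illustration.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
z <- stats::rexp(1000)      # made-up skewed sample

# Quartile break points: four intervals of roughly equal content
q <- unname(stats::quantile(z, probs = seq(0, 1, by = 0.25)))
tab <- table(cut(z, breaks = q, include.lowest = TRUE))
tab

# .bincode() returns only the integer codes, with no factor or labels
codes <- .bincode(z, breaks = q, right = TRUE, include.lowest = TRUE)
same_codes <- identical(codes,
                        cut(z, breaks = q, labels = FALSE,
                            include.lowest = TRUE))
```

include.lowest = TRUE is needed here because the smallest observation equals the lowest break and would otherwise be coded NA.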
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts)

cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, b = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks :
table(cx <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx)); x[is.na(cx)] #-- the first 9 values 0
which(is.na(cxl)); x[is.na(cxl)] #-- the last 5 values 8

## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4))

table(cut(y, breaks = 1*(-3:3), dig.lab = 4))
# extra digits don't "harm" here
table(cut(y, breaks = 1*(-3:3), right = FALSE))
#- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))