binr (version 1.1)

bins.greedy: Greedy binning algorithm.

Description

bins.greedy - Wrapper around bins.greedy.impl. It walks the sorted values of x from left to right and fills each bin until it reaches roughly the target size.

bins.greedy.impl - Implementation of a single-pass binning algorithm that examines sorted data from left to right and builds bins of the target size. The bins.greedy wrapper around this function provides a simpler interface. The algorithm is not symmetric with respect to direction: a symmetric distribution may not produce symmetric bins if multiple points share the same value. If a single value accounts for more than thresh * binsz points, it will be placed in a new bin.
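The single pass described above can be sketched in base R. This is only an illustration written from the description, not the binr source: the function name greedy_sketch is hypothetical, and the rule for closing a bin that has already reached binsz points is an assumption.

```r
# Hypothetical sketch of a left-to-right greedy pass (not the binr implementation).
greedy_sketch <- function(x, nbins, thresh = 0.8) {
  xtbl  <- table(x)                   # counts per distinct value, ordered by value
  xval  <- as.numeric(names(xtbl))    # numeric version of names(xtbl)
  cnt   <- as.integer(xtbl)
  binsz <- floor(length(x) / nbins)   # target number of points per bin

  binlo <- numeric(0); binhi <- numeric(0); binct <- integer(0)
  n <- 0L                             # points in the currently open bin
  for (i in seq_along(xval)) {
    m <- cnt[i]                       # points carrying the value xval[i]
    # Assumption: start a new bin when nothing is open yet, when the open bin
    # is already full, or when V would overflow it and m > thresh * binsz.
    start_new <- n == 0L || n >= binsz ||
      (n + m > binsz && m > thresh * binsz)
    if (start_new) {
      binlo <- c(binlo, xval[i]); binhi <- c(binhi, xval[i]); binct <- c(binct, m)
    } else {
      k <- length(binct)
      binhi[k] <- xval[i]             # extend the open bin to include V
      binct[k] <- binct[k] + m
    }
    n <- binct[length(binct)]
  }
  list(binlo = binlo, binhi = binhi, binct = binct)
}

x   <- c(1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 4, 5)
res <- greedy_sketch(x, nbins = 3)
res$binct   # 4 1 5 2
```

With binsz = 4, the value 3 occurs five times, which exceeds thresh * binsz = 3.2, so it opens its own bin instead of joining the bin that holds the value 2.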

Usage

bins.greedy(x, nbins, minpts = floor(0.5 * length(x)/nbins), thresh = 0.8, naive = FALSE)
bins.greedy.impl(xval, xtbl, xstp, binsz, nbins, thresh, verbose = F)

Arguments

x
Vector of numbers.
nbins
Target number of bins.
minpts
Minimum number of points in a bin. Only used if naive = FALSE.
thresh
Threshold fraction of bin size for the greedy algorithm. Suppose the current bin already holds n < binsz points, and the next value V is represented by m points with m + n > binsz. The algorithm then checks whether m > thresh * binsz and, if so, places the value V into a new bin. Otherwise, the points having value V are added to the current bin.
naive
When TRUE, simply calls bins.greedy.impl with data derived from x. Otherwise, takes the extra step of marking the values that by themselves fill a whole bin, forcing the algorithm to place each of these values in a separate bin.
xval
Sorted unique values of the data set x. This should be the numeric version of names(xtbl).
xtbl
Result of a call to table(x).
xstp
Stopping points; if xstp[i] == TRUE, the i-th value can't be merged with the (i-1)-th one. The value of xstp[1] is ignored.
binsz
Target bin size, i.e., the number of points falling into each bin; for example, floor(length(x) / nbins).
verbose
When TRUE, prints the number of points falling into the bins.
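As a rough illustration of how the arguments above relate to one another, the inputs expected by bins.greedy.impl can be derived from a vector x with base R alone. The sample data are made up, and the all-FALSE xstp vector is an assumption about the simplest case (no forced breaks), not something the documentation states.

```r
x     <- c(2.5, 1, 1, 3, 2.5, 2.5, 4, 1)   # made-up sample data
nbins <- 2
thresh <- 0.8

xtbl   <- table(x)                          # counts per distinct value
xval   <- as.numeric(names(xtbl))           # sorted unique values of x
binsz  <- floor(length(x) / nbins)          # example target bin size from the docs
minpts <- floor(0.5 * length(x) / nbins)    # default minpts from the Usage section
xstp   <- rep(FALSE, length(xval))          # assumption: no forced stopping points

# The thresh rule for one step: the current bin holds n points,
# and the next value V is represented by m points.
n <- as.integer(xtbl[1])                    # 3 points with value 1
m <- as.integer(xtbl[2])                    # 3 points with value 2.5
overflow    <- m + n > binsz                # TRUE: 6 > 4
goes_to_new <- overflow && m > thresh * binsz   # FALSE: 3 is not > 3.2

# With binr installed, these could then be passed on, e.g.:
# bins.greedy.impl(xval, xtbl, xstp, binsz, nbins, thresh)
```

Here the value 2.5 would overflow the current bin, but because m does not exceed thresh * binsz, the rule described under thresh adds its points to the current bin.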

Value

A list with the following items:
  • binlo - The "low" value falling into the bin.
  • binhi - The "high" value falling into the bin.
  • binct - The number of points falling into the bin.
  • xtbl - The result of a call to table(x).
  • xval - The sorted unique values of the data points x. Essentially, a numeric version of names(xtbl).
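A list of this shape can be used to map each data point back to its bin. The sketch below builds such a list by hand for illustration (it is not actual output of bins.greedy) and assumes the bins are ordered by value and do not overlap, which follows from the left-to-right pass over sorted values.

```r
x <- c(1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 4, 5)

# Hand-built list in the documented shape; the bin boundaries are illustrative.
bins <- list(
  binlo = c(1, 2, 3, 4),
  binhi = c(1, 2, 3, 5),
  binct = c(4L, 1L, 5L, 2L),
  xtbl  = table(x),
  xval  = sort(unique(x))
)

# findInterval() returns, for each point, the index of the last binlo <= point.
bin_id <- findInterval(x, bins$binlo)

# Consistency checks: counts per bin match binct, and every point
# lies between the low and high value of its bin.
counts_ok <- identical(tabulate(bin_id, nbins = length(bins$binlo)), bins$binct)
range_ok  <- all(x >= bins$binlo[bin_id] & x <= bins$binhi[bin_id])
```

Using binlo as the break vector keeps the lookup independent of any gaps between binhi of one bin and binlo of the next.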

See Also

binr, bins, bins.quantiles, bins.optimize