# findInterval

##### Find Interval Numbers or Indices

Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x, v), then for each index j in x, $v_{i_j} \le x_j < v_{i_j + 1}$, where $v_0 := -\infty$, $v_{N+1} := +\infty$, and N <- length(v). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.

Keywords
utilities, arith
##### Usage
    findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
                 left.open = FALSE)
##### Arguments
x

numeric.

vec

numeric, sorted (weakly) increasingly, of length N, say.

rightmost.closed

logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed; see below.

all.inside

logical; if true, the returned indices are coerced into 1,…,N-1, i.e., 0 is mapped to 1 and N to N-1.

left.open

logical; if true all the intervals are open at left and closed at right; in the formulas below, $\le$ should be swapped with $<$ (and $>$ with $\ge$), and rightmost.closed means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations.
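The effect of all.inside and left.open can be sketched on a tiny example. The breakpoints and the variable names (`idx_default`, `idx_inside`, `idx_left_open`) are made up for illustration and are not part of the original documentation.

```r
# Illustrative data: N = 3 breakpoints, with x values below, on, and above them
vec <- c(0, 10, 20)
x   <- c(-5, 0, 10, 15, 20, 25)

# Default: intervals are closed on the left, [vec[i], vec[i+1])
idx_default <- findInterval(x, vec)

# all.inside = TRUE: 0 is mapped to 1 and N to N - 1
idx_inside <- findInterval(x, vec, all.inside = TRUE)

# left.open = TRUE: intervals are open on the left, (vec[i], vec[i+1]]
idx_left_open <- findInterval(x, vec, left.open = TRUE)

print(cbind(x, idx_default, idx_inside, idx_left_open))
```

Note how values that coincide with a breakpoint (0, 10, 20) move to the interval below once left.open = TRUE.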

##### Details

The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although this is trivially equivalent to apply(outer(x, vec, ">="), 1, sum), the internal algorithm uses interval search, ensuring $O(n \log N)$ complexity, where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it will be even faster, basically $O(n)$.
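As a quick check of the equivalence stated above, the following sketch compares findInterval with the outer()-based count on simulated data. The seed, sample sizes, and the names `fast` and `naive` are arbitrary choices for illustration.

```r
set.seed(1)                                # arbitrary seed, for reproducibility
vec <- sort(round(runif(20, 0, 10), 1))    # non-decreasing breakpoints (ties possible)
x   <- round(runif(50, -1, 11), 1)         # some values fall outside range(vec)

fast  <- findInterval(x, vec)
# For each x[j], count the breakpoints that are <= x[j]
naive <- apply(outer(x, vec, ">="), 1, sum)

stopifnot(all(fast == naive))
```

The outer() version builds an n-by-N logical matrix, so it is only practical for small inputs.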

This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to $n F_n(t; X_1,\dots,X_n)$ where $F_n$ is the empirical distribution function of $X_1,\dots,X_n$.
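The relation to the empirical distribution function can be checked numerically. This is a minimal sketch on simulated data; the names `counts` and `Fn` are illustrative, and all.equal() is used because n * Fn(tt) is computed in floating point.

```r
set.seed(42)                       # arbitrary seed
X  <- round(rnorm(25), 1)          # sample, possibly with ties
tt <- seq(-3, 3, by = 0.25)        # evaluation points
n  <- length(X)

counts <- findInterval(tt, sort(X))   # number of observations <= tt[j]
Fn     <- stats::ecdf(X)              # empirical distribution function of X

stopifnot(isTRUE(all.equal(counts, n * Fn(tt))))
```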

When rightmost.closed = TRUE, the result for x[j] = vec[N] ($= \max(\mathrm{vec})$) is N - 1, as for all other values in the last interval.
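A small sketch of this boundary behaviour, using made-up breakpoints; the names `open_right` and `closed_right` are illustrative only.

```r
vec <- c(0, 10, 20)
N   <- length(vec)
x   <- c(15, 20, 25)   # inside the last interval, equal to max(vec), beyond it

open_right   <- findInterval(x, vec)                           # 2 3 3
closed_right <- findInterval(x, vec, rightmost.closed = TRUE)  # 2 2 3

print(rbind(x, open_right, closed_right))
```

Only the value equal to vec[N] changes; values strictly greater than vec[N] still map to N.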

left.open = TRUE is occasionally useful, e.g., for survival data. For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:

    identical(
          findInterval( x,  v,      left.open= TRUE, ...) ,
      N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )


where N <- length(vec) as above.

##### Value

vector of length length(x) with values in 0:N (and NA) where N <- length(vec), or values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.
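The handling of NA and infinite values can be illustrated as follows; the input values and the names `res` and `res_inf` are chosen purely for demonstration.

```r
vec <- c(0, 10)
x   <- c(NA, -Inf, 5, Inf)

# NA in x stays NA; -Inf lies below all breakpoints, Inf lies above all of them
res <- findInterval(x, vec)

# Infinite breakpoints are allowed in vec as well
res_inf <- findInterval(c(5, Inf), c(0, 10, Inf))

print(res)
print(res_inf)
```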

##### See Also

approx(*, method = "constant"), which is a generalization of findInterval(); ecdf for computing the empirical distribution function, which is (up to a factor of $n$) also basically the same as findInterval(.).
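To illustrate how approx(*, method = "constant") generalizes findInterval(), the sketch below reproduces the interval indices with a step function through the points (v[i], i). It assumes strictly increasing breakpoints; the yleft and yright settings and the name `via_approx` are choices made for this illustration.

```r
v <- c(5, 10, 15)    # strictly increasing breakpoints
x <- 2:18

# Right-continuous step function taking the value i on [v[i], v[i+1]),
# 0 to the left of v[1] and length(v) to the right of v[length(v)]
via_approx <- stats::approx(v, seq_along(v), xout = x,
                            method = "constant",
                            yleft = 0, yright = length(v))$y

stopifnot(all(via_approx == findInterval(x, v)))
```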
##### Examples

    x <- 2:18
    v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
    cbind(x, findInterval(x, v))

    N <- 100
    X <- sort(round(stats::rt(N, df = 2), 2))
    tt <- c(-100, seq(-2, 2, length.out = 201), +100)
    it <- findInterval(tt, X)
    tt[it < 1 | it >= N] # only first and last are outside range(X)

    ## 'left.open = TRUE' means "mirroring":
    N <- length(v)
    stopifnot(identical(
            findInterval( x,  v, left.open = TRUE),
        N - findInterval(-x, -v[N:1])))