# findInterval

##### Find Interval Numbers or Indices

Given a vector of non-decreasing breakpoints in `vec`, find the interval containing each element of `x`; i.e., if `i <- findInterval(x, v)` (writing `v` for `vec`), then for each index `j` in `x`,
\(v_{i_j} \le x_j < v_{i_j + 1}\),
where \(v_0 := -\infty\), \(v_{N+1} := +\infty\), and `N <- length(v)`.
At the two boundaries, the returned index may differ by 1, depending on the optional arguments `rightmost.closed` and `all.inside`.
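As a quick check of the defining inequality, the sketch below pads `vec` with \(-\infty\) and \(+\infty\). The padded vector `vp` is a name introduced here for illustration: `vp[k + 1]` plays the role of \(v_k\), so the 0-based index `i[j]` used above becomes `i[j] + 1` in R's 1-based indexing. The breakpoints and values are arbitrary.

```r
vec <- c(1, 3, 3, 7)                 # non-decreasing breakpoints, N = 4 (note the tie)
x   <- c(-2, 1, 2, 3, 5, 7, 10)
i   <- findInterval(x, vec)          # values in 0:N
i                                    # 0 1 1 3 3 4 4
# Hypothetical padded vector: vp[k + 1] corresponds to v_k for k = 0, ..., N + 1
vp  <- c(-Inf, vec, Inf)
# v_{i_j} <= x_j < v_{i_j + 1} for every j
stopifnot(vp[i + 1] <= x, x < vp[i + 2])
```

Because `vec` contains the tie `3, 3`, the empty interval \([v_2, v_3) = [3, 3)\) is never returned: `x = 3` gets index 3.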

##### Usage

```
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
left.open = FALSE)
```

##### Arguments

- `x`: numeric.
- `vec`: numeric, sorted (weakly) increasingly, of length `N`, say.
- `rightmost.closed`: logical; if true, the rightmost interval, `vec[N-1] .. vec[N]`, is treated as *closed*; see below.
- `all.inside`: logical; if true, the returned indices are coerced into `1, …, N-1`, i.e., `0` is mapped to `1` and `N` to `N-1`.
- `left.open`: logical; if true, all the intervals are open at left and closed at right; in the formulas on this page, \(\le\) should be swapped with \(<\) (and \(>\) with \(\ge\)), and `rightmost.closed` means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations.
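A small sketch contrasting the three logical arguments on the same data. The breakpoints and values are arbitrary choices for illustration, with `N = 3` and `x` including both breakpoints and values outside `range(v)`.

```r
v <- c(0, 5, 10)                      # N = 3 breakpoints
x <- c(-1, 0, 3, 5, 10, 12)

findInterval(x, v)                                     # 0 1 1 2 3 3
findInterval(x, v, rightmost.closed = TRUE)            # 0 1 1 2 2 3  (x = 10 joins the last interval)
findInterval(x, v, all.inside = TRUE)                  # 1 1 1 2 2 2  (0 -> 1, N -> N - 1)
findInterval(x, v, left.open = TRUE)                   # 0 0 1 1 2 3  (intervals (v[k], v[k+1]])
findInterval(x, v, left.open = TRUE,
             rightmost.closed = TRUE)                  # 0 1 1 1 2 3  (leftmost interval closed)
```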

##### Details

The function `findInterval` finds the index of one vector `x` in another, `vec`, where the latter must be non-decreasing. Although the result is conceptually simple, being equivalent (for the default arguments) to `apply(outer(x, vec, ">="), 1, sum)`, the internal algorithm uses interval search, ensuring \(O(n \log N)\) complexity, where `n <- length(x)` (and `N <- length(vec)`). For (almost) sorted `x`, it is even faster, basically \(O(n)\).
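A sketch comparing the naive \(O(nN)\) count with `findInterval()` on random data. The seed, sizes, and rounding (which makes some `x` values coincide with breakpoints and creates ties in `vec`) are arbitrary; the check is only about equality of results, not timing.

```r
set.seed(42)
vec <- sort(round(rnorm(50), 1))              # non-decreasing, ties allowed
x   <- round(runif(1000, -3, 3), 1)           # some values equal breakpoints
naive <- apply(outer(x, vec, ">="), 1, sum)   # counts #{k : vec[k] <= x[j]}, O(n * N)
fast  <- findInterval(x, vec)                 # interval search, O(n log N)
stopifnot(all(naive == fast))
```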

This is the same computation as for the empirical distribution function, and indeed, `findInterval(t, sort(X))` is *identical* to \(n F_n(t; X_1,\dots,X_n)\), where \(F_n\) is the empirical distribution function of \(X_1,\dots,X_n\).
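A sketch of this identity using `stats::ecdf()`. The sample and evaluation points are arbitrary; `all.equal()` is used rather than `identical()` because `n * Fn(t)` is a double computed from fractions \(k/n\), while `findInterval()` returns integer counts.

```r
set.seed(1)
X  <- rexp(30)                        # arbitrary sample, n = 30
n  <- length(X)
Fn <- stats::ecdf(X)                  # right-continuous step function t -> F_n(t)
tt <- c(-1, X, seq(0, 5, by = 0.25))  # include the jump points themselves
counts <- findInterval(tt, sort(X))   # #{i : X_i <= t} for each t
stopifnot(isTRUE(all.equal(as.numeric(counts), n * Fn(tt))))
```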

When `rightmost.closed = TRUE`, the result for `x[j] = vec[N]` (\(= \max(\mathrm{vec})\)) is `N - 1`, as for all other values in the last interval.
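One consequence of `rightmost.closed = TRUE` is that `N` breakpoints give exactly `N - 1` bins covering the closed range `[vec[1], vec[N]]`, with the maximum landing in the last bin. As a sketch (the comparison with `cut()` is an illustration added here, valid for values inside that range), the indices then agree with the bin codes of `cut(..., right = FALSE, include.lowest = TRUE)`:

```r
br <- c(0, 1, 2, 3)                   # N = 4 breakpoints -> 3 bins
x  <- c(0, 0.5, 1, 2.999, 3)          # all inside [0, 3]; x = 3 equals max(br)
findInterval(x, br)                           # 1 1 2 3 4: the maximum gets index N
findInterval(x, br, rightmost.closed = TRUE)  # 1 1 2 3 3: the maximum joins the last bin
# Bins [0,1), [1,2), [2,3] as integer codes
codes <- cut(x, br, labels = FALSE, right = FALSE, include.lowest = TRUE)
stopifnot(all(findInterval(x, br, rightmost.closed = TRUE) == codes))
```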

`left.open = TRUE` is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:

`identical(findInterval(x, v, left.open = TRUE, ...), N - findInterval(-x, -v[N:1], left.open = FALSE, ...))`

where `N <- length(v)` as above.
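The mirror identity can be checked over all combinations of `rightmost.closed` and `all.inside` passed through `...`. The breakpoints and test values (the breakpoints themselves, random values, `NA`, and infinities) are arbitrary choices for this sketch.

```r
v <- c(0, 2.5, 5, 7.5, 10)                   # arbitrary breakpoints
N <- length(v)
set.seed(7)
x <- c(v, runif(20, -2, 12), NA, -Inf, Inf)  # include values equal to breakpoints
for (rc in c(FALSE, TRUE)) {
  for (ai in c(FALSE, TRUE)) {
    lhs <- findInterval(x, v, rightmost.closed = rc, all.inside = ai,
                        left.open = TRUE)
    # Mirror the data: negate x and the reversed breakpoints
    rhs <- N - findInterval(-x, -v[N:1], rightmost.closed = rc,
                            all.inside = ai, left.open = FALSE)
    stopifnot(identical(lhs, rhs))
  }
}
```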

##### Value

Vector of length `length(x)` with values in `0:N` (and `NA`), where `N <- length(vec)`, or values coerced to `1:(N-1)` if and only if `all.inside = TRUE` (equivalently, coercing all `x` values *inside* the intervals). Note that `NA`s are propagated from `x`, and `Inf` values are allowed in both `x` and `vec`.
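A short sketch of the `NA` and `Inf` behaviour described above, with arbitrary values:

```r
vec <- c(-Inf, 0, 5, Inf)             # infinite breakpoints are allowed
x   <- c(NA, -Inf, 3, Inf)
findInterval(x, vec)                  # NA 1 2 4: NA is propagated from x
```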

##### See Also

`approx(*, method = "constant")`, which is a generalization of `findInterval()`; `ecdf` for computing the empirical distribution function, which is (up to a factor of \(n\)) also basically the same as `findInterval(.)`.
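To sketch the relation with `approx()`: using the breakpoints as `x`-values and their indices `1:N` as `y`-values, with `method = "constant"`, `f = 0`, and `yleft = 0`, `yright = N` for values outside `range(vec)`, the interpolated values reproduce `findInterval()`. This assumes distinct breakpoints, since `approx()` collapses tied `x` values (by default with `ties = mean`).

```r
v <- c(2, 4, 8, 16)                   # distinct breakpoints (arbitrary)
N <- length(v)
x <- c(0, 2, 3, 4, 10, 16, 20)
# Left-continuous step interpolation of the indices 1:N
idx <- approx(v, seq_len(N), xout = x, method = "constant", f = 0,
              yleft = 0, yright = N)$y
stopifnot(all(idx == findInterval(x, v)))   # both give 0 1 1 2 3 4 4
```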

##### Examples


```
x <- 2:18
v <- c(5, 10, 15) # breakpoints: bins [5,10) and [10,15), plus x < 5 and x >= 15
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # typically only the first and last are outside range(X)

## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
  findInterval( x, v, left.open = TRUE),
  N - findInterval(-x, -v[N:1])))
```

*Documentation reproduced from package base, version 3.4.3, License: Part of R 3.4.3*