
Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x, v), then for each index j in x, $v[i[j]] \le x[j] < v[i[j] + 1]$, where $v[0] := -Inf$, $v[N+1] := +Inf$, and N <- length(v). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.
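
The defining inequality can be checked directly. The following is a minimal sketch, not part of the original help page: the sample values and the helper names vpad, lower and upper are chosen here for illustration only. Because R vectors are 1-based, the breakpoints are padded with -Inf and +Inf so that $v[0]$ and $v[N+1]$ can be looked up.

```r
v <- c(5, 10, 15)
x <- c(2, 5, 7.5, 10, 15, 18)
i <- findInterval(x, v)

# Pad v so that v[0] = -Inf and v[N+1] = +Inf become indexable;
# vpad[k + 1] then plays the role of v[k] for k = 0, ..., N+1.
vpad  <- c(-Inf, v, Inf)
lower <- vpad[i + 1]   # v[i[j]]
upper <- vpad[i + 2]   # v[i[j] + 1]

# Every x[j] lies in the half-open interval [lower[j], upper[j])
stopifnot(all(lower <= x & x < upper))
print(cbind(x, i, lower, upper))
```

An index of 0 marks values below v[1], and an index of N marks values at or above v[N].
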
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)
The breakpoint vector vec has length N, say. If rightmost.closed is true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed, see below. If all.inside is true, the returned indices are coerced into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1.

findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although the result is trivially equivalent to apply(outer(x, vec, ">="), 1, sum), the internal algorithm uses interval search, ensuring $O(n \log N)$ complexity, where n <- length(x) and N <- length(vec). For (almost) sorted x, it will be even faster, basically $O(n)$.

This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to $n \cdot F_n(t; X[1], \ldots, X[n])$, where $F_n$ is the empirical distribution function of $X[1], \ldots, X[n]$.
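
Both equivalences can be verified numerically. This is a small sketch rather than part of the original page: the seed, the sample sizes and the names fast and brute are arbitrary choices made here. The comparison with the empirical distribution function uses all.equal because n * Fn(x) is computed in floating point.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
vec <- sort(rnorm(20))       # breakpoints must be non-decreasing
x   <- rnorm(50)

fast <- findInterval(x, vec)

# Brute force: for each x[j], count the breakpoints that are <= x[j]
brute <- apply(outer(x, vec, ">="), 1, sum)
stopifnot(all(fast == brute))

# n * Fn(x), with Fn the empirical distribution function of vec
n  <- length(vec)
Fn <- ecdf(vec)
stopifnot(isTRUE(all.equal(n * Fn(x), as.numeric(fast))))
```

The brute-force version builds an n-by-N logical matrix, which is why it is only practical for small inputs.
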
When rightmost.closed = TRUE, the result for x[j] = vec[N] ($= \max(vec)$) is N - 1, as for all other values in the last interval.
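
The effect of the two optional arguments is easiest to see side by side. The sketch below uses illustrative values chosen here (they are not from the original page), with N = 3 breakpoints.

```r
v <- c(5, 10, 15)            # N = 3 breakpoints, bins [5,10) and [10,15)
x <- c(2, 5, 12, 15, 18)

default <- findInterval(x, v)                           # 0 1 2 3 3
closed  <- findInterval(x, v, rightmost.closed = TRUE)  # 0 1 2 2 3
inside  <- findInterval(x, v, all.inside = TRUE)        # 1 1 2 2 2

print(rbind(x, default, closed, inside))
```

With rightmost.closed = TRUE only the value equal to max(v) changes (from 3 to 2), while all.inside = TRUE also pulls the out-of-range values 2 and 18 into the first and last bin.
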
See also approx(*, method = "constant"), which is a generalization of findInterval(), and ecdf for computing the empirical distribution function, which is (up to a factor of $n$) also basically the same as findInterval(.).
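
As an illustration of the relation to approx(), the sketch below reproduces findInterval() with a right-continuous step interpolation of the breakpoint indices. It assumes distinct breakpoints, and the choice of yleft = 0 and yright = length(v) is made here to mimic the out-of-range indices; none of this is taken from the original page.

```r
v <- c(5, 10, 15)            # distinct, increasing breakpoints
x <- 2:18

# Step function that jumps to k at v[k]; f = 0 (the default) keeps the
# value of the left breakpoint, so the result is right-continuous.
step_idx <- approx(v, seq_along(v), xout = x, method = "constant",
                   yleft = 0, yright = length(v))$y

stopifnot(all(step_idx == findInterval(x, v)))
```

approx() is more general because the y values need not be the indices 1, ..., N.
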
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))
N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)