Compute the Kaplan-Meier estimator of a survival time distribution function, from histogram data

`kaplan.meier(obs, nco, breaks, upperobs=0)`

obs

vector of \(n\) integers giving the histogram of all observations (censored or uncensored survival times)

nco

vector of \(n\) integers giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time)

breaks

Vector of \(n+1\) breakpoints which were used to form both histograms.

upperobs

Number of observations beyond the rightmost breakpoint, if any.

A list with two elements:

km

Kaplan-Meier estimate of the survival time c.d.f. \(F(t)\)

lambda

corresponding Nelson-Aalen estimate of the hazard rate \(\lambda(t)\)

This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.

Suppose \(T_i\) are the survival times of individuals \(i=1,\ldots,M\) with unknown distribution function \(F(t)\) which we wish to estimate. Suppose these times are right-censored by random censoring times \(C_i\). Thus the observations consist of right-censored survival times \(\tilde T_i = \min(T_i,C_i)\) and non-censoring indicators \(D_i = 1\{T_i \le C_i\}\) for each \(i\).

If the number of observations \(M\) is large, it is efficient to use histograms. Form the histogram `obs` of all observed times \(\tilde T_i\). That is, `obs[k]` counts the number of values \(\tilde T_i\) in the interval `(breaks[k],breaks[k+1]]` for \(k > 1\) and `[breaks[1],breaks[2]]` for \(k = 1\). Also form the histogram `nco` of all uncensored times, i.e. those \(\tilde T_i\) such that \(D_i=1\). These two histograms are the arguments passed to `kaplan.meier`.
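The construction of the two histograms can be sketched in base R as follows. This is only an illustration: the simulated data and the names `Ti`, `Ci`, `Tobs` and `D` are assumptions made for the example, not part of the function's interface. The sketch relies on the defaults of `hist` (`right = TRUE`, `include.lowest = TRUE`), which give intervals closed on the right with the first interval also closed on the left, matching the convention described above.

```r
set.seed(1)
M  <- 1000
Ti <- rexp(M, rate = 1)            # hypothetical true survival times
Ci <- runif(M, min = 0, max = 3)   # hypothetical censoring times

Tobs <- pmin(Ti, Ci)               # right-censored observations
D    <- Ti <= Ci                   # TRUE where the time is uncensored

# n + 1 breakpoints starting at 0 and spanning all observed times
breaks <- seq(0, 3, length.out = 31)

# histogram of all observed times, and of the uncensored times only
obs <- hist(Tobs,    breaks = breaks, plot = FALSE)$counts
nco <- hist(Tobs[D], breaks = breaks, plot = FALSE)$counts

# With spatstat loaded, the call would then be:
# fit <- kaplan.meier(obs, nco, breaks)
```

Here the breakpoints cover every observation, so `upperobs` can be left at its default of 0.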

The vectors `km` and `lambda` returned by `kaplan.meier` are (histogram approximations to) the Kaplan-Meier estimator of \(F(t)\) and its hazard rate \(\lambda(t)\). Specifically, `km[k]` is an estimate of `F(breaks[k+1])`, and `lambda[k]` is an estimate of the average of \(\lambda(t)\) over the interval `(breaks[k],breaks[k+1])`.
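To make the meaning of `km[k]` and `lambda[k]` concrete, here is a minimal sketch of a product-limit calculation on histogram counts. It is an assumption-laden illustration, not the package's source code: the function name `km_from_hist` is hypothetical, and the hazard is taken as the constant rate that would reproduce the survival factor over each interval, which is one of several reasonable choices (dividing the increment `nco/atrisk` by the interval width is a close alternative).

```r
# Hypothetical helper, for illustration only
km_from_hist <- function(obs, nco, breaks, upperobs = 0) {
  stopifnot(length(nco) == length(obs),
            length(breaks) == length(obs) + 1)
  # number still at risk at the start of each interval
  atrisk <- rev(cumsum(rev(obs))) + upperobs
  # conditional survival factor for each interval (1 if nobody is at risk)
  s <- ifelse(atrisk > 0, 1 - nco / atrisk, 1)
  km <- 1 - cumprod(s)               # km[k] approximates F(breaks[k+1])
  # average hazard over each interval; Inf if everyone at risk fails
  lambda <- -log(s) / diff(breaks)
  list(km = km, lambda = lambda)
}

res <- km_from_hist(obs = c(2, 1, 1), nco = c(1, 1, 0), breaks = 0:3)
res$km   # 0.25 0.625 0.625
```

In this small example four individuals are at risk at time 0; one uncensored time in the first interval gives a factor of 3/4, one uncensored time among the two still at risk in the second interval gives 1/2, and the last interval contains only a censored time, so the estimate stays flat.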

The histogram breaks must include \(0\). If the histogram breaks do not span the range of the observations, it is important to count how many survival times \(\tilde T_i\) exceed the rightmost breakpoint, and give this as the value `upperobs`.
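When the breakpoints stop short of the largest observation, the count for `upperobs` might be obtained as in the sketch below. The simulated data and variable names are again assumptions made for illustration. Observations beyond the last breakpoint are removed before calling `hist`, which would otherwise stop with an error because the breaks do not span the data.

```r
set.seed(2)
M  <- 500
Ti <- rexp(M, rate = 1)            # hypothetical true survival times
Ci <- runif(M, min = 0, max = 3)   # hypothetical censoring times
Tobs <- pmin(Ti, Ci)
D    <- Ti <= Ci

# breakpoints include 0 but deliberately stop at 2
breaks <- seq(0, 2, length.out = 21)

inside   <- Tobs <= max(breaks)
upperobs <- sum(!inside)           # observed times beyond the last breakpoint

obs <- hist(Tobs[inside],     breaks = breaks, plot = FALSE)$counts
nco <- hist(Tobs[inside & D], breaks = breaks, plot = FALSE)$counts

# With spatstat loaded, the call would then be:
# fit <- kaplan.meier(obs, nco, breaks, upperobs = upperobs)
```

Supplying `upperobs` keeps the number at risk correct in every interval, since the individuals observed beyond the last breakpoint were still at risk throughout the range covered by `breaks`.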