Compute the Kaplan-Meier estimator of a survival time distribution function from histogram data.

`kaplan.meier(obs, nco, breaks, upperobs=0)`

A list with two elements:

- km
Kaplan-Meier estimate of the survival time c.d.f. \(F(t)\)

- lambda
corresponding Nelson-Aalen estimate of the hazard rate \(\lambda(t)\)

These are numeric vectors of length \(n\).

- obs
vector of \(n\) integers giving the histogram of all observations (censored or uncensored survival times)

- nco
vector of \(n\) integers giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time)

- breaks
vector of \(n+1\) breakpoints that were used to form both histograms

- upperobs
number of observations beyond the rightmost breakpoint, if any

Adrian Baddeley (Adrian.Baddeley@curtin.edu.au) and Rolf Turner (r.turner@auckland.ac.nz)

This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.

Suppose \(T_i\) are the survival times of individuals \(i=1,\ldots,M\) with unknown distribution function \(F(t)\) which we wish to estimate. Suppose these times are right-censored by random censoring times \(C_i\). Thus the observations consist of right-censored survival times \(\tilde T_i = \min(T_i,C_i)\) and non-censoring indicators \(D_i = 1\{T_i \le C_i\}\) for each \(i\).

If the number of observations \(M\) is large, it is efficient to use histograms. Form the histogram `obs` of all observed times \(\tilde T_i\). That is, `obs[k]` counts the number of values \(\tilde T_i\) in the interval `(breaks[k],breaks[k+1]]` for \(k > 1\) and `[breaks[1],breaks[2]]` for \(k = 1\). Also form the histogram `nco` of all uncensored times, i.e. those \(\tilde T_i\) such that \(D_i=1\). These two histograms are the arguments passed to `kaplan.meier`.
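As an illustration of the binning convention described above, the following base R sketch builds `obs` and `nco` from simulated data. The simulated distributions and the variable names `surv`, `cens`, `tobs` and `uncens` are assumptions made for this example only. It relies on the defaults `right = TRUE` and `include.lowest = TRUE` of `hist`, which give intervals that are open on the left and closed on the right, with the first interval closed at both ends.

```r
set.seed(1)
M <- 1000
surv <- rexp(M, rate = 1)        # survival times T_i (simulated, for illustration)
cens <- runif(M, 0, 3)           # censoring times C_i (simulated, for illustration)
tobs <- pmin(surv, cens)         # observed times min(T_i, C_i)
uncens <- surv <= cens           # non-censoring indicators D_i

# Breakpoints include 0 and span the range of tobs (all values lie below 3)
breaks <- (0:30) / 10

# hist() counts values in (breaks[k], breaks[k+1]], closing the first interval on the left
obs <- hist(tobs, breaks = breaks, plot = FALSE,
            right = TRUE, include.lowest = TRUE)$counts
nco <- hist(tobs[uncens], breaks = breaks, plot = FALSE,
            right = TRUE, include.lowest = TRUE)$counts

# With the package providing kaplan.meier loaded, the call would then be:
#   est <- kaplan.meier(obs, nco, breaks)
```

Because every uncensored time is also an observed time, `nco[k]` can never exceed `obs[k]`, which gives a quick sanity check on the two histograms.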

The vectors `km` and `lambda` returned by `kaplan.meier` are (histogram approximations to) the Kaplan-Meier estimator of \(F(t)\) and its hazard rate \(\lambda(t)\). Specifically, `km[k]` is an estimate of `F(breaks[k+1])`, and `lambda[k]` is an estimate of the average of \(\lambda(t)\) over the interval `(breaks[k],breaks[k+1])`.
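To make the histogram approximation concrete, here is a minimal sketch of how a Kaplan-Meier product could be formed from the two histograms. The function `km_from_hist` is hypothetical and written for this example; it is not the code of `kaplan.meier`, and its results need not agree with that function in every detail. In particular, the hazard is taken here as `-log(s)/width`, one convention that is consistent with the product form; the Nelson-Aalen increment `nco/atrisk` divided by the bin width is an alternative.

```r
# Hypothetical illustration only, not the implementation of kaplan.meier
km_from_hist <- function(obs, nco, breaks, upperobs = 0) {
  stopifnot(length(obs) == length(nco),
            length(breaks) == length(obs) + 1,
            all(nco <= obs))
  # Number still at risk at the start of each bin:
  # everything observed in this bin or later, plus times beyond the last break
  atrisk <- rev(cumsum(rev(obs))) + upperobs
  # Estimated conditional probability of surviving each bin
  s <- ifelse(atrisk > 0, 1 - nco / atrisk, 1)
  # km[k] approximates F(breaks[k+1])
  km <- 1 - cumprod(s)
  # Average hazard over each bin; Inf where everyone still at risk fails
  lambda <- -log(s) / diff(breaks)
  list(km = km, lambda = lambda)
}

# Small example: 4 observations in 3 unit-width bins, 2 of them uncensored
est <- km_from_hist(obs = c(2, 1, 1), nco = c(1, 1, 0), breaks = 0:3)
est$km   # 0.25 0.625 0.625
```

In the small example the numbers at risk are 4, 2 and 1, so the survival factors are 3/4, 1/2 and 1, and the estimate of \(F\) stays flat over the last bin because that bin contains only a censored time.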

The histogram breaks must include \(0\). If the histogram breaks do not span the range of the observations, it is important to count how many survival times \(\tilde T_i\) exceed the rightmost breakpoint, and give this as the value `upperobs`.
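The following base R sketch shows one way to obtain `upperobs` when the breakpoints stop short of the largest observed time. The simulated data and the names `tobs`, `uncens`, `rmax` and `inside` are assumptions for this example. Times beyond the rightmost breakpoint are counted separately and left out of both histograms, since `hist` raises an error if the breaks do not span the values it is given.

```r
set.seed(2)
M <- 500
surv <- rexp(M, rate = 0.5)      # survival times (simulated, for illustration)
cens <- runif(M, 0, 6)           # censoring times (simulated, for illustration)
tobs <- pmin(surv, cens)         # observed times
uncens <- surv <= cens           # non-censoring indicators

breaks <- (0:20) / 10            # runs from 0 to 2, so some observed times lie beyond it
rmax <- max(breaks)
inside <- tobs <= rmax

# Observed times that exceed the rightmost breakpoint
upperobs <- sum(!inside)

# Histograms of the remaining times only
obs <- hist(tobs[inside], breaks = breaks, plot = FALSE)$counts
nco <- hist(tobs[inside & uncens], breaks = breaks, plot = FALSE)$counts

# With the package providing kaplan.meier loaded, the call would then be:
#   est <- kaplan.meier(obs, nco, breaks, upperobs = upperobs)
```

Leaving `upperobs` at its default of 0 in this situation would understate the number of individuals still at risk in every bin.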

`reduced.sample`, `km.rs`