spatstat (version 1.11-8)

kaplan.meier: Kaplan-Meier Estimator using Histogram Data

Description

Compute the Kaplan-Meier estimator of a survival time distribution function from histogram data.

Usage

kaplan.meier(obs, nco, breaks, upperobs=0)

Arguments

obs
Vector of $n$ integers giving the histogram of all observations (censored or uncensored survival times).
nco
Vector of $n$ integers giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time).
breaks
Vector of $n+1$ breakpoints which were used to form both histograms.
upperobs
Number of observations beyond the rightmost breakpoint, if any.

Value

A list with two elements:

  • km: Kaplan-Meier estimate of the survival time c.d.f. $F(t)$
  • lambda: corresponding Nelson-Aalen estimate of the hazard rate $\lambda(t)$

These are numeric vectors of length $n$.

Details

This function is intended mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a very large dataset.

Suppose $T_i$ are the survival times of individuals $i=1,\ldots,M$ with unknown distribution function $F(t)$ which we wish to estimate. Suppose these times are right-censored by random censoring times $C_i$. Thus the observations consist of right-censored survival times $\tilde T_i = \min(T_i,C_i)$ and non-censoring indicators $D_i = 1\{T_i \le C_i\}$ for each $i$.
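As a minimal sketch of this setup, the following base R code simulates right-censored data. The exponential distributions, their rates, the seed and the variable names (Ttrue, Cens, Tobs, D) are assumptions chosen purely for illustration; they are not part of spatstat.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
M <- 1000                           # number of individuals
Ttrue <- rexp(M, rate = 1)          # true survival times T_i (assumed exponential)
Cens  <- rexp(M, rate = 0.5)        # random censoring times C_i (assumed exponential)
Tobs  <- pmin(Ttrue, Cens)          # observed times: min(T_i, C_i)
D     <- as.integer(Ttrue <= Cens)  # non-censoring indicators: 1 if uncensored, else 0
```

Only Tobs and D would be available in practice; Ttrue and Cens are kept here to make the construction explicit.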

If the number of observations $M$ is large, it is efficient to use histograms. Form the histogram obs of all observed times $\tilde T_i$. That is, obs[k] counts the number of values $\tilde T_i$ in the interval (breaks[k],breaks[k+1]] for $k > 1$ and [breaks[1],breaks[2]] for $k = 1$. Also form the histogram nco of all uncensored times, i.e. those $\tilde T_i$ such that $D_i=1$. These two histograms are the arguments passed to kaplan.meier.

The vectors km and lambda returned by kaplan.meier are (histogram approximations to) the Kaplan-Meier estimator of $F(t)$ and its hazard rate $\lambda(t)$. Specifically, km[k] is an estimate of F(breaks[k+1]), and lambda[k] is an estimate of the average of $\lambda(t)$ over the interval (breaks[k],breaks[k+1]).
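The histogram construction can be sketched in base R as follows. The simulated data, the bin width, and the helper km_sketch are hypothetical and written only for illustration: km_sketch is one plausible histogram-based product-limit calculation, not necessarily the code used inside kaplan.meier. In particular, taking the binned hazard as -log(s)/width is an assumption of this sketch.

```r
set.seed(1)                          # arbitrary seed
M <- 5000
Ttrue <- rexp(M, rate = 1)           # assumed survival distribution
Cens  <- runif(M, 0, 2)              # assumed censoring distribution
Tobs  <- pmin(Ttrue, Cens)           # observed times, all <= 2
D     <- Ttrue <= Cens               # TRUE for uncensored observations

breaks <- seq(0, 2, by = 0.1)        # includes 0 and spans all observed times

# hist() defaults (right = TRUE, include.lowest = TRUE) count values in
# (breaks[k], breaks[k+1]], with the first interval closed on the left
obs <- hist(Tobs,    breaks = breaks, plot = FALSE)$counts
nco <- hist(Tobs[D], breaks = breaks, plot = FALSE)$counts

# Hypothetical helper: product-limit estimate computed from the two histograms
km_sketch <- function(obs, nco, breaks, upperobs = 0) {
  atrisk <- rev(cumsum(rev(obs))) + upperobs    # observations in bin k or later
  s <- ifelse(atrisk > 0, 1 - nco / atrisk, 1)  # conditional survival across bin k
  km <- 1 - cumprod(s)                          # estimate of F(breaks[k+1])
  lambda <- ifelse(s > 0, -log(s) / diff(breaks), Inf)  # average hazard in bin k
  list(km = km, lambda = lambda)
}

est <- km_sketch(obs, nco, breaks)
```

With spatstat loaded, the same histograms would be passed as kaplan.meier(obs, nco, breaks). In this simulation est$km[10] can be compared with the true value pexp(1), since breaks[11] is 1.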

The histogram breaks must include $0$. If the histogram breaks do not span the range of the observations, it is important to count how many survival times $\tilde T_i$ exceed the rightmost breakpoint, and to supply this count as the argument upperobs.
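A minimal sketch of counting upperobs when the breaks stop short of the largest observation is given below. The simulated data, the choice of breaks and the variable name inside are assumptions made for illustration.

```r
set.seed(2)                            # arbitrary seed
M <- 2000
Ttrue <- rexp(M, rate = 1)             # assumed survival distribution
Cens  <- rexp(M, rate = 0.3)           # assumed censoring distribution
Tobs  <- pmin(Ttrue, Cens)             # observed right-censored times
D     <- Ttrue <= Cens                 # TRUE for uncensored observations

breaks <- seq(0, 1.5, by = 0.05)       # starts at 0 but does not cover all of Tobs

inside   <- Tobs <= max(breaks)        # observations that fall within the breaks
upperobs <- sum(!inside)               # observations beyond the rightmost breakpoint

# hist() signals an error if any value lies outside the breaks, so restrict first
obs <- hist(Tobs[inside],     breaks = breaks, plot = FALSE)$counts
nco <- hist(Tobs[inside & D], breaks = breaks, plot = FALSE)$counts
```

These would then be passed as kaplan.meier(obs, nco, breaks, upperobs = upperobs), so that the observations beyond the last breakpoint still count as being at risk in every bin.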

See Also

reduced.sample, km.rs