spatstat.core (version 2.1-2)

# kaplan.meier: Kaplan-Meier Estimator using Histogram Data

## Description

Compute the Kaplan-Meier estimator of a survival time distribution function from histogram data.

## Usage

kaplan.meier(obs, nco, breaks, upperobs=0)

## Arguments

obs

Vector of $$n$$ integers giving the histogram of all observations (censored or uncensored survival times).

nco

Vector of $$n$$ integers giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time).

breaks

Vector of $$n+1$$ breakpoints which were used to form both histograms.

upperobs

Number of observations beyond the rightmost breakpoint, if any.

## Value

A list with two elements:

km

Kaplan-Meier estimate of the survival time c.d.f. $$F(t)$$.

lambda

Corresponding Nelson-Aalen estimate of the hazard rate $$\lambda(t)$$.

Both are numeric vectors of length $$n$$.

## Details

This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.

Suppose $$T_i$$ are the survival times of individuals $$i=1,\ldots,M$$ with unknown distribution function $$F(t)$$ which we wish to estimate. Suppose these times are right-censored by random censoring times $$C_i$$. Thus the observations consist of right-censored survival times $$\tilde T_i = \min(T_i,C_i)$$ and non-censoring indicators $$D_i = 1\{T_i \le C_i\}$$ for each $$i$$.

If the number of observations $$M$$ is large, it is efficient to use histograms. Form the histogram obs of all observed times $$\tilde T_i$$. That is, obs[k] counts the number of values $$\tilde T_i$$ in the interval (breaks[k],breaks[k+1]] for $$k > 1$$ and [breaks[1],breaks[2]] for $$k = 1$$. Also form the histogram nco of all uncensored times, i.e. those $$\tilde T_i$$ such that $$D_i=1$$. These two histograms are the arguments passed to kaplan.meier.
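As a rough illustration of the histogram step, the following base R sketch builds `obs` and `nco` from raw data. The data values and the names `tobs` and `d` are hypothetical, chosen only for this example. It relies on the defaults of `hist()` (`right = TRUE`, `include.lowest = TRUE`), which produce cells of the form (breaks[k],breaks[k+1]] with the first cell closed on the left, matching the convention described above.

```r
# Hypothetical toy data (not from the package documentation):
# tobs = observed times min(T_i, C_i); d = TRUE where the time is uncensored
tobs <- c(0.2, 0.5, 0.5, 1.1, 1.7, 2.0, 2.4, 0.0)
d    <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Breakpoints starting at 0 and spanning the range of the data
breaks <- seq(0, 2.5, by = 0.5)

# hist() defaults give right-closed cells, with the first cell also
# closed on the left, so a value equal to breaks[1] is counted
obs <- hist(tobs,    breaks = breaks, plot = FALSE)$counts  # all observations
nco <- hist(tobs[d], breaks = breaks, plot = FALSE)$counts  # uncensored only
```

Both vectors have length `length(breaks) - 1`, and each entry of `nco` is at most the matching entry of `obs`, since the uncensored times are a subset of all observed times.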

The vectors km and lambda returned by kaplan.meier are (histogram approximations to) the Kaplan-Meier estimator of $$F(t)$$ and its hazard rate $$\lambda(t)$$. Specifically, km[k] is an estimate of F(breaks[k+1]), and lambda[k] is an estimate of the average of $$\lambda(t)$$ over the interval (breaks[k],breaks[k+1]).
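To make the histogram approximation concrete, here is a minimal base R sketch of one way such an estimate can be computed. The function name `km_sketch` is hypothetical, and this is an illustration under stated assumptions rather than the package's own code. In particular, the hazard is taken here as minus the log of the per-cell survival factor divided by the cell width, and it is left at zero in any cell where that factor is zero. Both choices are assumptions of this sketch.

```r
# Illustrative sketch only; km_sketch is a hypothetical name
km_sketch <- function(obs, nco, breaks, upperobs = 0) {
  stopifnot(length(obs) == length(nco),
            length(breaks) == length(obs) + 1)
  # Number of individuals still under observation at the start of each cell:
  # reverse cumulative sum of obs, plus those beyond the last breakpoint
  atrisk <- rev(cumsum(rev(obs))) + upperobs
  # Estimated conditional probability of surviving each cell
  s <- ifelse(atrisk > 0, 1 - nco / atrisk, 1)
  # Product-limit estimate of F at the right endpoint of each cell
  km <- 1 - cumprod(s)
  # Average hazard over each cell (assumed convention: 0 where s == 0)
  widths <- diff(breaks)
  lambda <- numeric(length(obs))
  pos <- s > 0
  lambda[pos] <- -log(s[pos]) / widths[pos]
  list(km = km, lambda = lambda)
}

# Usage with hypothetical histograms on breaks 0, 0.5, ..., 2.5
res <- km_sketch(obs = c(4, 0, 1, 2, 1), nco = c(3, 0, 1, 1, 1),
                 breaks = seq(0, 2.5, by = 0.5))
```

In this example `res$km` is nondecreasing and reaches 1 in the last cell, because the single individual remaining at that point has an uncensored time.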

The histogram breaks must include $$0$$. If the histogram breaks do not span the range of the observations, it is important to count how many survival times $$\tilde T_i$$ exceed the rightmost breakpoint, and give this as the value upperobs.
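A short base R sketch of counting `upperobs` when the breaks stop short of the largest observation is given below. The data and the names `tobs`, `d`, `rmax` and `inside` are hypothetical. Times beyond the rightmost breakpoint are counted separately and excluded before calling `hist()`, which would otherwise stop with an error for values outside the breaks.

```r
# Hypothetical data: two observed times exceed the rightmost breakpoint
tobs <- c(0.3, 0.8, 1.2, 2.7, 3.1)          # observed times min(T_i, C_i)
d    <- c(TRUE, TRUE, FALSE, TRUE, FALSE)   # TRUE = uncensored

breaks <- seq(0, 2, by = 0.5)   # includes 0, but does not span the data
rmax   <- max(breaks)

# Count the observations lying beyond the rightmost breakpoint
upperobs <- sum(tobs > rmax)

# Histogram only the values that fall within the breaks
inside <- tobs <= rmax
obs <- hist(tobs[inside],     breaks = breaks, plot = FALSE)$counts
nco <- hist(tobs[inside & d], breaks = breaks, plot = FALSE)$counts
```

A simple consistency check is that `sum(obs) + upperobs` equals the total number of observations, so that no individual is dropped from the count of those still under observation.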

## See Also

reduced.sample, km.rs