reduced.sample: Reduced Sample Estimator using Histogram Data

Description

Compute the Reduced Sample estimator of a survival time distribution function from histogram data.

Usage

reduced.sample(nco, cen, ncc, show=FALSE, uppercen=0)

Value

If show=FALSE, a numeric vector giving the values of the reduced sample estimator. If show=TRUE, a list with three components, which are vectors of equal length:

rs: Reduced sample estimate of the survival time c.d.f. \(F(t)\)
numerator: numerator of the reduced sample estimator
denominator: denominator of the reduced sample estimator

Arguments

nco: vector of counts giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time)
cen: vector of counts giving the histogram of censoring times
ncc: vector of counts giving the histogram of censoring times for the uncensored observations only
uppercen: number of censoring times greater than the rightmost histogram breakpoint (if there are any)
show: Logical value controlling the amount of detail returned by the function (see Value)

Author

Adrian Baddeley Adrian.Baddeley@curtin.edu.au

and Rolf Turner r.turner@auckland.ac.nz

Details

This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the reduced sample estimator from a huge dataset.

Suppose \(T_i\) are the survival times of individuals \(i=1,\ldots,M\) with unknown distribution function \(F(t)\) which we wish to estimate. Suppose these times are right-censored by random censoring times \(C_i\). Thus the observations consist of right-censored survival times \(\tilde T_i = \min(T_i,C_i)\) and non-censoring indicators \(D_i = 1\{T_i \le C_i\}\) for each \(i\).

If the number of observations \(M\) is large, it is efficient to use histograms. Form the histogram cen of all censoring times \(C_i\). That is, cen[k] counts the number of values \(C_i\) in the interval (breaks[k],breaks[k+1]] for \(k > 1\) and [breaks[1],breaks[2]] for \(k = 1\), where breaks is the vector of histogram breakpoints shared by all three histograms. Also form the histogram nco of all uncensored times, i.e. those \(\tilde T_i\) such that \(D_i=1\), and the histogram ncc of all censoring times for which the survival time is uncensored, i.e. those \(C_i\) such that \(D_i=1\). These three histograms are the arguments passed to reduced.sample.
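The histogram construction described above can be sketched in base R as follows. This is only an illustration: the data values and the names surv, cens, obs, unc, breaks and counts are made up for the example, and the breakpoints are chosen to cover all the data so that uppercen is zero. It relies on the defaults of hist (right=TRUE, include.lowest=TRUE), which give the interval convention stated above.

```r
# Made-up survival times T_i and censoring times C_i (illustrative only)
surv <- c(0.5, 1.2, 2.7, 3.1, 0.9, 2.2)
cens <- c(1.0, 1.0, 3.0, 2.5, 2.0, 3.5)

obs <- pmin(surv, cens)   # right-censored survival times
unc <- surv <= cens       # non-censoring indicators D_i

# Common breakpoints for all three histograms (assumed to span the data)
breaks <- 0:4
counts <- function(x) hist(x, breaks = breaks, plot = FALSE)$counts

cen <- counts(cens)        # all censoring times
nco <- counts(obs[unc])    # uncensored survival times
ncc <- counts(cens[unc])   # censoring times of the uncensored observations

# Censoring times beyond the rightmost breakpoint (none in this example)
uppercen <- sum(cens > max(breaks))
```

The resulting vectors nco, cen and ncc have equal length and could then be supplied to reduced.sample, together with uppercen.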

The return value rs is the reduced-sample estimator of the distribution function \(F(t)\). Specifically, rs[k] is the reduced sample estimate of F(breaks[k+1]). The value is exact, i.e. the use of histograms does not introduce any approximation error.
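To illustrate why histogram counts lose no information at the breakpoints, here is a minimal sketch. It assumes the reduced sample estimator is defined as \(\#\{i : T_i \le t,\ C_i > t\} / \#\{i : C_i > t\}\); the boundary convention actually used by reduced.sample may differ. The helper rs_sketch is hypothetical, is not part of spatstat, and is not claimed to reproduce the package's code. The data are made up.

```r
# Hypothetical helper (NOT the spatstat implementation): reduced-sample
# estimate from histogram counts, ASSUMING the definition
#   F(t) = #{i : T_i <= t and C_i > t} / #{i : C_i > t}
rs_sketch <- function(nco, cen, ncc, uppercen = 0) {
  stopifnot(length(nco) == length(cen), length(ncc) == length(cen))
  # Uncensored cases with T_i <= t, minus those that also have C_i <= t
  numerator <- cumsum(nco - ncc)
  # Number of censoring times strictly beyond each right-hand breakpoint
  denominator <- sum(cen) + uppercen - cumsum(cen)
  list(rs = numerator / denominator,
       numerator = numerator,
       denominator = denominator)
}

# Made-up data, histogrammed on common breakpoints that span the data
surv <- c(0.5, 1.2, 2.7, 3.1, 0.9, 2.2)
cens <- c(1.0, 1.0, 3.0, 2.5, 2.0, 3.5)
unc <- surv <= cens
breaks <- 0:4
counts <- function(x) hist(x, breaks = breaks, plot = FALSE)$counts

est <- rs_sketch(nco = counts(surv[unc]),
                 cen = counts(cens),
                 ncc = counts(cens[unc]))

# The same ratio computed directly from the raw data at breaks[k+1]
direct <- sapply(breaks[-1], function(t) {
  sum(surv <= t & cens > t) / sum(cens > t)
})
```

Under the assumed definition, est$rs and direct agree at every breakpoint where the denominator is nonzero. At the last breakpoint no censoring time exceeds it, so both are NaN. The estimate need not be monotone in t, as this small example shows.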