icfit: calculate non-parametric MLE for interval censored survival function

Description

This function calculates the non-parametric maximum likelihood estimate for the distribution from interval censored data using the self-consistent estimator, so the associated survival distribution generalizes the Kaplan-Meier estimate to interval censored data. Formulas using Surv are allowed, as in survfit.

Usage

## S3 method for class 'formula':
icfit(formula, data, ...)

## S3 method for class 'default':
icfit(L, R, initfit = NULL, control = icfitControl(), Lin = NULL, Rin = NULL, ...)

Arguments

L

numeric vector of left endpoints of censoring interval (equivalent to first element of Surv when type='interval2', see details)

R

numeric vector of right endpoints of censoring interval (equivalent to second element of Surv when type='interval2', see details)

initfit

an initial estimate as an object of class icfit or icsurv, or a character vector of the name of the function used to calculate the initial estimate (see details)

control

list of arguments for controlling the algorithm (see icfitControl)

Lin

logical vector, should L be included in the interval? (see details)

Rin

logical vector, should R be included in the interval? (see details)

formula

a formula whose response is a numeric vector (which assumes no censoring) or a Surv object. The right side of the formula may be 1 or a factor (which produces separate fits for each level).

data

an optional matrix or data frame containing the variables in the formula. By default the variables are taken from environment(formula).

...

values passed to other functions
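The default treatment of the endpoints when Lin and Rin are NULL (described under Details) can be sketched in base R as follows. The helper name default_inclusion is hypothetical and is not part of the package; it only illustrates the documented rule.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# defaults used when Lin and Rin are both NULL.
default_inclusion <- function(L, R) {
  list(
    Lin = L == R,    # include L[i] only when the observation is exact (L[i] = R[i])
    Rin = R != Inf   # include R[i] unless the observation is right censored at Inf
  )
}

L <- c(0, 2, 3, 5)
R <- c(1, 2, Inf, 7)
inc <- default_inclusion(L, R)
inc$Lin
inc$Rin
```

Under this rule the first and fourth observations are treated as (L, R], the second as the exact value 2, and the third as (3, Inf).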

Value

An object of class icfit (same as icsurv class, see details). There are 4 methods for this class: plot.icfit, print.icfit, summary.icfit, and [.icfit. The last method pulls out individual fits when the right side of the formula of the icfit call was a factor. A list with elements:

A

the n by k matrix of indicator functions, NULL if there is more than one stratum; not printed by default

strata

a named numeric vector of the number of observations in each stratum; if there is only one stratum, the observation is named NPMLE

error

this is max(d + u - n), see Gentleman and Geyer, 1994

numit

number of iterations

pf

vector of estimated probabilities of the distribution

intmap

2 by k matrix, where the ith column defines an interval corresponding to the probability, pf[i]

converge

a logical, TRUE if normal convergence

message

character text message about convergence

anypzero

logical denoting whether any of the Turnbull intervals were set to zero
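As a rough illustration of how these elements fit together, the sketch below fits a single group and inspects the result. It assumes the interval package is installed and uses the bcos data from the Examples; the layout of the combined table is only one convenient way to view the fit.

```r
library(interval)   # assumed installed; provides icfit and the bcos data
data(bcos)

# A right side of 1 gives a single fit rather than one per factor level
fit <- icfit(Surv(left, right, type = "interval2") ~ 1, data = bcos)

fit$converge   # TRUE if normal convergence
fit$numit      # number of iterations used

# Pair each probability mass with its interval:
# column i of intmap corresponds to pf[i]
cbind(t(fit$intmap), pf = fit$pf)
```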

Details

The icfit function fits the nonparametric maximum likelihood estimate (NPMLE) of the distribution function for interval censored data. In the default case (when Lin=Rin=NULL) we assume there are n (n=length(L)) failure times, and the ith one is in the interval between L[i] and R[i]. The default is not to include L[i] in the interval unless L[i]=R[i], and to include R[i] in the interval unless R[i]=Inf. When Lin and Rin are not NULL they describe whether to include L and R in the associated interval. If either Lin or Rin is length 1 then it is repeated n times, otherwise they should be logicals of length n.

The algorithm is basically an EM algorithm applied to interval censored data (see Turnbull, 1976); however, first we can define a set of intervals (called the Turnbull intervals) which are the only intervals where the NPMLE may change. The Turnbull intervals are also called the innermost intervals, and are the result of the primary reduction (see Aragon and Eberly, 1992).

The starting distribution for the EM algorithm is given by initfit, which may be either (1) NULL, in which case a very simple and quick starting distribution is used (see code), (2) a character vector naming a function with inputs L, R, Lin, Rin, and A (see, for example, initcomputeMLE), or (3) a list giving pf and intmap values, e.g., an icfit object. If option (2) is tried and results in an error then the starting distribution reverts to the one used with option (1).

Convergence is declared when the maximum reduced gradient is less than epsilon (see icfitControl) and the Kuhn-Tucker conditions are approximately met (see Gentleman and Geyer, 1994); otherwise a warning results. There are other faster algorithms (for example, see EMICM in the package Icens).

The output is of class icfit, which is identical to the icsurv class of the Icens package when there is only one group for which a distribution is needed. Following that class, there is an intmap element which gives the bounds about which each drop in the NPMLE survival function can occur. Since the classes icfit and icsurv are so closely related, one can directly use initial (and faster) fits from the Icens package as input in initfit. Note that when using a non-null initfit, the Lin and Rin values of the initial fit are ignored. Alternatively, one may give the name of the function used to calculate the initial fit. The function is assumed to input the transpose of the A matrix (called A in the Icens package). Options can be passed to the initfit function as a list using the initfitOpts variable in icfitControl.

The advantage of the icfit function over those in the Icens package is that it allows a call similar to that used in survfit of the survival package, so that different groups may be plotted at the same time with similar calls. The print method prints an icfit object as a list (see value), except that it suppresses printing of the A matrix. The summary method prints the distribution (i.e., the probabilities and the intervals where those probability masses are known to reside) for each group in the icfit object. There is also a plot method, see plot.icfit. For additional references and background see Fay and Shaw (2010).
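To make the EM step concrete, here is a minimal base R sketch of the self-consistent iteration. It is not the package's implementation: the function name em_npmle is hypothetical, the matrix A is supplied by hand rather than derived from the Turnbull intervals, the starting distribution is uniform, and the stopping rule is a simple change-in-pf check rather than the reduced-gradient and Kuhn-Tucker checks described above.

```r
# Hypothetical helper (not the package's code): self-consistent EM iteration.
# A is an n x k indicator matrix with A[i, j] = 1 when Turnbull interval j
# lies inside the censoring interval of observation i.
em_npmle <- function(A, epsilon = 1e-8, maxit = 10000) {
  n <- nrow(A)
  k <- ncol(A)
  pf <- rep(1 / k, k)                # assumed uniform starting distribution
  for (it in seq_len(maxit)) {
    denom <- as.vector(A %*% pf)     # probability of each observation under pf
    # E-step: share of observation i assigned to interval j is
    # A[i, j] * pf[j] / denom[i]; M-step: average the shares over i.
    pf_new <- colSums(A * outer(1 / denom, pf)) / n
    done <- max(abs(pf_new - pf)) < epsilon   # simplified stopping rule
    pf <- pf_new
    if (done) break
  }
  list(pf = pf, numit = it)
}

# Two observations in the first interval only, one spanning both intervals,
# and one in the second interval only.
A <- rbind(c(1, 0), c(1, 0), c(1, 1), c(0, 1))
fit <- em_npmle(A)
fit$pf
```

For this toy matrix the likelihood is proportional to p^2 (1 - p), where p is the mass on the first interval, so the iteration should approach p = 2/3.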

References

Aragon, J. and Eberly, D. (1992). On convergence of convex minorant algorithms for distribution estimation with interval-censored data. J. of Computational and Graphical Statistics, 1, 129-140.

Fay, M.P. and Shaw, P.A. (2010). Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The interval R package. Journal of Statistical Software, 36 (2), 1-34. http://www.jstatsoft.org/v36/i02/.

Gentleman, R. and Geyer, C.J. (1994). Maximum likelihood for interval censored data: consistency and computation. Biometrika, 81, 618-623.

Turnbull, B.W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Statist. Soc. B, 38, 290-295.

Examples

library(interval)
data(bcos)
## fit a separate NPMLE for each treatment group
icout <- icfit(Surv(left, right, type = "interval2") ~ treatment, data = bcos)
plot(icout)
## can pick out just one group
plot(icout[1])
