# survfit.formula

##### Compute a Survival Curve for Censored Data

Computes an estimate of a survival curve for censored data using the Aalen-Johansen estimator. For ordinary (single event) survival this reduces to the Kaplan-Meier estimate.

- Keywords
- survival

##### Usage

```
# S3 method for formula
survfit(formula, data, weights, subset, na.action,
        stype=1, ctype=1, id, cluster, istate, timefix=TRUE,
        etype, error, ...)
```

##### Arguments

- formula
a formula object, which must have a `Surv` object as the response on the left of the `~` operator and, if desired, terms separated by + operators on the right. One of the terms may be a `strata` object. For a single survival curve the right hand side should be `~ 1`.

- data
a data frame in which to interpret the variables named in the `formula`, `subset` and `weights` arguments.

- weights
The weights must be nonnegative, and it is strongly recommended that they be strictly positive, since zero weights are ambiguous compared to use of the `subset` argument.

- subset
expression saying that only a subset of the rows of the data should be used in the fit.

- na.action
a missing-data filter function, applied to the model frame, after any `subset` argument has been used. Default is `options()$na.action`.

- stype
the method to be used for estimation of the survival curve: 1 = direct, 2 = exp(-cumulative hazard).

- ctype
the method to be used for estimation of the cumulative hazard: 1 = Nelson-Aalen formula, 2 = Fleming-Harrington correction for tied events.

- id
identifies individual subjects, when a given person can have multiple lines of data.

- cluster
used to group observations for the infinitesimal jackknife variance estimate; defaults to the value of `id`.

- istate
for multi-state models, identifies the initial state of each subject or observation

- timefix
process times through the `aeqSurv` function to eliminate potential roundoff issues.

- etype
a variable giving the type of event. This has been superseded by multi-state Surv objects and is deprecated; see example below.

- error
this argument is no longer used

- …
The following additional arguments are passed to internal functions called by `survfit`.

- se.fit
logical value, default is TRUE. If FALSE then standard error computations are omitted.

- conf.type
One of `"none"`, `"plain"`, `"log"` (the default), `"log-log"` or `"logit"`. Only enough of the string to uniquely identify it is necessary. The first option causes confidence intervals not to be generated. The second produces the standard intervals `curve +- k * se(curve)`, where k is determined from `conf.int`. The log option calculates intervals based on the cumulative hazard or log(survival). The log-log option bases the intervals on the log hazard or log(-log(survival)), and the logit option on log(survival/(1-survival)).

- conf.lower
a character string to specify modified lower limits to the curve; the upper limit remains unchanged. Possible values are `"usual"` (unmodified), `"peto"`, and `"modified"`. The modified lower limit is based on an "effective n" argument. The confidence bands will agree with the usual calculation at each death time, but unlike the usual bands the confidence interval becomes wider at each censored observation. The extra width is obtained by multiplying the usual variance by a factor m/n, where n is the number currently at risk and m is the number at risk at the last death time. This is especially useful for survival curves with a long flat tail. The Peto lower limit is based on the same "effective n" argument as the modified limit, but also replaces the usual Greenwood variance term with a simple approximation. It is known to be conservative.

- start.time
numeric value specifying a time to start calculating survival information. The resulting curve is the survival conditional on surviving to `start.time`.

- conf.int
the level for a two-sided confidence interval on the survival curve(s). Default is 0.95.

- influence
a logical value indicating whether to return the infinitesimal jackknife (influence) values for each subject. These contain the values of the derivative of each value with respect to the case weights of each subject i: \(\partial p/\partial w_i\), evaluated at the vector of weights. The resulting object will contain `influence.surv` and `influence.chaz` components. Alternatively, options of `influence=1` or `influence=2` will return values for only the survival or hazard curves, respectively.

- p0
this applies only to multi-state curves. An optional vector giving the initial probability across the states. If this is missing, then p0 is estimated using the frequency of the starting states of all observations at risk at `start.time`, or if that is not specified, at the time of the first event.

- type
an older argument that combined `stype` and `ctype`, now deprecated. Legal values were "kaplan-meier" which is equivalent to `stype=1, ctype=1`, "fleming-harrington" which is equivalent to `stype=2, ctype=1`, and "fh2" which is equivalent to `stype=2, ctype=2`.
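The effect of the `conf.type` and `conf.int` arguments can be seen by fitting the same curve several ways. The sketch below is illustrative only: it assumes the `aml` data set shipped with the survival package, and the object names (`fit_log`, `fit_plain`, `fit_90`) are made up for this example.

```r
library(survival)

# Default: 95% intervals computed on the log(survival) scale
fit_log <- survfit(Surv(time, status) ~ 1, data = aml)

# Same curve with "plain" intervals: curve +- k * se(curve)
fit_plain <- survfit(Surv(time, status) ~ 1, data = aml,
                     conf.type = "plain")

# A lower confidence level gives narrower intervals
fit_90 <- survfit(Surv(time, status) ~ 1, data = aml, conf.int = 0.90)

# The point estimate does not depend on the interval type or level
all.equal(fit_log$surv, fit_plain$surv)

# Side-by-side comparison of the lower limits
head(cbind(time    = fit_log$time,
           log     = fit_log$lower,
           plain   = fit_plain$lower,
           level90 = fit_90$lower))
```

Only the `lower` and `upper` components change between these fits; the `surv` component is the same in all three.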

##### Details

If there is a `data` argument, then variables in the `formula`,
`weights`, `subset`, `id`, `cluster` and `istate` arguments will be
searched for in that data set.

The routine returns both an estimated probability in state and an
estimated cumulative hazard.
The cumulative hazard estimate is the Nelson-Aalen (NA) estimate or the
Fleming-Harrington (FH) estimate; the latter includes a correction for
tied event times. The probability in state can be estimated
either using the exponential of the negative cumulative hazard, or as a
direct estimate using the Aalen-Johansen (AJ) approach.
For single state data the AJ estimate reduces to the Kaplan-Meier and
the probability in state to the survival curve;
for competing risks data the AJ reduces to the cumulative incidence (CI)
estimator.
For backward compatibility the `type` argument can be used instead.
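As a minimal sketch of how the `stype` and `ctype` choices described above interact, the following fits the same single-state curve three ways. It assumes the `aml` data set from the survival package; the object names `km`, `fh` and `fh2` are chosen here for illustration.

```r
library(survival)

# stype = 1, ctype = 1 (defaults): Kaplan-Meier survival, Nelson-Aalen hazard
km <- survfit(Surv(time, status) ~ 1, data = aml)

# stype = 2, ctype = 1: survival = exp(-Nelson-Aalen cumulative hazard)
fh <- survfit(Surv(time, status) ~ 1, data = aml, stype = 2, ctype = 1)

# stype = 2, ctype = 2: as above, with the Fleming-Harrington tie correction
fh2 <- survfit(Surv(time, status) ~ 1, data = aml, stype = 2, ctype = 2)

# With stype = 2 the survival is the exponential of minus the cumulative hazard
all.equal(fh$surv, exp(-fh$cumhaz))

# Compare the three survival estimates at the first few time points
head(cbind(time = km$time, km = km$surv, fh = fh$surv, fh2 = fh2$surv))
```

The exponential form never falls below the Kaplan-Meier estimate, because exp(-x) >= 1 - x at every event time; the difference is typically small when the number at risk is large.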

When the data set includes left censored or interval censored data (or both), then the EM approach of Turnbull is used to compute the overall curve. Currently this algorithm is very slow, only a survival curve is produced, and it does not support a robust variance.

If an `id` or `cluster` argument is present, or for multi-state
curves, then the standard errors of the results will be based on an
infinitesimal jackknife (IJ) estimate; otherwise the standard model
based estimate will be used.
With the IJ estimate, the leverage values themselves can be returned
as arrays with dimensions: number of subjects, number of unique times,
and for a multi-state model, the number of unique states.
Be forewarned that these arrays can be huge. If there is a `cluster`
argument, the first dimension will be the number of
clusters and the variance will be a grouped IJ estimate; this can be
an important tool for reducing the size.
A numeric value for the `influence` argument allows finer
control: 0 = return neither (same as FALSE), 1 = return the influence
array for probability in state, 2 = return the influence array for the
cumulative hazard, 3 = both (same as TRUE).
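A small sketch of requesting the influence values follows. It is an illustration under stated assumptions rather than a recommended workflow: it uses the `aml` data set, and the `subject` column is a hypothetical identifier added here because `aml` has one row per subject and no id variable of its own.

```r
library(survival)

# Hypothetical id column: one row per subject in this data set
aml2 <- transform(aml, subject = seq_len(nrow(aml)))

# Supplying id requests the infinitesimal jackknife variance;
# influence = TRUE additionally returns the per-subject influence values
fit_ij <- survfit(Surv(time, status) ~ 1, data = aml2,
                  id = subject, influence = TRUE)

# Dimensions of the returned influence arrays (these can be large)
dim(fit_ij$influence.surv)
dim(fit_ij$influence.chaz)

# influence = 1 asks for the survival influence only
fit_1 <- survfit(Surv(time, status) ~ 1, data = aml2,
                 id = subject, influence = 1)
names(fit_1)
```

The point estimates are unaffected by these arguments; only the variance estimate and the extra influence components differ from a plain fit.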

##### Value

An object of class `"survfit"`. See `survfit.object` for details.
Methods defined for survfit objects are `print`, `plot`, `lines`,
and `points`.
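As a brief illustration of working with the returned object, the sketch below prints a fit, summarizes it at chosen times with `summary` (which also has a method for survfit objects), and reads a few components directly. It assumes the `aml` data set; the time points 10 and 20 are arbitrary.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)

# print method: one line per curve
print(fit)

# summary at chosen time points (here 10 and 20, in the units of `time`)
summary(fit, times = c(10, 20))

# Components can also be used directly; `strata` gives the number of
# time points belonging to each curve
fit$strata
head(data.frame(time = fit$time, n.risk = fit$n.risk,
                n.event = fit$n.event, surv = fit$surv))
```

When the formula has terms on the right hand side, the curves are stacked end to end in `time` and `surv`, and `strata` is needed to tell where one curve stops and the next begins.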

##### References

Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence
intervals for survival probabilities. *Statistics in Medicine*
**6**, 679-87.

Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the
survival distribution in censored data. *Comm. in Statistics*
**13**, 2469-86.

Kalbfleisch, J. D. and Prentice, R. L. (1980).
*The Statistical Analysis of Failure Time Data.*
New York: Wiley.

Kyle, R. A. (1997).
Monoclonal gammopathy of undetermined significance and solitary
plasmacytoma. Implications for progression to overt multiple myeloma.
*Hematology/Oncology Clinics N. Amer.*
**11**, 71-87.

Link, C. L. (1984). Confidence intervals for the survival
function using Cox's proportional hazards model with
covariates. *Biometrics*
**40**, 601-610.

Turnbull, B. W. (1974). Nonparametric estimation of a survivorship
function with doubly censored data. *J Am Stat Assoc*,
**69**, 169-173.

##### See Also

`survfit.coxph` for survival curves from Cox models,
`survfit.object` for a description of the components of a
survfit object,
`print.survfit`, `plot.survfit`, `lines.survfit`, `coxph`, `Surv`.

##### Examples

```
library(survival)

# Fit a Kaplan-Meier curve for each treatment group and plot it
fit <- survfit(Surv(time, status) ~ x, data = aml)
plot(fit, lty = 2:3)
legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3)

# Fit a Cox proportional hazards model and plot the
# predicted survival for a 60 year old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
plot(survfit(fit, newdata = data.frame(age = 60)),
     xscale = 365.25, xlab = "Years", ylab = "Survival")

# Here is the data set from Turnbull
# There are no interval censored subjects, only left-censored (status=2),
# right-censored (status 0) and observed events (status 1)
#
#                          Time
#                       1    2   3   4
# Type of observation
#          death       12    6   2   3
#          losses       3    2   0   3
#          late entry   2    4   2   5
#
tdata <- data.frame(time   = c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status = rep(c(1,0,2), 4),
                    n      = c(12,3,2,6,2,4,2,0,2,3,3,5))
fit <- survfit(Surv(time, time, status, type = 'interval') ~ 1,
               data = tdata, weights = n)

#
# Time to progression/death for patients with monoclonal gammopathy
# Competing risk curves (cumulative incidence)
#
fitKM <- survfit(Surv(stop, event == 'pcm') ~ 1, data = mgus1,
                 subset = (start == 0))
fitCI <- survfit(Surv(stop, event) ~ 1,
                 data = mgus1, subset = (start == 0))

# CI curves show the probability in state
plot(fitCI, xscale = 365.25, xmax = 7300, mark.time = FALSE,
     col = 2:3, xlab = "Years post diagnosis of MGUS",
     ylab = "P(state)")
lines(fitKM, fun = 'event', xmax = 7300, mark.time = FALSE,
      conf.int = FALSE)
text(3652, .4, "Competing risk: death", col = 3)
text(5840, .15, "Competing risk: progression", col = 2)
text(5480, .30, "KM:prog")
```

*Documentation reproduced from package survival, version 3.1-8, License: LGPL (>= 2)*