Computes the predicted survivor function for a Cox proportional hazards model.

```
# S3 method for coxph
survfit(formula, newdata,
se.fit=TRUE, conf.int=.95, individual=FALSE, stype=2, ctype,
conf.type=c("log","log-log","plain","none", "logit", "arcsin"),
censor=TRUE, start.time, id, influence=FALSE,
na.action=na.pass, type, time0=FALSE, ...)
# S3 method for coxphms
survfit(formula, newdata,
se.fit=FALSE, conf.int=.95, individual=FALSE, stype=2, ctype,
conf.type=c("log","log-log","plain","none", "logit", "arcsin"),
censor=TRUE, start.time, id, influence=FALSE,
na.action=na.pass, type, p0=NULL, time0= FALSE, ...)
```

an object of class `"survfit"`

.
See `survfit.object`

for
details. Methods defined for survfit objects are
`print`

, `plot`

,
`lines`

, and `points`

.

- formula
A

`coxph`

object.- newdata
a data frame with the same variable names as those that appear in the

`coxph`

formula. One curve is produced per row. The curve(s) produced will be representative of a cohort whose covariates correspond to the values in`newdata`

.- se.fit
a logical value indicating whether standard errors should be computed. Default is

`TRUE`

for standard models,`FALSE`

for multi-state (code not yet present for that case.)- conf.int
the level for a two-sided confidence interval on the survival curve(s). Default is 0.95.

- individual
deprecated argument, replaced by the general

`id`

- stype
computation of the survival curve, 1=direct, 2= exponenial of the cumulative hazard.

- ctype
whether the cumulative hazard computation should have a correction for ties, 1=no, 2=yes.

- conf.type
One of

`"none"`

,`"plain"`

,`"log"`

(the default),`"log-log"`

or`"logit"`

. Only enough of the string to uniquely identify it is necessary. The first option causes confidence intervals not to be generated. The second causes the standard intervals`curve +- k *se(curve)`

, where k is determined from`conf.int`

. The log option calculates intervals based on the cumulative hazard or log(survival). The log-log option uses the log hazard or log(-log(survival)), and the logit log(survival/(1-survival)).- censor
if FALSE time points at which there are no events (only censoring) are not included in the result.

- id
optional variable name of subject identifiers. If this is present, it will be search for in the

`newdata`

data frame. Each group of rows in`newdata`

with the same subject id represents the covariate path through time of a single subject, and the result will contain one curve per subject. If the`coxph`

fit had strata then that must also be specified in`newdata`

. If`newid`

is not present, then each individual row of`newdata`

is presumed to represent a distinct subject.- start.time
optional starting time, a single numeric value. If present the returned curve contains survival after

`start.time`

conditional on surviving to`start.time`

.- influence
option to return the influence values

- na.action
the na.action to be used on the newdata argument

- type
older argument that encompassed

`stype`

and`ctype`

, now deprecated- p0
optional, a vector of probabilities. The returned curve will be for a cohort with this mixture of starting states. Most often a single state is chosen

- time0
include the starting time for the curve in the output

- ...
for future methods

If the following pair of lines is used inside of another function then
the `model=TRUE`

argument must be added to the coxph call:
`fit <- coxph(...); survfit(fit)`

.
This is a consequence of the non-standard evaluation process used by the
`model.frame`

function when a formula is involved.

Let \(\log[S(t; z)]\) be the log of the survival curve
for a fixed covariate vector \(z\), then
\(\log[S(t; x)]= e^{(x-z)\beta}\log[S(t; z)]\)
is the log of the curve for any new covariate vector \(x\).
There is an unfortunate tendency to refer to the reference curve with
\(z=0\) as `THE' baseline hazard. However, any \(z\) can be used as
the reference point, and more importantly, if \(x-z\) is large the
compuation can suffer severe roundoff error. It is always safest to
provide the desired \(x\) values directly via `newdata`

.

This routine produces Pr(state) curves based on a `coxph`

model fit. For single state models it produces the single curve for
S(t) = Pr(remain in initial state at time t), known as the survival
curve; for multi-state models a matrix giving probabilities for all states.
The `stype`

argument states the type of estimate, and defaults
to the exponential of the cumulative hazard, better known as the Breslow
estimate. For a multi-state Cox model this involves the exponential
of a matrix.
The argument `stype=1`

uses a non-exponential or `direct'
estimate. For a single endpoint coxph model the code evaluates the
Kalbfleich-Prentice estimate, and for a multi-state model it uses an
analog of the Aalen-Johansen estimator. The latter approach is the
default in the `mstate`

package.

The `ctype`

option affects the estimated cumulative hazard, and
if `stype=2`

the estimated P(state) curves as well. If not
present it is chosen so as to be concordant with the
`ties`

option in the `coxph`

call. (For multistate
`coxphms`

objects, only `ctype=1`

is currently implemented.)
Likewise
the choice between a model based and robust variance estimate for the
curve will mirror the choice made in the `coxph`

call,
any clustering is also inherited from the parent model.

If the `newdata`

argument is missing, then a curve is produced
for a single "pseudo" subject with
covariate values equal to the `means`

component of the fit.
The resulting curve(s) rarely make scientific sense, but
the default remains due to an unwarranted belief by many that it
represents an "average" curve, and it's use as a default in other
packages. For coxph, the `means`

component will contain the value
0 for any 0/1 or TRUE/FALSE variables, and the mean value in the data
for others. Its primary reason for this default is to
increase numerical accuracy in internal computations of the routine
via recentering the X matrix;
there is no reason to assume this represents an `interesting'
hypothetical subject for prediction of their survival curve.
Users are strongly advised to use the newdata argument;
predictions from a multistate coxph model require the newdata argument.

If the `coxph`

model contained an offset term, then the data set
in the `newdata`

argument should also contain that variable.

When the original model contains time-dependent covariates, then the
path of that covariate through time needs to be specified in order to
obtain a predicted curve. This requires `newdata`

to contain
multiple lines for each hypothetical subject which gives the covariate
values, time interval, and strata for each line (a subject can change
strata), along with an `id`

variable
which demarks which rows belong to each subject.
The time interval must have the same (start, stop, status)
variables as the original model: although the status variable is not
used and thus can be set to a dummy value of 0 or 1, it is necessary for
the response to be recognized as a `Surv`

object.
Last, although predictions with a time-dependent covariate path can be
useful, it is very easy to create a prediction that is senseless. Users
are encouraged to seek out a text that discusses the issue in detail.

When a model contains strata but no time-dependent covariates the user of this routine has a choice. If newdata argument does not contain strata variables then the returned object will be a matrix of survival curves with one row for each strata in the model and one column for each row in newdata. (This is the historical behavior of the routine.) If newdata does contain strata variables, then the result will contain one curve per row of newdata, based on the indicated stratum of the original model. In the rare case of a model with strata by covariate interactions the strata variable must be included in newdata, the routine does not allow it to be omitted (predictions become too confusing). (Note that the model Surv(time, status) ~ age*strata(sex) expands internally to strata(sex) + age:sex; the sex variable is needed for the second term of the model.)

See `survfit`

for more details about the counts (number of
events, number at risk, etc.)

Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the
survival distribution in censored data. *Comm. in Statistics*
**13**, 2469-86.

Kalbfleisch, J. D. and Prentice, R. L. (1980).
*The Statistical Analysis of Failure Time Data.*
New York:Wiley.

Link, C. L. (1984). Confidence intervals for the survival
function using Cox's proportional hazards model with
covariates. *Biometrics*
**40**, 601-610.

Therneau T and Grambsch P (2000), Modeling Survival Data: Extending the Cox Model, Springer-Verlag.

Tsiatis, A. (1981). A large sample study of the estimate
for the integrated hazard function in Cox's regression
model for survival data. *Annals of Statistics*
**9**, 93-108.

`print.survfit`

,
`plot.survfit`

,
`lines.survfit`

,
`coxph`

,
`Surv`

,
`strata`

.