survfit.coxph
Computes the predicted survivor function for a Cox proportional hazards model.
 Keywords
 survival
Usage
# S3 method for coxph
survfit(formula, newdata,
        se.fit=TRUE, conf.int=.95,
        individual=FALSE,
        type, vartype,
        conf.type=c("log","log-log","plain","none"), censor=TRUE, id,
        start.time,
        na.action=na.pass, ...)
Arguments
 formula
 A coxph object.
 newdata
 a data frame with the same variable names as those that appear
 in the coxph formula. It is also valid to use a vector, if the data
 frame would consist of a single row. The curve(s) produced will be
 representative of a cohort whose covariates correspond to the values
 in newdata. Default is the mean of the covariates used in the coxph fit.
 individual
 This argument has been superseded by the id argument and is present
 only for backwards compatibility. A logical value indicating whether
 each row of newdata represents a distinct individual (FALSE, the
 default), or if each row of the data frame represents different time
 epochs for only one individual (TRUE). In the former case the result
 will have one curve for each row in newdata, in the latter only a
 single curve will be produced.
 conf.int
 the level for a two-sided confidence interval on the survival curve(s). Default is 0.95.
 se.fit
 a logical value indicating whether standard errors should be
 computed. Default is TRUE.
 type,vartype
 a character string specifying the type of survival curve.
 Possible values are "aalen", "efron", or "kalbfleisch-prentice"
 (only the first two characters are necessary). The default is to match
 the computation used in the Cox model: the Nelson-Aalen-Breslow
 estimate for ties='breslow', the Efron estimate for ties='efron' and
 the Kalbfleisch-Prentice estimate for a discrete time model
 ties='exact'. Variance estimates are the Aalen-Link-Tsiatis, Efron,
 and Greenwood. The default will be the Efron estimate for ties='efron'
 and the Aalen estimate otherwise.
 conf.type
 One of "none", "plain", "log" (the default), or "log-log". Only enough
 of the string to uniquely identify it is necessary. The first option
 causes confidence intervals not to be generated. The second causes the
 standard intervals curve +- k * se(curve), where k is determined from
 conf.int. The log option calculates intervals based on the cumulative
 hazard or log(survival). The last option bases intervals on the log
 hazard or log(-log(survival)).
 censor
 if FALSE, time points at which there are no events (only censoring) are not included in the result.
 id
 optional variable name of subject identifiers. If this is
 present, then each group of rows with the same subject id represents
 the covariate path through time of a single subject, and the result
 will contain one curve per subject. If the coxph fit had strata then
 that must also be specified in newdata. If missing, then each
 individual row of newdata is presumed to represent a distinct subject
 and there will be nrow(newdata) times the number of strata curves in
 the result (one for each strata/subject combination).
 start.time
 optional starting time, a single numeric value.
 If present the returned curve contains survival after start.time
 conditional on surviving to start.time.
 na.action
 the na.action to be used on the newdata argument
 …
 for future methods
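As a rough illustration of how several of these arguments combine, the sketch below fits a model to the ovarian data from the survival package and requests curves for two hypothetical subjects. The ages, the 90% level and the start time of 200 days are arbitrary choices made for the example, not recommendations.

```r
library(survival)

# Cox model on the ovarian data shipped with the survival package
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# Two hypothetical subjects (ages chosen only for illustration)
nd <- data.frame(age = c(50, 65))

# 90% intervals on the log-log scale, dropping censoring-only time points
sf <- survfit(fit, newdata = nd, conf.int = 0.90,
              conf.type = "log-log", censor = FALSE)

# Curves conditional on surviving to day 200 (arbitrary start time)
sf200 <- survfit(fit, newdata = nd, start.time = 200)

print(sf)
```

Each row of nd yields its own curve, so the surv component of sf has one column per row.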
Details
Serious thought has been given to removing the default value for
newdata, which is to use a single "pseudo" subject with
covariate values equal to the means of the data set, since
the resulting curve(s) almost never make sense.
It remains due to an unwarranted attachment to the option shown by
some users and by other packages. Two particularly egregious examples
are factor variables and interactions. Suppose one were studying
interspecies transmission of a virus, and the data set has a factor
variable with levels ("pig", "chicken") and about equal numbers of
observations for each. The "mean" covariate level will be 1/2
(is this a flying pig?). As to interactions, assume data with sex coded
as 0/1, ages ranging from 50 to 80, and a model with age*sex. The "mean"
value for the age:sex interaction term will be about 30, a value
that does not occur in the data.
Users are strongly advised to use the newdata argument.

When the original model contains time-dependent covariates, then the
path of that covariate through time needs to be specified in order to
obtain a predicted curve. This requires newdata to contain
multiple lines for each hypothetical subject which give the covariate
values, time interval, and strata for each line (a subject can change
strata), along with an id variable
which marks which rows belong to each subject.
The time interval must have the same (start, stop, status)
variables as the original model: although the status variable is not
used and thus can be set to a dummy value of 0 or 1, it is necessary for
the variables to be recognized as a Surv object.
Last, although predictions with a time-dependent covariate path can be
useful, it is very easy to create a prediction that is senseless. Users
are encouraged to seek out a text that discusses the issue in detail.

When a model contains strata but no time-dependent covariates the user
of this routine has a choice.
If the newdata argument does not contain strata variables then the returned
object will be a matrix of survival curves with one row for each stratum
in the model and one column for each row in newdata.
(This is the historical behavior of the routine.)
If newdata does contain strata variables, then the result will contain
one curve per row of newdata, based on the indicated stratum of the
original model. In the rare case of a model with strata by covariate
interactions the strata variable must be included in newdata; the
routine does not allow it to be omitted (predictions become too confusing).
(Note that the model Surv(time, status) ~ age*strata(sex) expands internally to
strata(sex) + age:sex; the sex variable is needed for the second term
of the model.)

When all the coefficients are zero,
the Kalbfleisch-Prentice estimator reduces to the Kaplan-Meier,
the Aalen estimate to the exponential of minus Nelson's
cumulative hazard estimate, and the Efron estimate to the
Fleming-Harrington estimate of survival.
The variances of the curves from a Cox model are larger than these
non-parametric estimates, however, due
to the variance of the coefficients.

See survfit for more details about the counts (number of
events, number at risk, etc.)

The censor argument was fixed at FALSE in earlier versions of the code
and not made available to the user.
The default value is sensible in most instances and causes the
familiar + sign to appear on plots, but it is not sensible for
time-dependent covariates, since it may lead to a large number of
spurious marks.
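The two choices for a stratified model without time-dependent covariates can be sketched as follows. This is a minimal example under assumptions: it uses the ovarian data from the survival package with rx as the stratifying variable, and the ages in newdata are hypothetical values picked for illustration.

```r
library(survival)

# Stratified Cox model: separate baseline hazards for each treatment arm
fit <- coxph(Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

# newdata without the strata variable:
# every stratum is combined with every row of newdata
sf_all <- survfit(fit, newdata = data.frame(age = c(50, 60)))

# newdata with the strata variable:
# one curve per row, each taken from the stratum named in that row
sf_row <- survfit(fit, newdata = data.frame(age = c(50, 60), rx = c(1, 2)))

print(sf_all)
print(sf_row)
```

Supplying explicit covariate values this way also avoids the "mean subject" default discussed above.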
Value
an object of class "survfit".
See survfit.object for details. Methods defined for survfit objects are
print, plot, lines, and points.
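A short sketch of inspecting the returned object, again assuming the ovarian data from the survival package and a hypothetical 60 year old subject. The time points 180 and 365 days are arbitrary.

```r
library(survival)

fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
sf  <- survfit(fit, newdata = data.frame(age = 60))

# Compact overview of the predicted curve
print(sf)

# Estimates at chosen follow-up times (in days)
est <- summary(sf, times = c(180, 365))
est$surv

# The components can also be read directly from the object
head(cbind(time = sf$time, surv = sf$surv,
           lower = sf$lower, upper = sf$upper))
```

The plot, lines and points methods take the same object when a graphical display is wanted.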
Notes
If the following pair of lines is used inside of another function then
the model=TRUE argument must be added to the coxph call:
fit <- coxph(...); survfit(fit).
This is a consequence of the non-standard evaluation process used by the
model.frame function when a formula is involved.
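The pattern described in this note might look like the sketch below. The wrapper curve_for_age and its arguments dat and at_age are hypothetical names introduced only for illustration; the data set is ovarian from the survival package.

```r
library(survival)

# Hypothetical wrapper: fit a Cox model and return the predicted curve
# for one subject of a given age
curve_for_age <- function(dat, at_age) {
  # model = TRUE stores the model frame in the fit, so survfit does not
  # have to re-evaluate the formula and data from inside this function
  fit <- coxph(Surv(futime, fustat) ~ age, data = dat, model = TRUE)
  survfit(fit, newdata = data.frame(age = at_age))
}

sf <- curve_for_age(ovarian, 60)
print(sf)
```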
References
Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
Therneau, T. and Grambsch, P. (2000). Modeling Survival Data: Extending the Cox Model. Springer-Verlag.
Tsiatis, A. (1981). A large sample study of the estimate for the integrated hazard function in Cox's regression model for survival data. Annals of Statistics 9, 93-108.
See Also
print.survfit, plot.survfit, lines.survfit, coxph, Surv, strata.
Examples
# fit a Kaplan-Meier and plot it
fit <- survfit(Surv(time, status) ~ x, data = aml)
plot(fit, lty = 2:3)
legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3)

# fit a Cox proportional hazards model and plot the
# predicted survival for a 60 year old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
plot(survfit(fit, newdata=data.frame(age=60)),
     xscale=365.25, xlab = "Years", ylab="Survival")

# Here is the data set from Turnbull
# There are no interval censored subjects, only left-censored (status=2),
# right-censored (status 0) and observed events (status 1)
#
#                          Time
#                        1    2    3    4
# Type of observation
#           death       12    6    2    3
#          losses        3    2    0    3
#      late entry        2    4    2    5
#
tdata <- data.frame(time  =c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status=rep(c(1,0,2),4),
                    n     =c(12,3,2,6,2,4,2,0,2,3,3,5))
fit <- survfit(Surv(time, time, status, type='interval') ~1,
               data=tdata, weights=n)
#
# Time to progression/death for patients with monoclonal gammopathy
# Competing risk curves (cumulative incidence)
fit1 <- survfit(Surv(stop, event=='progression') ~1, data=mgus1,
                subset=(start==0))
fit2 <- survfit(Surv(stop, status) ~1, data=mgus1,
                subset=(start==0), etype=event) # competing risks
# CI curves are always plotted from 0 upwards, rather than 1 down
plot(fit2, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
     col=2:3, xlab="Years post diagnosis of MGUS")
lines(fit1, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
      conf.int=FALSE)
text(10, .4, "Competing Risk: death", col=3)
text(16, .15,"Competing Risk: progression", col=2)
text(15, .30,"KM:prog")