Cox Proportional Hazards Model and Extensions

Modification of Therneau's coxph function to fit the Cox model and its extension, the Andersen-Gill model. The latter allows for interval time-dependent covariables, time-dependent strata, and repeated events. The Survival method for an object created by cph returns an S function for computing estimates of the survival function. The Quantile method for cph returns an S function for computing quantiles of survival time (median, by default). The Mean method returns a function for computing the mean survival time. This function issues a warning if the last follow-up time is uncensored, unless a restricted mean is explicitly requested.

models, nonparametric, survival
cph(formula = formula(data), data=parent.frame(),
    weights, subset, na.action=na.delete, 
    singular.ok=FALSE, robust=FALSE,
    model=FALSE, x=FALSE, y=FALSE, se.fit=FALSE, 
    eps=1e-4, init, iter.max=10, tol=1e-9, surv=FALSE, time.inc,
    type, vartype, ...)

## S3 method for class 'cph': Survival(object, \dots) # Evaluate result as g(times, lp, stratum=1, type=c("step","polygon"))

## S3 method for class 'cph': Quantile(object, \dots) # Evaluate like h(q, lp, stratum=1, type=c("step","polygon"))

## S3 method for class 'cph': Mean(object, method=c("exact","approximate"), type=c("step","polygon"), n=75, tmax, ...) # E.g. m(lp, stratum=1, type=c("step","polygon"), tmax, ...)

an S formula object with a Surv object on the left-hand side. The terms can specify any S model formula with up to third-order interactions. The strat function may appear in the terms, as a main effect or an
an object created by cph with surv=TRUE
name of an S data frame containing all needed variables. Omit this to use a data frame already in the S ``search list''.
case weights
an expression defining a subset of the observations to use in the fit. The default is to use all observations. Specify for example age>50 & sex="male" or c(1:100,200:300) respectively to use the observations satisfy
specifies an S function to handle missing data. The default is the function na.delete, which causes observations with any variable missing to be deleted. The main difference between na.delete and the S-supplied function
for cph, specifies a particular fitting method, "model.frame" instead to return the model frame of the predictor and response variables satisfying any subset or missing value checks, or "model.matrix" to retu
If TRUE, the program will automatically skip over columns of the X matrix that are linear combinations of earlier columns. In this case the coefficients for such columns will be NA, and the variance matrix will contain zeros. Fo
if TRUE a robust variance estimate is returned. Default is TRUE if the model includes a cluster() operative, FALSE otherwise.
default is FALSE(false). Set to TRUE to return the model frame as element model of the fit object.
default is FALSE. Set to TRUE to return the expanded design matrix as element x (without intercept indicators) of the returned fit object.
default is FALSE. Set to TRUE to return the vector of response values (Surv object) as element y of the fit.
default is FALSE. Set to TRUE to compute the estimated standard errors of the estimate of X beta and store them in element se.fit of the fit. The predictors are first centered to their means before comp
convergence criterion - change in log likelihood.
vector of initial parameter estimates. Defaults to all zeros. Special residuals can be obtained by setting some elements of init to MLEs and others to zero and specifying iter.max=1.
maximum number of iterations to allow. Set to 0 to obtain certain null-model residuals.
tolerance for declaring singularity for matrix inversion (available only when survival5 or later package is in effect)
set to TRUE to compute underlying survival estimates for each stratum, and to store these along with standard errors of log Lambda(t), maxtime (maximum observed survival or censoring time), and surv.summary
time increment used in deriving surv.summary. Survival, number at risk, and standard error will be stored for t=0, time.inc, 2 time.inc, ..., maxtime, where maxtime is the maximum survival time over all
(for cph) applies if surv is TRUE or "summary". If type is omitted, the method consistent with method is used. See survfit.coxph (under survfit
see survfit.coxph
other arguments passed to coxph.fit from cph. Ignored by other functions.
a scalar or vector of times at which to evaluate the survival estimates
a scalar or vector of linear predictors (including the centering constant) at which to evaluate the survival estimates
a scalar stratum number or name (e.g., "sex=male") to use in getting survival probabilities
a scalar quantile or a vector of quantiles to compute
the number of points at which to evaluate the mean survival time, for method="approximate" in Mean.cph.
For Mean.cph, the default is to compute the overall mean (and produce a warning message if there is censoring at the end of follow-up). To compute a restricted mean life length, specify the truncation point as tmax. F

If there is any strata by covariable interaction in the model such that the mean X beta varies greatly over strata, method="approximate" may not yield very accurate estimates of the mean in Mean.cph.

For method="approximate" if you ask for an estimate of the mean for a linear predictor value that was outside the range of linear predictors stored with the fit, the mean for that observation will be NA.


  • For Survival, Quantile, or Mean, an S function is returned. Otherwise, in addition to what is listed below, formula/design information and the components maxtime, time.inc, units, model, x, y, se.fit are stored, the last 5 depending on the settings of options by the same names. The vectors or matrix stored if y=TRUE or x=TRUE have rows deleted according to subset and to missing data, and have names or row names that come from the data frame used as input data.
  • ntable with one row per stratum containing number of censored and uncensored observations
  • coefvector of regression coefficients
  • statsvector containing the named elements Obs, Events, Model L.R., d.f., P, Score, Score P, and R2.
  • varvariance/covariance matrix of coefficients
  • linear.predictorsvalues of predicted X beta for observations used in fit, normalized to have overall mean zero
  • residmartingale residuals
  • logliklog likelihood at initial and final parameter values
  • scorevalue of score statistic at initial values of parameters
  • timeslists of times (if surv="T")
  • survlists of underlying survival probability estimates
  • std.errlists of standard errors of estimate log-log survival
  • surv.summarya 3 dimensional array if surv=TRUE. The first dimension is time ranging from 0 to maxtime by time.inc. The second dimension refers to strata. The third dimension contains the time-oriented matrix with Survival, n.risk (number of subjects at risk), and std.err (standard error of log-log survival).
  • centercentering constant, equal to overall mean of X beta.

See Also

coxph, coxph.fit, Surv, residuals.cph, cox.zph, survfit.cph, survest.cph, survfit.coxph, survplot, datadist, rms, rms.trans, anova.rms, summary.rms, Predict, fastbw, validate, calibrate, plot.Predict, specs.rms, lrm, which.influence, na.delete, na.detail.response, print.cph, latex.cph, vif, ie.setup

  • cph
  • Survival.cph
  • Quantile.cph
  • Mean.cph
# Simulate data from a population model in which the log hazard
# function is linear in age and there is no age x sex interaction

n <- 1000
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, 
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
Srv <- Surv(dt,e)

f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
cox.zph(f, "rank")             # tests of PH
plot(Predict(f, age=., sex=.)) # plot age effect, 2 curves for 2 sexes
survplot(f, sex=.)             # time on x-axis, curves for x2
res <- resid(f, "scaledsch")
time <- as.numeric(dimnames(res)[[1]])
z <- loess(res[,4] ~ time, span=0.50)   # residuals for sex
plot(time, fitted(z))
lines(supsmu(time, res[,4]),lty=2)
plot(cox.zph(f,"identity"))    #Easier approach for last few lines
# latex(f)

f <- cph(Srv ~ age + strat(sex), surv=TRUE)
g <- Survival(f)   # g is a function
g(seq(.1,1,by=.1), stratum="sex=Male", type="poly") #could use stratum=2
med <- Quantile(f)
plot(Predict(f, age=., fun=function(x) med(lp=x)))  #plot median survival

# Fit a model that is quadratic in age, interacting with sex as strata
# Compare standard errors of linear predictor values with those from
# coxph
# Use more stringent convergence criteria to match with coxph

f <- cph(Srv ~ pol(age,2)*strat(sex), x=TRUE, eps=1e-9, iter.max=20)
se <- predict(f, se.fit=TRUE)$se.fit
xyplot(se ~ age | sex, main='From cph')
a <- c(30,50,70)
comb <- data.frame(age=rep(a, each=2),
                   sex=rep(levels(sex), 3))

p <- predict(f, comb, se.fit=TRUE)
comb$yhat  <- p$linear.predictors
comb$se    <- p$se.fit
z <- qnorm(.975)
comb$lower <- p$linear.predictors - z*p$se.fit
comb$upper <- p$linear.predictors + z*p$se.fit

age2 <- age^2
f2 <- coxph(Srv ~ (age + age2)*strata(sex))
se <- predict(f2, se.fit=TRUE)$se.fit
xyplot(se ~ age | sex, main='From coxph')
comb <- data.frame(age=rep(a, each=2), age2=rep(a, each=2)^2,
                   sex=rep(levels(sex), 3))
p <- predict(f2, newdata=comb, se.fit=TRUE)
comb$yhat <- p$fit
comb$se   <- p$se.fit
comb$lower <- p$fit - z*p$se.fit
comb$upper <- p$fit + z*p$se.fit

# g <- cph(Surv(hospital.charges) ~ age, surv=TRUE)
# Cox model very useful for analyzing highly skewed data, censored or not
# m <- Mean(g)
# m(0)                           # Predicted mean charge for reference age

#Fit a time-dependent covariable representing the instantaneous effect
#of an intervening non-fatal event
dframe <- data.frame(failure.time=1:10, event=rep(0:1,5),
z <- ie.setup(dframe$failure.time, dframe$event, dframe$ie.time)
S <- z$S
ie.status <- z$ie.status
attach(dframe[z$subs,])    # replicates all variables

f <- cph(S ~ age + ie.status, x=TRUE, y=TRUE)  
#Must use x=TRUE,y=TRUE to get survival curves with time-dep. covariables

#Get estimated survival curve for a 50-year old who has an intervening
#non-fatal event at 5 days
new <- data.frame(S=Surv(c(0,5), c(5,999), c(FALSE,FALSE)), age=rep(50,2),
g <- survfit(f, new)
plot(c(0,g$time), c(1,g$surv[,2]), type='s', 
     xlab='Days', ylab='Survival Prob.')
# Not certain about what columns represent in g$surv for survival5
# but appears to be for different ie.status
#g <- survest(f, new)
#plot(g$time, g$surv, type='s', xlab='Days', ylab='Survival Prob.')

#Compare with estimates when there is no intervening event
new2 <- data.frame(S=Surv(c(0,5), c(5, 999), c(FALSE,FALSE)), age=rep(50,2),
g2 <- survfit(f, new2)
lines(c(0,g2$time), c(1,g2$surv[,2]), type='s', lty=2)
#g2 <- survest(f, new2)
#lines(g2$time, g2$surv, type='s', lty=2)
detach("dframe[z$subs, ]")
Documentation reproduced from package rms, version 2.0-2, License: GPL (>= 2)

Community examples

Looks like there are no examples yet.