flexsurvreg(formula, anc=NULL, data, weights, bhazard, subset, na.action, dist, inits, fixedpars=NULL, dfns=NULL, aux=NULL, cl=0.95, integ.opts, sr.control=survreg.control(), ...)Surv function, and any covariates are given on the
right-hand side. For example, Surv(time, dead) ~ age + sex
Surv objects of type="right","counting",
"interval1" or "interval2" are supported, corresponding to
right-censored, left-truncated or interval-censored observations.
If there are no covariates, specify 1 on the right hand side,
for example Surv(time, dead) ~ 1.
By default, covariates are placed on the ``location'' parameter of
the distribution, typically the "scale" or "rate" parameter, through
a linear model, or a log-linear model if this parameter must be
positive. This gives an accelerated failure time model or a
proportional hazards model (see dist below) depending on how the
distribution is parameterised.
Covariates can be placed on other (``ancillary'') parameters by using
the name of the parameter as a ``function'' in the formula. For
example, in a Weibull model, the following expresses the scale
parameter in terms of age and a treatment variable treat, and
the shape parameter in terms of sex and treatment.
Surv(time, dead) ~ age + treat + shape(sex) + shape(treat)
However, if the names of the ancillary parameters clash with any
real functions that might be used in formulae (such as I(), or
factor()), then those functions will not work in the
formula. A safer way to model covariates on ancillary parameters is
through the anc argument to flexsurvreg.
survreg users should also note that the function
strata() is ignored, so that any covariates surrounded by
strata() are applied to the location parameter.
Surv(time, dead) ~ age + treat, anc = list(shape = ~ sex + treat)
formula.
If not given, the variables should be in the working environment.
options()$na.action.
"gengamma" |
Generalized gamma (stable) | mu |
| AFT |
"gengamma.orig" |
Generalized gamma (original) |
| scale | AFT |
"genf" |
| Generalized F (stable) | mu | AFT |
"genf.orig" |
Generalized F (original) | mu |
| AFT |
"weibull" |
Weibull |
| scale | AFT |
"gamma" |
| Gamma | rate | AFT |
"exp" |
Exponential | rate |
| PH |
"llogis" |
Log-logistic |
| scale | AFT |
"lnorm" |
| Log-normal | meanlog | AFT |
"gompertz" |
Gompertz | rate |
| PH |
"gengamma" |
Generalized gamma (stable) |
"exponential" and "lognormal" can be used as aliases
for "exp" and "lnorm", for compatibility with
survreg.
Alternatively, dist can be a list specifying a custom
distribution. See section ``Custom distributions'' below for how to
construct this list.
Very flexible spline-based distributions can also be fitted with
flexsurvspline.
The parameterisations of the built-in distributions used here are
the same as in their built-in distribution functions:
dgengamma, dgengamma.orig,
dgenf, dgenf.orig,
dweibull, dgamma, dexp,
dlnorm, dgompertz, respectively. The
functions in base R are used where available, otherwise, they are
provided in this package.
For the Weibull, exponential and log-normal distributions,
flexsurvreg simply works by calling
survreg to obtain the maximum likelihood estimates,
then calling optim to double-check convergence and
obtain the covariance matrix for flexsurvreg's
preferred parameterisation.
The Weibull parameterisation is
different from that in survreg, instead it
is consistent with dweibull. The "scale"
reported by survreg is equivalent to
1/shape as defined by dweibull and hence
flexsurvreg. The first coefficient (Intercept)
reported by survreg is equivalent to
log(scale) in dweibull and
flexsurvreg.
Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates.
flexsurv.dists in the source for the exact methods used.
If the likelihood surface may be uneven, it is advised to run
the optimisation starting from various different initial values
to ensure convergence to the true global maximum.
inits.
For example, in a stable
generalized Gamma model with two covariates, to fix the third
of three generalized gamma parameters (the shape Q,
see the help for GenGamma) and the second covariate,
specify fixedpars = c(3, 5)
"d", "p", "h",
or "H" containing the probability density, cumulative
distribution, hazard, or cumulative hazard functions of the
distribution. For example, list(d=dllogis, p=pllogis).
If dfns is used, a custom dlist must still be
provided, but dllogis and pllogis need not be visible
from the global environment. This is useful if flexsurvreg
is called within other functions or environments where the
distribution functions are also defined dynamically.
flexsurvspline to supply the knot locations and
modelling scale (e.g. hazard or odds). This cannot be used to fix
parameters of a distribution --- use fixedpars for that.
integrate, if a custom density or hazard is provided without its
cumulative version. For example, integ.opts = list(rel.tol=1e-12)
optim. For example, the BFGS
optimisation algorithm is the default in flexsurvreg,
but this can be changed, for example to method="Nelder-Mead" which can be
more robust to poor initial values.
If the optimisation fails to converge, consider normalising the
problem using, for example, control=list(fnscale = 2500), for
example, replacing 2500 by a number of the order of magnitude of the
likelihood. If 'false' convergence is reported with a non-positive-definite Hessian,
then consider tightening the tolerance criteria for convergence. If
the optimisation takes a long time, intermediate steps can be
printed using the trace argument of the control list. See
optim for details.
"flexsurvreg" containing information about the
fitted model. Components of interest to users may include:coef,
vcov and confint methods for
flexsurvreg objects work on this scale.vcov.model.frame.flexsurvreg or model.matrix.flexsurvreg.flexsurvreg is intended to be easy to extend to handle
new distributions. To define a new distribution for use in
flexsurvreg, construct a list with the following
elements: name:"dist", for example, then there must be
visible in the working environment, at least, either a) a function called ddist which defines the probability
density, or b) a function called hdist which defines the hazard. Ideally, in case a) there should also be a function called
pdist which defines the probability distribution or
cumulative density, and in case b) there should be a function
called Hdist defining the cumulative hazard. If these
additional functions are not provided, flexsurv attempts to
automatically create them by numerically integrating the density
or hazard function. However, model fitting will be much slower,
or may not even work at all, if the analytic versions of these
functions are not available. The functions must accept vector arguments (representing different
times, or alternative values for each parameter) and return the results
as a vector. The function Vectorize may be helpful
for doing this: see the example below. These functions may be in an add-on package (see below for an
example) or may be user-written. If they are user-written they
must be defined in the global environment, or supplied explicitly
through the dfns argument to flexsurvreg.
The latter may be useful if the functions are created dynamically
(as in the source of flexsurvspline) and thus not visible
through R's scoping rules. Arguments other than parameters must be named in the conventional
way -- for example x for the first argument of the density
function or hazard, as in dnorm(x, ...) and q
for the first argument of the probability function. Density
functions should also have an argument log, after the
parameters, which when TRUE, computes the log density,
using a numerically stable additive formula if possible. Additional functions with names beginning with "DLd" and "DLS" may be
defined to calculate the derivatives of the
log density and log survival probability, with respect to the
parameters of the distribution. The parameters are expressed on
the real line, for example after log transformation if they are
defined as positive. The first argument must be named
t, representing the time, and the remaining arguments must be
named as the parameters of the density function. The function must return a
matrix with rows corresponding to times, and columns
corresponding to the parameters of the distribution. The derivatives are used, if
available, to speed up the model fitting with optim.
pars:location:formula supplied to
flexsurvreg. transforms:c(log, log) for a distribution with two positive parameters. inv.transforms:c(exp) or list(exp). inits:t
(including right-censoring times, and using the halfway point for
interval-censored times)
which returns a vector of reasonable initial values for maximum
likelihood estimation of each parameter. For example,
function(t){ c(1, mean(t)) } will always initialize the first
of two parameters at 1, and the second (a scale
parameter, for instance) at the mean of t.
dEV and pEV. See the
Examples below for the custom list in this case, and the
subsequent command to fit the model.optim function.
Parameters defined to be positive are estimated on the log scale.
Confidence intervals are estimated from the Hessian at the maximum,
and transformed back to the original scale of the parameters. The usage of flexsurvreg is intended to be similar to
survreg in the survival package.
Cox, C. (2008) The generalized $F$ distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:4301-4312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
flexsurvspline for flexible survival modelling using the
spline model of Royston and Parmar. plot.flexsurvreg and lines.flexsurvreg to
plot fitted survival, hazards and cumulative hazards from models fitted
by flexsurvreg and flexsurvspline.
data(ovarian)
## Compare generalized gamma fit with Weibull
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma")
fitg
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull")
fitw
plot(fitg)
lines(fitw, col="blue", lwd.ci=1, lty.ci=1)
## Identical AIC, probably not enough data in this simple example for a
## very flexible model to be worthwhile.
## Custom distribution
## make "dEV" and "pEV" from eha package (if installed)
## available to the working environment
if (require("eha")) {
custom.ev <- list(name="EV",
pars=c("shape","scale"),
location="scale",
transforms=c(log, log),
inv.transforms=c(exp, exp),
inits=function(t){ c(1, median(t)) })
fitev <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian,
dist=custom.ev)
fitev
lines(fitev, col="purple", col.ci="purple")
}
## Custom distribution: supply the hazard function only
hexp2 <- function(x, rate=1){ rate } # exponential distribution
hexp2 <- Vectorize(hexp2)
custom.exp2 <- list(name="exp2", pars=c("rate"), location="rate",
transforms=c(log), inv.transforms=c(exp),
inits=function(t)1/mean(t))
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2)
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp")
## should give same answer
Run the code above in your browser using DataLab