Learn R Programming

flexPM (version 2.0)

flexPM: Flexible Modeling of Survival Data

Description

Fit a flexible parametric regression model to possibly right-censored, left-truncated data.

Usage

flexPM(formula, data, weights, df = 3, degree = 3, knots, maxit, tol = 1e-6)

Arguments

formula
an object of class “formula”: a symbolic description of the model to be fitted. The response must be a survival object as returned by the Surv function. See ‘Details’.

data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If missing, the variables are taken from environment(formula).
weights
optional weights to be used during the fitting process.
df
the degrees of freedom of the B-spline that describes $u(T)$ (see ‘Details’).
degree
the degree of the polynomial of the B-spline that describes $u(T)$.
knots
the internal knots of the B-spline that describes $u(T)$.
maxit
maximum number of iterations of the Newton-Raphson algorithm. If missing, a default value based on the number of free parameters is used.
tol
tolerance for the Newton-Raphson algorithm.

Value

An object of class “flexPM”, which is a list with the following items:
converged
logical value indicating the convergence status of the algorithm.
n.it
the number of iterations.
n
the number of observations.
n.free.par
the number of free parameters in the model.
loglik
the values of the log-likelihood at convergence.
AIC, BIC
the Akaike and Bayesian information criterion.
mf
the used model frame.
call
the matched call.
The model parameters are returned as attributes of mf and are not user-level objects. The model is intended to be used for prediction and not for inference. The hessian matrix is not returned. The number of free parameters is df + 2*ncol(x) - 1, and not df + 2*ncol(x), because the scale of $u(.)$ and that of $F(t|x)$ are exchangeable and thus one coefficient of $u(.)$ is constrained to be 1.The accessor functions summary, nobs, logLik, AIC, and BIC can be used to extract information from the model. The fit is only intended for prediction: use predict.flexPM.

Details

This function fits a flexible parametric model to possibly right-censored, left-truncated outcomes, usually survival data. Right censoring occurs when instead of some outcome $T$, one can only observe $Y = min(T,C)$ and $d = I(T \le C)$, the indicator of failure. Left truncation occurs when $Y$ is only observed subject to $Y > Z$. Typically, $Z$ is the time at enrollement of subjects with a disease. Those who died before enrollment ($Y < Z$) cannot be observed, thus generating selection bias.

The formula must be of the form

  • Surv(y,d) ~ x, with censored data;
  • Surv(z,y,d) ~ x, with censored, truncated data;
  • Surv(y) ~ x, is also allowed and denotes non-censored, non-truncated data.

In the above, x is a set of predictors, y is the response variable, z truncation times (z < y), and d the indicator of failure (1 = event, 0 = censored).

In flexPM, model fitting is implemented as follows. First, the response variable is pre-trasformed using a smoothed version of y = qlogis(rank(y)/(n + 1)). Second, parameter estimation is carried out on the transformed variable. Maximum likelihood estimators are computed via Newton-Raphson algorithm, using the following flexible distribution: $$F_T(t \mid x) = \frac{1}{1 + e^{-\frac{u(t) - m(x)}{s(x)}}}.$$ In the above, $m(x)$ and $log s(x)$ are modeled as specified by formula, while $u(.)$ is a B-spline function built via spline.des (see bs). You can choose the degrees of freedom df and the degree of the spline basis. The model parameters are (a) the coefficients describing the effect of covariates $x$ on $m(x)$ and $log s(x)$, and (b) the coefficients of the B-spline basis that defines the unknown transformation $u(.)$, on which suitable constraints are imposed to ensure monotonicity.

See Also

predict.flexPM

Examples

Run this code
# Simulated data from a normal distribution 

n <- 1000
x1 <- rnorm(n)
x2 <- runif(n)


# non-censored, non-truncated data

t <- rnorm(n, 2 + 3*x1, 1 + x2) # time variable
m1 <- flexPM(Surv(t) ~ x1 + x2)


# right-censored data

c <- rnorm(n,3,3) # censoring variable
y <- pmin(t,c)    # observed outcome
d <- (t <= c)     # 1 = observed, 0 = censored
m2 <- flexPM(Surv(y,d) ~ x1 + x2)


# right-censored, left-truncated data

z <- rnorm(n,-3,3) # truncating variable
w <- which(y > z)  # only observe if y > z
y <- y[w]; d <- d[w]; z <- z[w]; x1 <- x1[w]; x2 <- x2[w]
m3 <- flexPM(Surv(z,y,d) ~ x1 + x2)

################################################################

# m1, m2, m3 are not intended to be interpreted.
# Use predict() to obtain predictions.

# Note that the following are identical:
# summary(flexPM(Surv(y) ~ x1 + x2))
# summary(flexPM(Surv(y, rep(1,length(y))) ~ x1 + x2))
# summary(flexPM(Surv(rep(-Inf,length(y)), y, rep(1,length(y))) ~ x1 + x2))

################################################################

# Use the logLik, AIC and BIC for model selection 
# (choice of df, inclusion/exclusion of covariates)

models <- list(
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 3),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 3),
  flexPM(Surv(z,y,d) ~ x1 * x2, df = 5)
)

my_final_model <- models[[which.min(sapply(models, function(x) x$AIC))]]
summary(my_final_model)

Run the code above in your browser using DataLab