flexPM: Flexible Modeling of Survival Data

Description

Fit a flexible parametric regression model to possibly right-censored, left-truncated data.

Usage

flexPM(formula, data, weights, df = 3, degree = 3, knots, maxit, tol = 1e-6)

Arguments

formula

an object of class “formula”: a symbolic description of the model to be fitted. The response must be a survival object as returned by the Surv function. See ‘Details’.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If missing, the variables are taken from environment(formula).

weights

optional weights to be used during the fitting process.

the degrees of freedom of the B-spline that describes $u(T)$ (see ‘Details’).

degree

the degree of the polynomial of the B-spline that describes $u(T)$.

knots

the internal knots of the B-spline that describes $u(T)$.

maxit

maximum number of iterations of the Newton-Raphson algorithm. If missing, a default value based on the number of free parameters is used.

tol

tolerance for the Newton-Raphson algorithm.

Value

converged: logical value indicating the convergence status of the algorithm.
n.it: the number of iterations.
n: the number of observations.
n.free.par: the number of free parameters in the model.
loglik: the values of the log-likelihood at convergence.
AIC, BIC: the Akaike and Bayesian information criterion.
mf: the used model frame.
call: the matched call.

Details

This function fits a flexible parametric model to possibly right-censored, left-truncated outcomes, usually survival data. Right censoring occurs when instead of some outcome $T$, one can only observe $Y = min(T,C)$ and $d = I(T \le C)$, the indicator of failure. Left truncation occurs when $Y$ is only observed subject to $Y > Z$. Typically, $Z$ is the time at enrollement of subjects with a disease. Those who died before enrollment ($Y < Z$) cannot be observed, thus generating selection bias.

The formula must be of the form

Surv(y,d) ~ x, with censored data;
Surv(z,y,d) ~ x, with censored, truncated data;
Surv(y) ~ x, is also allowed and denotes non-censored, non-truncated data.

In the above, x is a set of predictors, y is the response variable, z truncation times (z < y), and d the indicator of failure (1 = event, 0 = censored).

In flexPM, model fitting is implemented as follows. First, the response variable is pre-trasformed using a smoothed version of y = qlogis(rank(y)/(n + 1)). Second, parameter estimation is carried out on the transformed variable. Maximum likelihood estimators are computed via Newton-Raphson algorithm, using the following flexible distribution: $$F_T(t \mid x) = \frac{1}{1 + e^{-\frac{u(t) - m(x)}{s(x)}}}.$$ In the above, $m(x)$ and $log s(x)$ are modeled as specified by formula, while $u(.)$ is a B-spline function built via spline.des (see bs). You can choose the degrees of freedom df and the degree of the spline basis. The model parameters are (a) the coefficients describing the effect of covariates $x$ on $m(x)$ and $log s(x)$, and (b) the coefficients of the B-spline basis that defines the unknown transformation $u(.)$, on which suitable constraints are imposed to ensure monotonicity.

Examples

Run this code

# Simulated data from a normal distribution 

n <- 1000
x1 <- rnorm(n)
x2 <- runif(n)


# non-censored, non-truncated data

t <- rnorm(n, 2 + 3*x1, 1 + x2) # time variable
m1 <- flexPM(Surv(t) ~ x1 + x2)


# right-censored data

c <- rnorm(n,3,3) # censoring variable
y <- pmin(t,c)    # observed outcome
d <- (t <= c)     # 1 = observed, 0 = censored
m2 <- flexPM(Surv(y,d) ~ x1 + x2)


# right-censored, left-truncated data

z <- rnorm(n,-3,3) # truncating variable
w <- which(y > z)  # only observe if y > z
y <- y[w]; d <- d[w]; z <- z[w]; x1 <- x1[w]; x2 <- x2[w]
m3 <- flexPM(Surv(z,y,d) ~ x1 + x2)

################################################################

# m1, m2, m3 are not intended to be interpreted.
# Use predict() to obtain predictions.

# Note that the following are identical:
# summary(flexPM(Surv(y) ~ x1 + x2))
# summary(flexPM(Surv(y, rep(1,length(y))) ~ x1 + x2))
# summary(flexPM(Surv(rep(-Inf,length(y)), y, rep(1,length(y))) ~ x1 + x2))

################################################################

# Use the logLik, AIC and BIC for model selection 
# (choice of df, inclusion/exclusion of covariates)

models <- list(
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 3),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 3),
  flexPM(Surv(z,y,d) ~ x1 * x2, df = 5)
)

my_final_model <- models[[which.min(sapply(models, function(x) x$AIC))]]
summary(my_final_model)