markov_msm: Predictions for continuous time, nonhomogeneous Markov multi-state models using parametric and penalised survival models.

Description

A numerically efficient algorithm to calculate predictions from a continuous time, nonhomogeneous Markov multi-state model. The main inputs are the models for the transition intensities, the initial values, the transition matrix and the covariate patterns. The predictions include state occupancy probabilities (possibly with discounting and utilities), length of stay and costs. Standard errors are calculated using the delta method. Includes, differences, ratios and standardisation.

Usage

markov_msm(x, trans, t = c(0,1), newdata = NULL, init=NULL,
              tmvar = NULL, 
              sing.inf = 1e+10, method="adams", rtol=1e-10, atol=1e-10, slow=FALSE,
              min.tm=1e-8,
              utility=function(t) rep(1, nrow(trans)),
              utility.sd=rep(0,nrow(trans)),
              use.costs=FALSE,
              transition.costs=function(t) rep(0, sum(!is.na(trans))), # per transition
              transition.costs.sd=rep(0,sum(!is.na(trans))),
              state.costs=function(t) rep(0,nrow(trans)), # per unit time
              state.costs.sd=rep(0,nrow(trans)),
              discount.rate = 0,
              block.size=500,
              debug=FALSE,
              …)
# S3 method for markov_msm
vcov(object, …)
# S3 method for markov_msm
as.data.frame(x, row.names=NULL, optional=FALSE,
                                   ci=TRUE,
                                   P.conf.type="logit", L.conf.type="log",
				   C.conf.type="log",
                                   P.range=c(0,1), L.range=c(0,Inf),
				   C.range=c(0,Inf),
                                   …)
# S3 method for markov_msm_diff
as.data.frame(x, row.names=NULL, optional=FALSE,
                                   P.conf.type="plain", L.conf.type="plain",
				   C.conf.type="plain",
                                   P.range=c(-Inf,Inf), L.range=c(-Inf,Inf),
				   C.range=c(-Inf,Inf),
                                   …)
# S3 method for markov_msm_ratio
as.data.frame(x, row.names=NULL, optional=FALSE, ...)
standardise(x, …)
# S3 method for markov_msm
standardise(x,
                                 weights = rep(1,nrow(x$newdata)),
                                 normalise = TRUE, …)
# S3 method for markov_msm
plot(x, y, stacked=TRUE, which=c('P','L'),
                          xlab="Time", ylab=NULL, col=2:6, border=col,
                          ggplot2=FALSE, lattice=FALSE, alpha=0.2,
                          strata=NULL,
                          …)
# S3 method for markov_msm
subset(x, subset, …)
# S3 method for markov_msm
diff(x, y, …)
ratio_markov_msm(x, y, …)
# S3 method for markov_msm
rbind(…, deparse.level=1)
# S3 method for markov_msm
transform(`_data`, …)
collapse_markov_msm(object, which=NULL, sep="; ")
zeroModel(object)
hrModel(object,hr=1,ci=NULL,seloghr=NULL)
aftModel(object,af=1,ci=NULL,selogaf=NULL)
addModel(…)
hazFun(f, tmvar="t", …)
splineFun(time,rate,method="natural",scale=1,…)

Arguments

list of functions or parametric or penalised survival models. Currently the models include combinations of stpm2, pstpm2, glm, gam, survPen or an object of class "zeroModel" from zeroModel based on one of the other classes. The order in the list matches the indexing in the trans argument. The functions can optionally use a t argument for time and/or a newdata argument. Uncertainty in the models are incorporated into the gradients, while uncertainty in the functions are currently not modelled.

trans

Transition matrix describing the states and transitions in the multi-state model. If S is the number of states in the multi-state model, trans should be an S x S matrix, with (i,j)-element a positive integer if a transition from i to j is possible in the multi-state model, NA otherwise. In particular, all diagonal elements should be NA. The integers indicating the possible transitions in the multi-state model should be sequentially numbered, 1,…,K, with K the number of transitions. See msprep

numerical vector for the times to evaluation the predictions. Includes the start time

newdata

data.frame of the covariates to use in the predictions

init

vector of the initial values with the same length as the number of states. Defaults to the first state having an initial value of 1 (i.e. "[<-"(rep(0,nrow(trans)),1,1)).

tmvar

specifies the name of the time variable. This should be set for regression models that do not specify this (e.g. glm) or where the time variable is ambiguous

sing.inf

If there is a singularity in the observed hazard, for example a Weibull distribution with shape < 1 has infinite hazard at t=0, then as a workaround, the hazard is assumed to be a large finite number, sing.inf, at this time. The results should not be sensitive to the exact value assumed, but users should make sure by adjusting this parameter in these cases.

method

For markov_msm, the method used by the ordinary differential equation solver. Defaults to Adams method ("adams") for non-stiff differential equations.

For splineFun, the method jused for spline interpolation; see splinefun.

rtol

relative error tolerance, either a scalar or an array as long as the number of states. Passed to lsode

atol

absolute error tolerance, either a scalar or an array as long as the number of states. Passed to lsode

slow

logical to show whether to use the slow R-only implementation. Useful for debugging. Currently needed for costs.

min.tm

Minimum time used for evaluations. Avoids log(0) for some models.

utility

a function of the form function(t) that returns a utility for each state at time t for the length of stay values

utility.sd

a function of the form function(t) that returns the standard deviation for the utility for each state at time t for the length of stay values

use.costs

logical for whether to use costs. Default: FALSE

transition.costs

a function of the form function(t) that returns the cost for each transition

transition.costs.sd

a function of the form function(t) that returns the standard deviation for the cost for each transition

state.costs

a function of the form function(t) that returns the cost per unit time for each state

state.costs.sd

a function of the form function(t) that returns the standard deviation for the cost per unit time for each state

discount.rate

numerical value for the proportional reduction (per unit time) in the length of stay and costs

block.size

divide newdata into blocks. Uses less memory but is slower. Reduce this number if the function call runs out of memory.

debug

logical flag for whether to keep the full output from the ordinary differential equation in the res component (default=FALSE).

…

other arguments. For markov_msm, these are passed to the ode solver from the deSolve package. For plot.markov_msm, these arguments are passed to plot.default

row.names

add in row names to the output data-frame

optional

(not currently used)

logical for whether to include confidence intervals. Default: TRUE

P.conf.type

type of transformation for the confidence interval calculation for the state occupancy probabilities. Default: log-log transformation. This is changed for diff and ratio_markov_msm objects

L.conf.type

type of transformation for the confidence interval calculation for the length of stay calculation. Default: log transformation. This is changed for diff and ratio_markov_msm objects

C.conf.type

type of transformation for the confidence interval calculation for the length of stay calculation. Default: log transformation. This is changed for diff and ratio_markov_msm objects

P.range

valid values for the state occupancy probabilities. Default: (0,1). This is changed for diff and ratio_markov_msm objects

L.range

valid values for the state occupancy probabilities. Default: (0,Inf). This is changed for diff and ratio_markov_msm objects

C.range

valid values for the state occupancy probabilities. Default: (0,Inf). This is changed for diff and ratio_markov_msm objects

weights

numerical vector to use in standardising the state occupancy probabilities, length of stay and costs. Default: 1 for each observation.

normalise

logical for whether to normalise the weights to 1. Default: TRUE

(currently ignored)

stacked

logical for whether to stack the plots. Default: TRUE

xlab

x-axis label

ylab

x-axis label

col

colours (ignored if ggplot2=TRUE)

border

border colours for the polygon (ignored if ggplot=TRUE)

ggplot2

use ggplot2

alpha

alpha value for confidence bands (ggplot)

lattice

use lattice

strata

formula for the stratification factors for the plot

subset

expression that is evaluated on the newdata component of the object to filter (or restrict) for the covariates used for predictions

_data

an object of class "markov_msm"

deparse.level

not currently used

which

either an index of the states to collapse or a character vector of the state names to collapse

sep

separator to use for the collapsed state names

object

survival regression object to be wrapped

hazard ratio

seloghr

alternative specification for the se of the log(hazard ratio); see also ci argument

acceleration factor

selogaf

alternative specification for the se of the log(acceleration factor); see also ci argument

rate function, possibly with tmvar and/or newdata as arguments

time

exact times

rate

rates as per time

scale

rate multiplier (e.g. scale=365.25 for converting from daily rates to yearly rates)

Value

markov_msm returns an object of class "markov_msm".

The function summary is used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coef and vcov extract various useful features of the value returned by markov_msm.

An object of class "markov_msm" is a list containing at least the following components:

time

a numeric vector with the times for the predictions

an array for the predicted state occupancy probabilities. The array has three dimensions: time, state, and observations.

an array for the predicted sojourn times (or length of stay). The array has three dimensions: time, state, and observations.

an array for the partial derivatives of the predicted state occupancy probabilities with respect to the model coefficients. The array has four dimensions: time, state, coefficients, and observations.

an array for the partial derivatives of the predicted sojourn times (or length of stay) with respect to the model coefficients. The array has four dimensions: time, state, coefficients, and observations.

newdata

a data.frame with the covariates used for the predictions

vcov

the variance-covariance matrix for the models of the transition intensities

trans

copy of the trans input argument

call

the call to the function

For debugging:

res

data returned from the ordinary differential equation solver. This may include more information on the predictions

Details

The predictions are calculated using an ordinary differential equation solver. The algorithm uses a single run of the solver to calculate the state occupancy probabilities, length of stay, costs and their partial derivatives with respect to the model parameters. The predictions can also be combined to calculate differences, ratios and standardised.

The current implementation supports a list of models for each transition.

The current implementation also only allows for a vector of initial values rather than a matrix. The predictions will need to be re-run for different vectors of initial values.

For as.data.frame.markov_msm_ratio, the data are provided in log form, hence the default transformations and bounds are as per as.data.frame.markov_msm_diff, with untransformed data on the real line.

TODO: allow for one model to predict for the different transitions.

Examples

Run this code

# NOT RUN {
## Crowther and Lambert (2019)
library(readstata13)
library(mstate)
library(ggplot2)
mex.1 <- read.dta13("http://fmwww.bc.edu/repec/bocode/m/multistate_example.dta")
transmat <- rbind("Post-surgery"=c(NA,1,2), 
                  "Relapsed"=c(NA,NA,3),
                  "Died"=c(NA,NA,NA))
colnames(transmat) <- rownames(transmat)
mex.2 <- transform(mex.1,osi=(osi=="deceased")+0)
levels(mex.2$size)[2] <- ">20-50 mm" # fix typo
mex <- mstate::msprep(time=c(NA,"rf","os"),status=c(NA,"rfi","osi"),
                      data=mex.2,trans=transmat,id="pid",
                      keep=c("age","size","nodes","pr_1","hormon"))
mex <- transform(mex,
                 size2=(unclass(size)==2)+0, # avoids issues with TRUE/FALSE
                 size3=(unclass(size)==3)+0,
                 hormon=(hormon=="yes")+0,
                 Tstart=Tstart/12,
                 Tstop=Tstop/12)
##
c.ar <- stpm2(Surv(Tstart,Tstop,status) ~ age + size2 + size3 + nodes + pr_1 + hormon,
              data = mex, subset=trans==1, df=3, tvc=list(size2=1,size3=1,pr_1=1))
c.ad <- stpm2(Surv(Tstart, Tstop, status) ~ age + size + nodes + pr_1 + hormon,
              data = mex, subset=trans==2, df=1)
c.rd <- stpm2( Surv(Tstart,Tstop,status) ~ age + size + nodes + pr_1 + hormon,
              data=mex, subset=trans==3, df=3, tvc=list(pr_1=1))
##
nd <- expand.grid(nodes=seq(0,20,10), size=levels(mex$size))
nd <- transform(nd, age=54, pr_1=3, hormon=0,
                size2=(unclass(size)==2)+0,
                size3=(unclass(size)==3)+0)
## Predictions
system.time(pred1 <- rstpm2::markov_msm(list(c.ar,c.ad,c.rd), t = seq(0,15,length=301),
                                        newdata=nd, trans = transmat)) # ~15 seconds
pred1 <- transform(pred1, Nodes=paste("Nodes =",nodes), Size=paste("Size",size))
## Figure 3
plot(pred1, ggplot=TRUE) + facet_grid(Nodes ~ Size) + xlab("Years since surgery")
plot(pred1, strata=~nodes+size, xlab="Years since surgery", lattice=TRUE)
## Figure 4
plot(subset(pred1, nodes==0 & size=="<=20 mm"), stacked=FALSE, ggplot=TRUE) +
    facet_grid(. ~ state) +
    xlab("Years since surgery")
## Figure 5
a <- diff(subset(pred1,nodes==0 & size=="<=20 mm"),
          subset(pred1,nodes==0 & size==">20-50 mm"))
a <- transform(a, label = "Prob(Size<=20 mm)-Prob(20mm<Size<50mm)")
b <- ratio_markov_msm(subset(pred1,nodes==0 & size=="<=20 mm"),
                      subset(pred1,nodes==0 & size==">20-50 mm"))
b <- transform(b,label="Prob(Size<=20 mm)-Prob(20mm<Size<50mm)")
##
c <- diff(subset(pred1,nodes==0 & size=="<=20 mm"),
          subset(pred1,nodes==0 & size==">50 mm"))
c <- transform(c, label = "Prob(Size<=20 mm)-Prob(Size>=50mm)")
d <- ratio_markov_msm(subset(pred1,nodes==0 & size=="<=20 mm"),
                      subset(pred1,nodes==0 & size==">50 mm"))
d <- transform(d,label= "Prob(Size<=20 mm)-Prob(Size>=50mm)")
##
e <- diff(subset(pred1,nodes==0 & size==">20-50 mm"),
          subset(pred1,nodes==0 & size==">50 mm"))
e <- transform(e,label="Prob(20mm<Size<50 mm)-Prob(Size>=50mm)")
f <- ratio_markov_msm(subset(pred1,nodes==0 & size==">20-50 mm"),
                      subset(pred1,nodes==0 & size==">50 mm"))
f <- transform(f, label = "Prob(20mm<Size<50 mm)-Prob(Size>=50mm)")
## combine
diffs <- rbind(a,c,e)
ratios <- rbind(b,d,f)
## Figure 5
plot(diffs, stacked=FALSE, ggplot2=TRUE) + xlab("Years since surgery") +
    ylim(c(-0.4, 0.4)) + facet_grid(label ~ state)
##
plot(ratios, stacked=FALSE, ggplot2=TRUE) + xlab("Years since surgery") +
    ylim(c(0, 3)) + facet_grid(label ~ state)
## Figure 6
plot(subset(pred1, nodes==0 & size=="<=20 mm"), stacked=FALSE, which="L", ggplot2=TRUE) +
    facet_grid(. ~ state) + xlab("Years since surgery")
## Figure 7
plot(diffs, stacked=FALSE, which="L", ggplot2=TRUE) + xlab("Years since surgery") +
    ylim(c(-4, 4)) + facet_grid(label ~ state)
plot(ratios, stacked=FALSE, which="L", ggplot2=TRUE) + xlab("Years since surgery") +
    ylim(c(0.1, 10)) + coord_trans(y="log10") + facet_grid(label ~ state)
# }