predict.stan_nma: Predictions of absolute effects from NMA models

Description

Obtain predictions of absolute effects from NMA models fitted with nma(). For example, if a model is fitted to binary data with a logit link, predicted outcome probabilities or log odds can be produced. For survival models, predictions can be made for survival probabilities, (cumulative) hazards, (restricted) mean survival times, and quantiles including median survival times. When an IPD NMA or ML-NMR model has been fitted, predictions can be produced either at an individual level or at an aggregate level. Aggregate-level predictions are population-average absolute effects; these are marginalised or standardised over a population. For example, average event probabilities from a logistic regression, or marginal (standardised) survival probabilities from a survival model.

Usage

# S3 method for stan_nma
predict(
  object,
  ...,
  baseline = NULL,
  newdata = NULL,
  study = NULL,
  trt_ref = NULL,
  type = c("link", "response"),
  level = c("aggregate", "individual"),
  baseline_type = c("link", "response"),
  baseline_level = c("individual", "aggregate"),
  probs = c(0.025, 0.25, 0.5, 0.75, 0.975),
  predictive_distribution = FALSE,
  summary = TRUE
)
# S3 method for stan_nma_surv
predict(
  object,
  times = NULL,
  ...,
  baseline = NULL,
  aux = NULL,
  newdata = NULL,
  study = NULL,
  trt_ref = NULL,
  type = c("survival", "hazard", "cumhaz", "mean", "median", "quantile", "rmst", "link"),
  quantiles = c(0.25, 0.5, 0.75),
  level = c("aggregate", "individual"),
  times_seq = NULL,
  probs = c(0.025, 0.25, 0.5, 0.75, 0.975),
  predictive_distribution = FALSE,
  summary = TRUE
)

Value

A nma_summary object if summary = TRUE, otherwise a list containing a 3D MCMC array of samples and (for regression models) a data frame of study information.

Arguments

object

A stan_nma object created by nma().

...

Additional arguments, passed to uniroot() for regression models if baseline_level = "aggregate".

baseline

An optional distr() distribution for the baseline response (i.e. intercept), about which to produce absolute effects. Can also be a character string naming a study in the network to take the estimated baseline response distribution from. If NULL, predictions are produced using the baseline response for each study in the network with IPD or arm-based AgD.

For regression models, this may be a list of distr() distributions (or study names in the network to use the baseline distributions from) of the same length as the number of studies in newdata, possibly named by the studies in newdata or otherwise in order of appearance in newdata.

Use the baseline_type and baseline_level arguments to specify whether this distribution is on the response or linear predictor scale, and (for ML-NMR or models including IPD) whether this applies to an individual at the reference level of the covariates or over the entire newdata population, respectively. For example, in a model with a logit link with baseline_type = "link", this would be a distribution for the baseline log odds of an event. For survival models, baseline always corresponds to the intercept parameters in the linear predictor (i.e. baseline_type is always "link", and baseline_level is "individual" for IPD NMA or ML-NMR, and "aggregate" for AgD NMA).

Use the trt_ref argument to specify which treatment this distribution applies to.

newdata

Only required if a regression model is fitted and baseline is specified. A data frame of covariate details, for which to produce predictions. Column names must match variables in the regression model.

If level = "aggregate" this should either be a data frame with integration points as produced by add_integration() (one row per study), or a data frame with individual covariate values (one row per individual) which are summarised over.

If level = "individual" this should be a data frame of individual covariate values, one row per individual.

If NULL, predictions are produced for all studies with IPD and/or arm-based AgD in the network, depending on the value of level.

study

Column of newdata which specifies study names or IDs. When not specified: if newdata contains integration points produced by add_integration(), studies will be labelled sequentially by row; otherwise data will be assumed to come from a single study.

trt_ref

Treatment to which the baseline response distribution refers, if baseline is specified. By default, the baseline response distribution will refer to the network reference treatment. Coerced to character string.

type

Whether to produce predictions on the "link" scale (the default, e.g. log odds) or "response" scale (e.g. probabilities).

For survival models, the options are "survival" for survival probabilities (the default), "hazard" for hazards, "cumhaz" for cumulative hazards, "mean" for mean survival times, "quantile" for quantiles of the survival time distribution, "median" for median survival times (equivalent to type = "quantile" with quantiles = 0.5), "rmst" for restricted mean survival times, or "link" for the linear predictor. For type = "survival", "hazard" or "cumhaz", predictions are given at the times specified by times or at the event/censoring times in the network if times = NULL. For type = "rmst", the restricted time horizon is specified by times, or if times = NULL the earliest last follow-up time amongst the studies in the network is used. When level = "aggregate", these all correspond to the standardised survival function (see details).

level

The level at which predictions are produced, either "aggregate" (the default), or "individual". If baseline is not specified, predictions are produced for all IPD studies in the network if level is "individual" or "aggregate", and for all arm-based AgD studies in the network if level is "aggregate".

baseline_type

When a baseline distribution is given, specifies whether this corresponds to the "link" scale (the default, e.g. log odds) or "response" scale (e.g. probabilities). For survival models, baseline always corresponds to the intercept parameters in the linear predictor (i.e. baseline_type is always "link").

baseline_level

When a baseline distribution is given, specifies whether this corresponds to an individual at the reference level of the covariates ("individual", the default), or from an (unadjusted) average outcome on the reference treatment in the newdata population ("aggregate"). Ignored for AgD NMA, since the only option is "aggregate" in this instance. For survival models, baseline always corresponds to the intercept parameters in the linear predictor (i.e. baseline_level is "individual" for IPD NMA or ML-NMR, and "aggregate" for AgD NMA).

probs

Numeric vector of quantiles of interest to present in computed summary, default c(0.025, 0.25, 0.5, 0.75, 0.975)

predictive_distribution

Logical, when a random effects model has been fitted, should the predictive distribution for absolute effects in a new study be returned? Default FALSE.

summary

Logical, calculate posterior summaries? Default TRUE.

times

A numeric vector of times to evaluate predictions at. Alternatively, if newdata is specified, times can be the name of a column in newdata which contains the times. If NULL (the default) then predictions are made at the event/censoring times from the studies included in the network (or according to times_seq). Only used if type is "survival", "hazard", "cumhaz" or "rmst".

aux

An optional distr() distribution for the auxiliary parameter(s) in the baseline hazard (e.g. shapes). Can also be a character string naming a study in the network to take the estimated auxiliary parameter distribution from. If NULL, predictions are produced using the parameter estimates for each study in the network with IPD or arm-based AgD.

For regression models, this may be a list of distr() distributions (or study names in the network to use the auxiliary parameters from) of the same length as the number of studies in newdata, possibly named by the study names or otherwise in order of appearance in newdata.

quantiles

A numeric vector of quantiles of the survival time distribution to produce estimates for when type = "quantile".

times_seq

A positive integer, when specified evaluate predictions at this many evenly-spaced event times between 0 and the latest follow-up time in each study, instead of every observed event/censoring time. Only used if newdata = NULL and type is "survival", "hazard" or "cumhaz". This can be useful for plotting survival or (cumulative) hazard curves, where prediction at every observed even/censoring time is unnecessary and can be slow. When a call from within plot() is detected, e.g. like plot(predict(fit, type = "survival")), times_seq will default to 50.

Aggregate-level predictions from IPD NMA and ML-NMR models

Population-average absolute effects can be produced from IPD NMA and ML-NMR models with level = "aggregate". Predictions are averaged over the target population (i.e. standardised/marginalised), either by (numerical) integration over the joint covariate distribution (for AgD studies in the network for ML-NMR, or AgD newdata with integration points created by add_integration()), or by averaging predictions for a sample of individuals (for IPD studies in the network for IPD NMA/ML-NMR, or IPD newdata).

For example, with a binary outcome, the population-average event probabilities on treatment $k$ in study/population $j$ are $$\bar{p}_{jk} = \int_\mathfrak{X} p_{jk}(\mathbf{x}) f_{jk}(\mathbf{x}) d\mathbf{x}$$ for a joint covariate distribution $f_{jk}(\mathbf{x})$ with support $\mathfrak{X}$ or $$\bar{p}_{jk} = \sum_i p_{jk}(\mathbf{x}_i)$$ for a sample of individuals with covariates $\mathbf{x}_i$.

Population-average absolute predictions follow similarly for other types of outcomes, however for survival outcomes there are specific considerations.

Standardised survival predictions

Different types of population-average survival predictions, often called standardised survival predictions, follow from the standardised survival function created by integrating (or equivalently averaging) the individual-level survival functions at each time $t$: $$\bar{S}_{jk}(t) = \int_\mathfrak{X} S_{jk}(t | \mathbf{x}) f_{jk}(\mathbf{x}) d\mathbf{x}$$ which is itself produced using type = "survival".

The standardised hazard function corresponding to this standardised survival function is a weighted average of the individual-level hazard functions $$\bar{h}_{jk}(t) = \frac{\int_\mathfrak{X} S_{jk}(t | \mathbf{x}) h_{jk}(t | \mathbf{x}) f_{jk}(\mathbf{x}) d\mathbf{x} }{\bar{S}_{jk}(t)}$$ weighted by the probability of surviving to time $t$. This is produced using type = "hazard".

The corresponding standardised cumulative hazard function is $$\bar{H}_{jk}(t) = -\log(\bar{S}_{jk}(t))$$ and is produced using type = "cumhaz".

Quantiles and medians of the standardised survival times are found by solving $$\bar{S}_{jk}(t) = 1-\alpha$$ for the $\alpha\%$ quantile, using numerical root finding. These are produced using type = "quantile" or "median".

(Restricted) means of the standardised survival times are found by integrating $$\mathrm{RMST}_{jk}(t^*) = \int_0^{t^*} \bar{S}_{jk}(t) dt$$ up to the restricted time horizon $t^*$, with $t^*=\infty$ for mean standardised survival time. These are produced using type = "rmst" or "mean".

Examples

Run this code

## Smoking cessation
# \donttest{
# Run smoking RE NMA example if not already available
if (!exists("smk_fit_RE")) example("example_smk_re", run.donttest = TRUE)
# }
# \donttest{
# Predicted log odds of success in each study in the network
predict(smk_fit_RE)

# Predicted probabilities of success in each study in the network
predict(smk_fit_RE, type = "response")

# Predicted probabilities in a population with 67 observed events out of 566
# individuals on No Intervention, corresponding to a Beta(67, 566 - 67)
# distribution on the baseline probability of response, using
# `baseline_type = "response"`
(smk_pred_RE <- predict(smk_fit_RE,
                        baseline = distr(qbeta, 67, 566 - 67),
                        baseline_type = "response",
                        type = "response"))
plot(smk_pred_RE, ref_line = c(0, 1))

# Predicted probabilities in a population with a baseline log odds of
# response on No Intervention given a Normal distribution with mean -2
# and SD 0.13, using `baseline_type = "link"` (the default)
# Note: this is approximately equivalent to the above Beta distribution on
# the baseline probability
(smk_pred_RE2 <- predict(smk_fit_RE,
                         baseline = distr(qnorm, mean = -2, sd = 0.13),
                         type = "response"))
plot(smk_pred_RE2, ref_line = c(0, 1))
# }

## Plaque psoriasis ML-NMR
# \donttest{
# Run plaque psoriasis ML-NMR example if not already available
if (!exists("pso_fit")) example("example_pso_mlnmr", run.donttest = TRUE)
# }
# \donttest{
# Predicted probabilities of response in each study in the network
(pso_pred <- predict(pso_fit, type = "response"))
plot(pso_pred, ref_line = c(0, 1))

# Predicted probabilites of response in a new target population, with means
# and SDs or proportions given by
new_agd_int <- data.frame(
  bsa_mean = 0.6,
  bsa_sd = 0.3,
  prevsys = 0.1,
  psa = 0.2,
  weight_mean = 10,
  weight_sd = 1,
  durnpso_mean = 3,
  durnpso_sd = 1
)

# We need to add integration points to this data frame of new data
# We use the weighted mean correlation matrix computed from the IPD studies
new_agd_int <- add_integration(new_agd_int,
                               durnpso = distr(qgamma, mean = durnpso_mean, sd = durnpso_sd),
                               prevsys = distr(qbern, prob = prevsys),
                               bsa = distr(qlogitnorm, mean = bsa_mean, sd = bsa_sd),
                               weight = distr(qgamma, mean = weight_mean, sd = weight_sd),
                               psa = distr(qbern, prob = psa),
                               cor = pso_net$int_cor,
                               n_int = 64)

# Predicted probabilities of achieving PASI 75 in this target population, given
# a Normal(-1.75, 0.08^2) distribution on the baseline probit-probability of
# response on Placebo (at the reference levels of the covariates), are given by
(pso_pred_new <- predict(pso_fit,
                         type = "response",
                         newdata = new_agd_int,
                         baseline = distr(qnorm, -1.75, 0.08)))
plot(pso_pred_new, ref_line = c(0, 1))
# }

## Progression free survival with newly-diagnosed multiple myeloma
# \donttest{
# Run newly-diagnosed multiple myeloma example if not already available
if (!exists("ndmm_fit")) example("example_ndmm", run.donttest = TRUE)
# }
# \donttest{
# We can produce a range of predictions from models with survival outcomes,
# chosen with the type argument to predict

# Predicted survival probabilities at 5 years
predict(ndmm_fit, type = "survival", times = 5)

# Survival curves
plot(predict(ndmm_fit, type = "survival"))

# Hazard curves
# Here we specify a vector of times to avoid attempting to plot infinite
# hazards for some studies at t=0
plot(predict(ndmm_fit, type = "hazard", times = seq(0.001, 6, length.out = 50)))

# Cumulative hazard curves
plot(predict(ndmm_fit, type = "cumhaz"))

# Survival time quantiles and median survival
predict(ndmm_fit, type = "quantile", quantiles = c(0.2, 0.5, 0.8))
plot(predict(ndmm_fit, type = "median"))

# Mean survival time
predict(ndmm_fit, type = "mean")

# Restricted mean survival time
# By default, the time horizon is the shortest follow-up time in the network
predict(ndmm_fit, type = "rmst")

# An alternative restriction time can be set using the times argument
predict(ndmm_fit, type = "rmst", times = 5)  # 5-year RMST
# }

Run the code above in your browser using DataLab