PRISM: PRISM: Patient Response Identifier for Stratified Medicine

Description

PRISM algorithm. Given a data-set of (Y, A, X) (Outcome, treatment, covariates), the PRISM identifies potential subgroups along with point-estimate and variability metrics; with and without resampling (bootstrap or cross-validation based). This four step procedure (filter, ple, submod, param) is flexible and accepts user-inputs at each step.

Usage

PRISM(Y, A = NULL, X, Xtest = NULL, family = "gaussian",
  filter = "filter_glmnet", ple = NULL, submod = NULL,
  param = NULL, alpha_ovrl = 0.05, alpha_s = 0.05,
  filter.hyper = NULL, ple.hyper = NULL, submod.hyper = NULL,
  param.hyper = NULL, bayes = NULL, prefilter_resamp = FALSE,
  resample = NULL, stratify = TRUE, R = NULL, calibrate = FALSE,
  alpha.mat = NULL, filter.resamp = NULL, ple.resamp = NULL,
  submod.resamp = NULL, verbose = TRUE, verbose.resamp = FALSE,
  seed = 777)

Arguments

The outcome variable. Must be numeric or survival (ex; Surv(time,cens) )

Treatment variable. (Defaults support binary treatment, either numeric or factor). If A=NULL, searches for prognostic variables (Y~X).

Covariate space. Variables types (ex: numeric, factor, ordinal) should be set to align with subgroup model (submod argument). For example, for lmtree, binary variables coded as numeric (ex: 0, 1) are treated differently than the corresponding factor version (ex: "A", "B"). Filter and PLE models provided in the StratifiedMedicine package can accomodate all variable types.

Xtest

Test set. Default is NULL which uses X (training set). Variable types should match X.

family

Outcome type. Options include "gaussion" (default), "binomial", and "survival".

filter

Maps (Y,A,X) => (Y,A,X.star) where X.star has potentially less covariates than X. Default is "filter_glmnet", "None" uses no filter.

ple

PLE (Patient-Level Estimate) function. Maps the observed data to PLEs. (Y,A,X) ==> PLE(X). Default for is "ple_ranger". For continuous/binomial outcome data, this fits treatment specific random forest models. For survival outcome data, this fits a single forest, with expanded covariate space (A, X, X*A). (treatment-specific random forest models). "None" uses no ple.

submod

Subgroup identification model function. Maps the observed data and/or PLEs to subgroups. Default of "gaussian"/"binomial" is "submod_lmtree" (MOB with OLS loss). Default for "survival" is "submod_weibull" (MOB with weibull loss). "None" uses no submod.

param

Parameter estimation and inference function. Based on the discovered subgroups, perform inference through the input function (by name). Default for "gaussian"/"binomial" is "param_PLE", default for "survival" is "param_cox".

alpha_ovrl

Two-sided alpha level for overall population. Default=0.05

alpha_s

Two-sided alpha level at subgroup level. Default=0.05

filter.hyper

Hyper-parameters for the Filter function (must be list). Default is NULL.

ple.hyper

Hyper-parameters for the PLE function (must be list). Default is NULL.

submod.hyper

Hyper-parameters for the SubMod function (must be list). Default is NULL.

param.hyper

Hyper-parameters for the Param function (must be list). Default is NULL.

bayes

Based on input point estimates/SEs, this uses a bayesian based approach to obtain ests, SEs, CIs, and posterior probabilities. Currently includes "norm_norm" (normal prior at overall estimate with large uninformative variance; normal posterior). Default=NULL.

prefilter_resamp

Option to filter the covariate space (based on filter model) prior to resampling. Default=FALSE.

resample

Resampling method for resample-based estimates and variability metrics. Options include "Bootstrap", "Permutation", and "CV" (cross-validation). Default=NULL (No resampling).

stratify

Stratified resampling (Default=TRUE)

Number of resamples (default=NULL; R=100 for Permutation/Bootstrap and R=5 for CV)

calibrate

Bootstrap calibration for nominal alpha (Loh et al 2016). Default=FALSE. For TRUE, outputs the calibrated alpha level and calibrated CIs for the overall population and subgroups. Not applicable for permutation/CV resampling.

alpha.mat

Grid of alpha values for calibration. Default=NULL, which uses seq(alpha/1000,alpha,by=0.005) for alpha_ovrl/alpha_s.

filter.resamp

Filter function during resampling, default=NULL (use filter)

ple.resamp

PLE function during resampling, default=NULL (use ple)

submod.resamp

submod function for resampling, default=NULL (use submod)

verbose

Detail progress of PRISM? Default=TRUE

verbose.resamp

Output iterations during resampling? Default=FALSE

seed

Seed for PRISM run (Default=777)

Value

Trained PRISM object. Includes filter, ple, submod, and param outputs.

filter.mod - Filter model
filter.vars - Variables remaining after filtering
ple.fit - Fitted ple model (model fit, other fit outputs)
mu_train - Patient-level estimates (train)
mu_test - Patient-level estimates (test)
submod.fit - Fitted submod model (model fit, other fit outputs)
out.train - Training data-set with identified subgroups
out.test - Test data-set with identified subgroups
Rules - Subgroup rules / definitions
param.dat - Parameter estimates and variablity metrics (depends on param)
resamp.dist - Resampling distributions (NULL if no resampling is done)
bayes.fun - Function to simulate posterior distribution (NULL if no bayes)

Details

PRISM is a general framework with five key steps:

0. Estimand: Determine the question of interest (ex: mean treatment difference)

1. Filter: Reduce covariate space by removing noise covariates. Options include elastic net (filter_glmnet) and random forest variable importance (filter_ranger).

2. Patient-Level Estimates (ple): Estimate counterfactual patient-level quantities, for example, the individual treatment effect, E(Y|A=1)-E(Y|A=0). Options include: treatment-specific or virtual twins (Y~A+X+A*X) through random forest (ple_ranger, ple_rfsrc), elastic net (ple_glmnet), BART (ple_bart) and causal forest (ple_causal_forest).

3. Subgroup Model (submod): Partition the data into subsets or subgroups of patients. Options include: conditional inference trees (observed outcome or individual treatment effect/PLE; submod_ctree), MOB GLM (submod_glmtree), MOB OLS (submod_lmtree), optimal treatment regimes (submod_otr), rpart (submod_rpart), and MOB Weibull (submod_weibull).

4. Parameter Estimation (param): For the overall population and the discovered subgroups (if any), obtain point-estimates and variability metrics. Options include: cox regression (param_cox), double robust estimator (param_dr), linear regression (param_lm), average of patient-level estimates (param_ple), and restricted mean survival time (param_rmst).

Steps 1-4 also support user-specific models. If treatment is provided (A!=NULL), the default settings are as follows:

Y is continuous (family="gaussian"): Elastic Net Filter ==> Treatment-Specific random forest models ==> MOB (OLS) ==> Average of patient-level estimates (param_ple)

Y is binary (family="binomial"): Elastic Net Filter ==> Treatment-Specific random forest models ==> MOB (GLM) ==> Average of patient-level estimates (param_ple)

Y is right-censored (family="survival"): Elastic Net Filter ==> Virtual twin survival random forest models ==> MOB (Weibull) ==> Cox regression (param_cox)

If treatment is not provided (A=NULL), the default settings are as follows:

Y is continuous (family="gaussian"): Elastic Net Filter ==> Random Forest ==> ctree ==> linear regression

Y is binary (family="binomial"): Elastic Net Filter ==> Random Forest ==> ctree ==> linear regression

Y is right-censored (family="survival"): Elastic Net Filter ==> Survival Random Forest ==> ctree ==> RMST

References

Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010 Vol. 33(1), 1-22 Feb 2010.

Jemielita T, Mehrotra D. PRISM: Patient Response Identifiers for Stratified Medicine. https://arxiv.org/abs/1912.03337

Hothorn T, Hornik K, Zeileis A (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651<U+2013>674.

Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. https://doi.org/10.18637/jss.v077.i01.

Zeileis A, Hothorn T, Hornik K (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492<U+2013>514.

Examples

Run this code

# NOT RUN {
## Load library ##
library(StratifiedMedicine)

## Examples: Continuous Outcome ##

dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_lmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)
# }
# NOT RUN {
summary(res0)
plot(res0)
# }
# NOT RUN {
# Without filtering #
# }
# NOT RUN {
res1 = PRISM(Y=Y, A=A, X=X, filter="None" )
summary(res1)
plot(res1)
# }
# NOT RUN {
# Search for Prognostic Only (omit A from function) #
# }
# NOT RUN {
res3 = PRISM(Y=Y, X=X)
summary(res3)
plot(res3)
# }
# NOT RUN {
## With bootstrap (No filtering) ##
# }
# NOT RUN {
library(ggplot2)
  res_boot = PRISM(Y=Y, A=A, X=X, resample = "Bootstrap", R=50, verbose.resamp = TRUE)
  # Plot of distributions and P(est>0) #
  plot(res_boot, type="resample", estimand = "E(Y|A=1)-E(Y|A=0)")+geom_vline(xintercept = 0)
  aggregate(I(est>0)~Subgrps, data=res_boot$resamp.dist, FUN="mean")
# }
# NOT RUN {
## Examples: Binary Outcome ##
# }
# NOT RUN {
dat_ctns = generate_subgrp_data(family="binomial")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_glmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)

plot(res0)
# }
# NOT RUN {
# Survival Data ##
# }
# NOT RUN {
  library(survival)
  library(ggplot2)
  require(TH.data); require(coin)
  data("GBSG2", package = "TH.data")
  surv.dat = GBSG2
  # Design Matrices ###
  Y = with(surv.dat, Surv(time, cens))
  X = surv.dat[,!(colnames(surv.dat) %in% c("time", "cens")) ]
  set.seed(513)
  A = rbinom( n = dim(X)[1], size=1, prob=0.5  )

  # PRISM: glmnet ==> Random Forest to estimate Treatment-Specific RMST
  # ==> MOB (Weibull) ==> Cox for HRs#
  res_weib = PRISM(Y=Y, A=A, X=X)
  plot(res_weib, type="PLE:waterfall")
  plot(res_weib)
  
  # PRISM: glmnet ==> Random Forest to estimate Treatment-Specific RMST
  # ==> OTR (CTREE, uses RMST estimates as input) ==> Cox for HRs #
  res_otr = PRISM(Y=Y, A=A, X=X)
  plot(res_otr)

  # PRISM: ENET ==> CTREE ==> Cox; with bootstrap #
  res_ctree1 = PRISM(Y=Y, A=A, X=X, ple="None", submod = "submod_ctree",
                     resample="Bootstrap", R=50, verbose.resamp = TRUE)
  plot(res_ctree1)
  plot(res_ctree1, type="resample", estimand="HR(A=1 vs A=0)")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_ctree1$resamp.dist, FUN="mean")
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab