PRISM: PRISM: Patient Response Identifier for Stratified Medicine

Description

PRISM algorithm. Given a data-set of (Y, A, X) (Outcome, treatment, covariates), the PRISM identifies potential subgroup along with point and variability metrics. This four step procedure (filter, ple, submod, param) is flexible and accepts user-inputs at each step.

Usage

PRISM(Y, A = NULL, X, Xtest = NULL, family = "gaussian",
  filter = "filter_glmnet", ple = NULL, submod = NULL,
  param = NULL, alpha_ovrl = 0.05, alpha_s = 0.05,
  filter.hyper = NULL, ple.hyper = NULL, submod.hyper = NULL,
  param.hyper = NULL, bayes = NULL, prefilter_resamp = FALSE,
  resample = NULL, stratify = TRUE, R = NULL, calibrate = FALSE,
  alpha.mat = NULL, filter.resamp = NULL, ple.resamp = NULL,
  submod.resamp = NULL, verbose = TRUE, verbose.resamp = FALSE,
  seed = 777)

Arguments

The outcome variable. Must be numeric or survival (ex; Surv(time,cens) )

Treatment variable. (Default support binary treatment, either numeric or factor). If A=NULL, searches for prognostic variables (Y~X).

Covariate space. Variables types (ex: numeric, factor, ordinal) should be set to align with subgroup model (submod argument). For example, for lmtree, binary variables coded as numeric (ex: 0, 1) are treated differently than the corresponding factor version (ex: "A", "B"). Filter and PLE models provided in the StratifiedMedicine package can accomodate all variable types.

Xtest

Test set. Default is NULL which uses X (training set). Variable types should match X.

family

Outcome type. Options include "gaussion" (default), "binomial", and "survival".

filter

Maps (Y,A,X) => (Y,A,X.star) where X.star has potentially less covariates than X. Default is "filter_glmnet", "None" uses no filter.

ple

PLE (Patient-Level Estimate) function. Maps the observed data to PLEs. (Y,A,X) ==> PLE(X). Default for is "ple_ranger". For continuous/binomial outcome data, this fits treatment specific random forest models. For survival outcome data, this fits a single forest, with expanded covariate space (A, X, X*A). (treatment-specific random forest models). "None" uses no ple.

submod

Subgroup identification model function. Maps the observed data and/or PLEs to subgroups. Default of "gaussian"/"binomial" is "submod_lmtree" (MOB with OLS loss). Default for "survival" is "submod_weibull" (MOB with weibull loss). "None" uses no submod.

param

Parameter estimation and inference function. Based on the discovered subgroups, perform inference through the input function (by name). Default for "gaussian"/"binomial" is "param_PLE", default for "survival" is "param_cox".

alpha_ovrl

Two-sided alpha level for overall population. Default=0.05

alpha_s

Two-sided alpha level at subgroup level. Default=0.05

filter.hyper

Hyper-parameters for the Filter function (must be list). Default is NULL.

ple.hyper

Hyper-parameters for the PLE function (must be list). Default is NULL.

submod.hyper

Hyper-parameters for the SubMod function (must be list). Default is NULL.

param.hyper

Hyper-parameters for the Param function (must be list). Default is NULL.

bayes

Based on input point estimates/SEs, this uses a bayesian based approach to obtain ests, SEs, CIs, and posterior probabilities. Currently includes "norm_norm" (normal prior at overall estimate with large uninformative variance; normal posterior). Default=NULL.

prefilter_resamp

Option to filter the covariate space (based on filter model) prior to resampling. Default=FALSE.

resample

Resampling method for resample-based estimates and variability metrics. Options include "Bootstrap", "Permutation", and "CV" (cross-validation). Default=NULL (No resampling).

stratify

Stratified resampling (Default=TRUE)

Number of resamples (default=NULL; R=100 for Permutation/Bootstrap and R=5 for CV)

calibrate

Bootstrap calibration for nominal alpha (Loh et al 2016). Default=FALSE. For TRUE, outputs the calibrated alpha level and calibrated CIs for the overall population and subgroups. Not applicable for permutation/CV resampling.

alpha.mat

Grid of alpha values for calibration. Default=NULL, which uses seq(alpha/1000,alpha,by=0.005) for alpha_ovrl/alpha_s.

filter.resamp

Filter function during resampling, default=NULL (use filter)

ple.resamp

PLE function during resampling, default=NULL (use ple)

submod.resamp

submod function for resampling, default=NULL (use submod)

verbose

Detail progress of PRISM? Default=TRUE

verbose.resamp

Output iterations during resampling? Default=FALSE

seed

Seed for PRISM run (Default=777)

Value

Trained PRISM object. Includes filter, ple, submod, and param outputs.

filter.mod - Filter model
filter.vars - Variables remaining after filtering
ple.fit - Fitted ple model (model fit, other fit outputs)
mu_train - Patient-level estimates (train)
mu_test - Patient-level estimates (test)
submod.fit - Fitted submod model (model fit, other fit outputs)
out.train - Training data-set with identified subgroups
out.test - Test data-set with identified subgroups
Rules - Subgroup rules / definitions
param.dat - Parameter estimates and variablity metrics (depends on param)
resamp.dist - Resampling distributions (NULL if no resampling is done)
bayes.fun - Function to simulate posterior distribution (NULL if no bayes)

Details

PRISM is a general framework with five key steps:

0. Estimand: Determine the question of interest (ex: mean treatment difference)

1. Filter: Reduce covariate space by removing noise covariates. Options include elastic net (filter_glmnet) and random forest variable importance (filter_ranger).

2. Patient-Level Estimates (ple): Estimate counterfactual patient-level quantities, for example, the individual treatment effect, E(Y|A=1)-E(Y|A=0). Options include: treatment-specific or virtual twins (Y~A+X+A*X) through random forest (ple_ranger, ple_rfsrc), elastic net (ple_glmnet), BART (ple_bart) and causal forest (ple_causal_forest).

3. Subgroup Model (submod): Partition the data into subsets or subgroups of patients. Options include: conditional inference trees (observed outcome or individual treatment effect/PLE; submod_ctree), MOB GLM (submod_glmtree), MOB OLS (submod_lmtree), optimal treatment regimes (submod_otr), rpart (submod_rpart), and MOB Weibull (submod_weibull).

4. Parameter Estimation (param): For the overall population and the discovered subgroups (if any), obtain point-estimates and variability metrics. Options include: cox regression (param_cox), double robust estimator (param_dr), linear regression (param_lm), average of patient-level estimates (param_ple), and restricted mean survival time (param_rmst).

Steps 1-4 also support user-specific models. If treatment is provided (A!=NULL), the default settings are as follows:

Y is continuous (family="gaussian"): Elastic Net Filter ==> Treatment-Specific random forest models ==> MOB (OLS) ==> Average of patient-level estimates (param_ple)

Y is binary (family="binomial"): Elastic Net Filter ==> Treatment-Specific random forest models ==> MOB (GLM) ==> Average of patient-level estimates (param_ple)

Y is right-censored (family="survival"): Elastic Net Filter ==> Virtual twin survival random forest models ==> MOB (Weibull) ==> Cox regression (param_cox)

If treatment is not provided (A=NULL), the default settings are as follows:

Y is continuous (family="gaussian"): Elastic Net Filter ==> Random Forest ==> ctree ==> linear regression

Y is binary (family="binomial"): Elastic Net Filter ==> Random Forest ==> ctree ==> linear regression

Y is right-censored (family="survival"): Elastic Net Filter ==> Survival Random Forest ==> ctree ==> RMST

References

Jemielita T, Mehrotra D. PRISM: Patient Response Identifiers for Stratified Medicine. https://arxiv.org/abs/1912.03337

Examples

Run this code

# NOT RUN {
## Load library ##
library(StratifiedMedicine)

## Examples: Continuous Outcome ##

dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_lmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)
# }
# NOT RUN {
summary(res0)
plot(res0)
# }
# NOT RUN {
# Without filtering #
# }
# NOT RUN {
res1 = PRISM(Y=Y, A=A, X=X, filter="None" )
summary(res1)
plot(res1)
# }
# NOT RUN {
# Search for Prognostic Only (omit A from function) #
# }
# NOT RUN {
res3 = PRISM(Y=Y, X=X)
summary(res3)
plot(res3)
# }
# NOT RUN {
## With bootstrap (No filtering) ##
# }
# NOT RUN {
  res_boot = PRISM(Y=Y, A=A, X=X, resample = "Bootstrap", R=50, verbose.resamp = TRUE)
  # Plot of distributions and P(est>0) #
  plot(res_boot, type="resample", estimand = "E(Y|A=1)-E(Y|A=0)")+geom_vline(xintercept = 0)
  aggregate(I(est>0)~Subgrps, data=res_boot$resamp.dist, FUN="mean")
# }
# NOT RUN {
## Examples: Binary Outcome ##
# }
# NOT RUN {
dat_ctns = generate_subgrp_data(family="binomial")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_glmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)

plot(res0)
# }
# NOT RUN {
# Survival Data ##
# }
# NOT RUN {
  library(survival)
  require(TH.data); require(coin)
  data("GBSG2", package = "TH.data")
  surv.dat = GBSG2
  # Design Matrices ###
  Y = with(surv.dat, Surv(time, cens))
  X = surv.dat[,!(colnames(surv.dat) %in% c("time", "cens")) ]
  set.seed(513)
  A = rbinom( n = dim(X)[1], size=1, prob=0.5  )

  # PRISM: glmnet ==> MOB (Weibull) ==> Cox; with bootstrap #
  res_weibull1 = PRISM(Y=Y, A=A, X=X, ple="None", resample="Bootstrap", R=100,
                       verbose.resamp = TRUE)
  plot(res_weibull1)
  plot(res_weibull1, type="resample", estimand = "HR(A=1 vs A=0)")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_weibull1$resamp.dist, FUN="mean")

  # PRISM: ENET ==> CTREE ==> Cox; with bootstrap #
  res_ctree1 = PRISM(Y=Y, A=A, X=X, ple=NULL, submod = "submod_ctree",
                     resample="Bootstrap", R=100, verbose.resamp = TRUE)
  plot(res_ctree1)
  plot(res_ctree1, type="resample", estimand="HR(A=1 vs A=0)")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_ctree1$resamp.dist, FUN="mean")
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab