PRISM: Patient Response Identifier for Stratified Medicine

Description

PRISM algorithm. Given a data set of (Y, A, X) (outcome, treatment, covariates), PRISM identifies potential subgroups along with point estimates and variability metrics. This four-step procedure (filter, ple, submod, param) is flexible and accepts user inputs at each step.

Usage

PRISM(Y, A = NULL, X, Xtest = NULL, family = "gaussian",
  filter = "filter_glmnet", ple = NULL, submod = NULL,
  param = NULL, alpha_ovrl = 0.05, alpha_s = 0.05,
  filter.hyper = NULL, ple.hyper = NULL, submod.hyper = NULL,
  param.hyper = NULL, bayes = NULL, prefilter_resamp = FALSE,
  resample = NULL, stratify = TRUE, R = 100, filter.resamp = NULL,
  ple.resamp = NULL, submod.resamp = NULL, verbose = TRUE,
  verbose.resamp = FALSE, seed = 777)

Arguments

Y

The outcome variable. Must be numeric or survival (ex: Surv(time, cens)).

A

Treatment variable (ex: a=1,...,A; should be numeric). Default is NULL, which searches for prognostic variables (Y~X).

X

Covariate space. Variable types (ex: numeric, factor, ordinal) should be set to align with the subgroup model (submod argument). For example, for lmtree, binary variables coded as numeric (ex: 0, 1) are treated differently than the corresponding factor version (ex: "A", "B"). Filter and PLE models provided in the StratifiedMedicine package can accommodate all variable types.

Xtest

Test set. Default is NULL which uses X (training set). Variable types should match X.
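
A minimal sketch of supplying a held-out test set through Xtest. It relies only on functions and outputs named on this page (generate_subgrp_data, PRISM, out.test, mu_test); the 70/30 split, the seed, and the object names are illustrative assumptions, not a recommended workflow.

```r
library(StratifiedMedicine)

# Simulated continuous-outcome data (same generator as in the Examples)
dat <- generate_subgrp_data(family = "gaussian")

# Hold out roughly 30% of rows as a test set (the split itself is illustrative)
set.seed(513)
n <- length(dat$Y)
test_idx <- sample.int(n, size = floor(0.3 * n))

Xtrain <- dat$X[-test_idx, , drop = FALSE]
Xtest  <- dat$X[test_idx, , drop = FALSE]

# Fit on the training rows; subgroups and PLEs are also predicted for Xtest
res <- PRISM(Y = dat$Y[-test_idx], A = dat$A[-test_idx],
             X = Xtrain, Xtest = Xtest, verbose = FALSE)

head(res$out.test)  # test data with identified subgroups
head(res$mu_test)   # patient-level estimates on the test set
```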

family

Outcome type. Options include "gaussian" (default), "binomial", and "survival".

filter

Filter function. Maps (Y,A,X) ==> (Y,A,X.star), where X.star has potentially fewer covariates than X. Default is "filter_glmnet"; "None" uses no filter.

ple

PLE (Patient-Level Estimate) function. Maps the observed data to PLEs: (Y,A,X) ==> PLE(X). Default for "gaussian"/"binomial" is "ple_ranger" (treatment-specific random forest models). Default for "survival" is "ple_glmnet" (elastic net (glmnet) Cox regression). "None" uses no ple.

submod

Subgroup identification model function. Maps the observed data and/or PLEs to subgroups. Default for "gaussian"/"binomial" is "submod_lmtree" (MOB with OLS loss). Default for "survival" is "submod_weibull" (MOB with Weibull loss). "None" uses no submod.

param

Parameter estimation and inference function. Based on the discovered subgroups, perform inference through the input function (by name). Default for "gaussian"/"binomial" is "param_PLE", default for "survival" is "param_cox".

alpha_ovrl

Two-sided alpha level for overall population. Default=0.05

alpha_s

Two-sided alpha level at subgroup level. Default=0.05
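
As a rough illustration of how a two-sided alpha level translates into an interval, the sketch below builds a normal-approximation confidence interval from a point estimate and standard error. The values of est and se are made up, and the param functions in the package may compute intervals differently (for example with t quantiles or resampling).

```r
# Hypothetical point estimate and standard error for one subgroup
est <- 0.40
se  <- 0.15
alpha_s <- 0.05  # two-sided alpha, matching the documented default

# Normal-approximation interval: est +/- z_{1 - alpha/2} * se
z  <- qnorm(1 - alpha_s / 2)
ci <- c(lower = est - z * se, upper = est + z * se)
ci
```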

filter.hyper

Hyper-parameters for the Filter function (must be list). Default is NULL.

ple.hyper

Hyper-parameters for the PLE function (must be list). Default is NULL.

submod.hyper

Hyper-parameters for the SubMod function (must be list). Default is NULL.

param.hyper

Hyper-parameters for the Param function (must be list). Default is NULL.

bayes

Based on input point estimates and SEs, uses a Bayesian approach to obtain estimates, SEs, CIs, and posterior probabilities. Currently includes "norm_norm" (normal prior centered at the overall estimate with a large, uninformative variance; normal posterior). Default=NULL.
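
The "norm_norm" description points to a conjugate normal-normal update, which can be sketched in base R. This illustrates the general technique under assumed inputs and is not the package's internal code; prior_var and all numeric values are hypothetical.

```r
# Hypothetical inputs: overall estimate (used as prior mean) and one
# subgroup's point estimate and standard error
prior_mean <- 0.25   # overall population estimate
prior_var  <- 1e5    # large variance => nearly uninformative prior (assumed value)
est <- 0.60          # subgroup point estimate
se  <- 0.20          # subgroup standard error

# Conjugate normal-normal update: precisions add, means are precision-weighted
post_var  <- 1 / (1 / prior_var + 1 / se^2)
post_mean <- post_var * (prior_mean / prior_var + est / se^2)
post_sd   <- sqrt(post_var)

# Posterior credible interval and posterior probability that the effect is > 0
alpha <- 0.05
ci <- post_mean + c(-1, 1) * qnorm(1 - alpha / 2) * post_sd
prob_gt0 <- pnorm(0, mean = post_mean, sd = post_sd, lower.tail = FALSE)

c(post_mean = post_mean, post_sd = post_sd,
  lower = ci[1], upper = ci[2], prob_gt0 = prob_gt0)
```

With a prior variance this large, the posterior is almost identical to the input estimate and SE; a smaller prior variance would shrink the subgroup estimate toward the overall estimate.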

prefilter_resamp

Option to filter the covariate space (based on filter model) prior to resampling. Default=FALSE.

resample

Resampling method for resample-based estimates and variability metrics. Options include "Bootstrap", "Permutation", and "CV". Default=NULL (no resampling).

stratify

Stratified resampling (Default=TRUE)

R

Number of resamples (default=100)
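
To make the idea of stratified resampling concrete, here is a base-R sketch of stratified bootstrap indices. It assumes the strata are the treatment arms, which this page does not state; the helper strat_boot_idx and the data are hypothetical and do not reflect the package's internal implementation.

```r
set.seed(777)

# Hypothetical treatment vector with unequal arm sizes
A <- rep(c(0, 1), times = c(30, 70))
R <- 5  # number of resamples (kept small for illustration)

# One stratified bootstrap draw: sample row indices with replacement
# separately within each treatment arm, so arm sizes are preserved
strat_boot_idx <- function(A) {
  idx_by_arm <- split(seq_along(A), A)
  unlist(lapply(idx_by_arm,
                function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

boot_idx <- replicate(R, strat_boot_idx(A), simplify = FALSE)

# Each resample keeps the original 30/70 split between arms
sapply(boot_idx, function(i) table(A[i]))
```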

filter.resamp

Filter function during resampling, default=NULL (use filter)

ple.resamp

PLE function during resampling, default=NULL (use ple)

submod.resamp

submod function for resampling, default=NULL (use submod)

verbose

Detail progress of PRISM? Default=TRUE

verbose.resamp

Output iterations during resampling? Default=FALSE

seed

Seed for PRISM run (Default=777)

Value

Trained PRISM object. Includes filter, ple, submod, and param outputs.

filter.mod - Filter model
filter.vars - Variables remaining after filtering
ple.fit - Fitted ple model (model fit, other fit outputs)
mu_train - Patient-level estimates (train)
mu_test - Patient-level estimates (test)
submod.fit - Fitted submod model (model fit, other fit outputs)
out.train - Training data-set with identified subgroups
out.test - Test data-set with identified subgroups
Rules - Subgroup rules / definitions
param.dat - Parameter estimates and variability metrics (depends on param)
resamp.dist - Resampling distributions (NULL if no resampling is done)
bayes.fun - Function to simulate posterior distribution (NULL if no bayes)

References

Jemielita and Mehrotra (2019 in progress)

Examples

# NOT RUN {
## Load library ##
library(StratifiedMedicine)
library(ggplot2) # needed for geom_vline() in the resampling plots below

##### Examples: Continuous Outcome ###########

dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_lmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)
summary(res0)
plot(res0, type="PLE:density") # distribution of PLEs
plot(res0, type="PLE:waterfall") # PLE waterfall plot
plot(res0$submod.fit$mod) # Plot of subgroup model
res0$param.dat # overall/subgroup specific parameter estimates/inference
plot(res0) # Forest plot: overall/subgroup specific parameter estimates (CIs)

# Without filtering #
res1 = PRISM(Y=Y, A=A, X=X, filter="None" )
summary(res1)
plot(res1$submod.fit$mod)
plot(res1)


## With bootstrap resampling (default filter) ##
# }
# NOT RUN {
  res_boot = PRISM(Y=Y, A=A, X=X, resample = "Bootstrap", R=50, verbose.resamp = TRUE)
  # Plot of distributions and P(est>0) #
  plot(res_boot, type="resample", estimand = "E(Y|A=1)-E(Y|A=0)")+geom_vline(xintercept = 0)
  aggregate(I(est>0)~Subgrps, data=res_boot$resamp.dist, FUN="mean")
# }
# NOT RUN {
# Survival Data ##
# }
# NOT RUN {
  library(survival)
  require(TH.data); require(coin)
  data("GBSG2", package = "TH.data")
  surv.dat = GBSG2
  # Design Matrices ###
  Y = with(surv.dat, Surv(time, cens))
  X = surv.dat[,!(colnames(surv.dat) %in% c("time", "cens")) ]
  set.seed(513)
  A = rbinom( n = dim(X)[1], size=1, prob=0.5  )

  # Default: PRISM: glmnet ==> MOB (Weibull) ==> Cox; bootstrapping posterior prob/inference #
  res_weibull1 = PRISM(Y=Y, A=A, X=X, ple=NULL, resample="Bootstrap", R=100,
                       verbose.resamp = TRUE)
  plot(res_weibull1$submod.fit$mod)
  plot(res_weibull1)
  plot(res_weibull1, type="resample", estimand = "HR(A=1 vs A=0)")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_weibull1$resamp.dist, FUN="mean")

  # PRISM: ENET ==> CTREE ==> Cox; bootstrapping for posterior prob/inference #
  res_ctree1 = PRISM(Y=Y, A=A, X=X, ple=NULL, submod = "submod_ctree",
                     resample="Bootstrap", R=100, verbose.resamp = TRUE)
  plot(res_ctree1$submod.fit$mod)
  plot(res_ctree1)
  plot(res_ctree1, type="resample", estimand="HR(A=1 vs A=0)")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_ctree1$resamp.dist, FUN="mean")
# }