PRISM: Patient Response Identifier for Stratified Medicine

Description

PRISM algorithm. Given a data set of (Y, A, X) (outcome, treatment, covariates), PRISM identifies potential subgroups along with point estimates and variability metrics. This four-step procedure (filter, ple, submod, param) is flexible and accepts user inputs at each step.
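The four-step flow can be sketched with a toy pipeline in base R. Every helper below (toy_filter, toy_ple, toy_submod, toy_param) is hypothetical and written only for this illustration; none of them is part of StratifiedMedicine, and the modelling choices (correlation screening, per-arm linear models, a median split) are deliberately simpler than the package defaults.

```r
# Toy illustration of the filter -> ple -> submod -> param flow (base R only).
# All helpers are hypothetical; they are NOT the StratifiedMedicine functions.
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
A <- rbinom(n, 1, 0.5)
Y <- 1 + 0.5 * X$x1 + A * (X$x1 > 0) + rnorm(n)

# Step 1 (filter): keep covariates whose absolute correlation with Y exceeds a cutoff
toy_filter <- function(Y, A, X, cutoff = 0.1) {
  keep <- vapply(X, function(x) abs(cor(x, Y)) > cutoff, logical(1))
  if (!any(keep)) keep[] <- TRUE  # fall back to all covariates if none pass
  names(X)[keep]
}

# Step 2 (ple): one linear model per arm; the PLE is the predicted difference
toy_ple <- function(Y, A, X) {
  dat <- cbind(Y = Y, X)
  fit0 <- lm(Y ~ ., data = dat[A == 0, , drop = FALSE])
  fit1 <- lm(Y ~ ., data = dat[A == 1, , drop = FALSE])
  predict(fit1, newdata = X) - predict(fit0, newdata = X)
}

# Step 3 (submod): split patients at the median estimated treatment effect
toy_submod <- function(ple) {
  ifelse(ple > median(ple), "high", "low")
}

# Step 4 (param): difference in mean outcome between arms within each subgroup
toy_param <- function(Y, A, Subgrps) {
  sapply(split(seq_along(Y), Subgrps), function(idx) {
    mean(Y[idx][A[idx] == 1]) - mean(Y[idx][A[idx] == 0])
  })
}

vars    <- toy_filter(Y, A, X)
ple     <- toy_ple(Y, A, X[, vars, drop = FALSE])
Subgrps <- toy_submod(ple)
est     <- toy_param(Y, A, Subgrps)
print(est)
```

Each step consumes the output of the previous one, which is why a different function can be swapped in at any stage without changing the others.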

Usage

PRISM(Y, A, X, Xtest = NULL, family = "gaussian",
  filter = "filter_glmnet", ple = NULL, submod = NULL,
  param = NULL, alpha_ovrl = 0.05, alpha_s = 0.05,
  filter.hyper = NULL, ple.hyper = NULL, submod.hyper = NULL,
  param.hyper = NULL, prefilter_resamp = FALSE, resample = NULL,
  stratify = TRUE, R = 100, filter.resamp = NULL,
  ple.resamp = NULL, submod.resamp = NULL, verbose = TRUE,
  verbose.resamp = FALSE)

Arguments

Y

The outcome variable. Must be numeric or survival (e.g., Surv(time, cens)).

A

Treatment variable (e.g., a=1,...,A or a="control","new").

X

Covariate space. Variable types (e.g., numeric, factor, ordinal) should be set to match what the subgroup model (submod argument) expects. For example, lmtree treats binary variables coded as numeric (e.g., 0, 1) differently from the corresponding factor version (e.g., "A", "B"). Filter and PLE models provided in the StratifiedMedicine package can accommodate all variable types.

Xtest

Test set. Default is NULL which uses X (training set). Variable types should match X.

family

Outcome type. Options include "gaussian" (default), "binomial", and "survival".

filter

Filter function. Maps (Y,A,X) => (Y,A,X.star), where X.star has potentially fewer covariates than X. Default is "filter_glmnet" (elastic net); NULL applies no filter.

ple

PLE (Patient-Level Estimate) function. Maps the observed data to PLEs: (Y,A,X) ==> PLE(X). Default for "gaussian"/"binomial" is "ple_ranger" (treatment-specific random forest models). Default for "survival" is "ple_glmnet" (elastic net (glmnet) Cox regression).

submod

Subgroup identification model function. Maps the observed data and/or PLEs to subgroups. Default for "gaussian"/"binomial" is "submod_lmtree" (MOB with OLS loss). Default for "survival" is "submod_weibull" (MOB with Weibull loss).

param

Parameter estimation and inference function. Performs inference on the discovered subgroups through the input function (given by name). Default for "gaussian"/"binomial" is "param_ple"; default for "survival" is "param_cox".

alpha_ovrl

Two-sided alpha level for overall population. Default=0.05

alpha_s

Two-sided alpha level at subgroup level. Default=0.05
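As a reminder of what a two-sided alpha level means for an interval, here is a minimal sketch using a normal approximation. The helper ci_two_sided and the numbers passed to it are made up for illustration; the interval actually reported by PRISM depends on the chosen param function and may be built differently.

```r
# Two-sided (1 - alpha) normal-approximation interval for a point estimate.
# ci_two_sided() is a hypothetical helper; est and se are invented values.
ci_two_sided <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)          # alpha is split equally between both tails
  c(lower = est - z * se, upper = est + z * se)
}

ci_overall  <- ci_two_sided(est = 0.8, se = 0.25, alpha = 0.05)
ci_subgroup <- ci_two_sided(est = 0.8, se = 0.25, alpha = 0.10)
print(ci_overall)
print(ci_subgroup)
```

A larger alpha gives a narrower interval for the same estimate and standard error, so alpha_ovrl and alpha_s can be set separately to apply different coverage levels to the overall population and to subgroups.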

filter.hyper

Hyper-parameters for the Filter function (must be list). Default is NULL.

ple.hyper

Hyper-parameters for the PLE function (must be list). Default is NULL.

submod.hyper

Hyper-parameters for the SubMod function (must be list). Default is NULL.

param.hyper

Hyper-parameters for the Param function (must be list). Default is NULL.

prefilter_resamp

Option to filter the covariate space (based on filter model) prior to resampling. Default=FALSE.

resample

Resampling method for resample-based estimates and variability metrics. Options include "Bootstrap" and "Permutation". Default=NULL (no resampling).

stratify

Stratified resampling (Default=TRUE)

R

Number of resamples (default=100)
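The idea of stratified resampling repeated R times can be sketched in base R as follows. The helper stratified_boot_idx is hypothetical, and using the treatment arm as the stratum is an assumption made for this sketch; this page does not state which strata PRISM uses internally.

```r
# Stratified bootstrap indices: resample with replacement within each stratum.
# ASSUMPTION: the stratum here is the treatment arm; PRISM's own strata may differ.
stratified_boot_idx <- function(strata) {
  idx_by_stratum <- split(seq_along(strata), strata)
  unlist(lapply(idx_by_stratum, function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }), use.names = FALSE)
}

set.seed(42)
A <- rep(c(0, 1), times = c(30, 70))   # 30 control, 70 treated (toy data)
R <- 100                               # number of resamples
boot_idx <- replicate(R, stratified_boot_idx(A), simplify = FALSE)

# Each resample preserves the original arm sizes
print(table(A[boot_idx[[1]]]))
```

Resampling within strata keeps the stratum sizes fixed across resamples, which avoids resamples in which a small group is nearly absent.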

filter.resamp

Filter function during resampling, default=NULL (use original Filter)

ple.resamp

PLE function during resampling, default=NULL (use original PLE)

submod.resamp

SubMod function for resampling, default=NULL (use original SubMod)

verbose

Detail progress of PRISM? Default=TRUE

verbose.resamp

Output iterations during resampling? Default=FALSE

Value

Trained PRISM object. Includes filter, ple, submod, and param outputs.

filter.mod - Filter model
filter.vars - Variables remaining after filtering
ple.fit - Fitted ple model (model fit, other fit outputs)
mu_train - Patient-level estimates (train)
mu_test - Patient-level estimates (test)
submod.fit - Fitted submod model (model fit, other fit outputs)
out.train - Training data-set with identified subgroups
out.test - Test data-set with identified subgroups
Rules - Subgroup rules / definitions
param.dat - Parameter estimates and variability metrics (depends on param)
resamp.dist - Resampling distributions (NULL if no resampling is done)
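The examples summarise resamp.dist by subgroup with aggregate(). The toy data frame below mimics only the two columns used there (est and Subgrps); its values are invented, and the real object may carry additional columns.

```r
# Toy stand-in for res$resamp.dist: only est and Subgrps are mimicked,
# and all numbers are invented for illustration.
resamp_toy <- data.frame(
  Subgrps = rep(c(1, 2, 3), each = 4),
  est = c(0.4, 0.6, 0.5, 0.7,
          -0.2, 0.1, -0.1, 0.3,
          0.9, 1.1, 0.8, 1.2)
)

# Proportion of resamples with est > 0, computed separately for each subgroup
prob_pos <- aggregate(I(est > 0) ~ Subgrps, data = resamp_toy, FUN = "mean")
names(prob_pos)[2] <- "prob_est_gt_0"
print(prob_pos)
```

Taking the mean of a logical vector gives the fraction of TRUE values, so each row is the share of resamples in which that subgroup's estimate was positive.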

References

Jemielita and Mehrotra (2019 in progress)

Examples


# NOT RUN {
## Load library ##
library(StratifiedMedicine)

##### Examples: Continuous Outcome ###########

dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

# Run Default: filter_glmnet, ple_ranger, submod_lmtree, param_ple #
res0 = PRISM(Y=Y, A=A, X=X)
res0$filter.vars # variables that pass the filter
plot(res0, type="PLE:density") # distribution of PLEs
plot(res0, type="PLE:waterfall") # PLE waterfall plot
plot(res0$submod.fit$mod) # Plot of subgroup model
res0$param.dat # overall/subgroup specific parameter estimates/inference
plot(res0) # Forest plot: overall/subgroup specific parameter estimates (CIs)

# Without filtering #
res1 = PRISM(Y=Y, A=A, X=X, filter=NULL)
plot(res1$submod.fit$mod)
plot(res1)

## With bootstrap (default filter) ##
library(ggplot2) # provides geom_vline() used below
res_boot = PRISM(Y=Y, A=A, X=X, resample = "Bootstrap", R=50, verbose.resamp = TRUE)
# Plot of distributions and P(est>0) #
plot(res_boot, type="resample")+geom_vline(xintercept = 0)
aggregate(I(est>0)~Subgrps, data=res_boot$resamp.dist, FUN="mean")

## Survival Data ##
  library(survival)
  require(TH.data); require(coin)
  data("GBSG2", package = "TH.data")
  surv.dat = GBSG2
  # Design Matrices ###
  Y = with(surv.dat, Surv(time, cens))
  X = surv.dat[,!(colnames(surv.dat) %in% c("time", "cens")) ]
  set.seed(513)
  A = rbinom( n = dim(X)[1], size=1, prob=0.5  )

  # Default: PRISM: glmnet ==> MOB (Weibull) ==> Cox; bootstrapping posterior prob/inference #
  res_weibull1 = PRISM(Y=Y, A=A, X=X, ple=NULL, resample="Bootstrap", R=100,
                       verbose.resamp = TRUE)
  plot(res_weibull1$submod.fit$mod)
  plot(res_weibull1)
  plot(res_weibull1, type="resample")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_weibull1$resamp.dist, FUN="mean")

  # PRISM: ENET ==> CTREE ==> Cox; bootstrapping for posterior prob/inference #
  res_ctree1 = PRISM(Y=Y, A=A, X=X, ple=NULL, submod = "submod_ctree",
                     resample="Bootstrap", R=100, verbose.resamp = TRUE)
  plot(res_ctree1$submod.fit$mod)
  plot(res_ctree1)
  plot(res_ctree1, type="resample")+geom_vline(xintercept = 1)
  aggregate(I(est<1)~Subgrps, data=res_ctree1$resamp.dist, FUN="mean")
# }
