PRISM algorithm. Given a data set of (Y, A, X) (outcome, treatment, covariates),
PRISM identifies potential subgroups along with point estimates and variability metrics.
This four-step procedure (filter, ple, submod, param) is flexible and accepts user inputs
at each step.
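To make the four-step flow concrete, here is a minimal base-R sketch on simulated data. It is only an illustration of how the steps hand results to one another: every `toy_*` helper is hypothetical and much simpler than the package's own filter, PLE, subgroup, and parameter functions, and the simulated data-generating model is an assumption made for this example.

```r
# Toy sketch of the filter -> ple -> submod -> param flow in base R.
# Every toy_* helper is hypothetical; none is a StratifiedMedicine function.
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
A <- rbinom(n, size = 1, prob = 0.5)
Y <- 1 + 0.5 * X$x2 + 2 * A * (X$x1 > 0) + rnorm(n)  # benefit only if x1 > 0

# Step 1 (filter): keep covariates related to Y directly or through treatment.
toy_filter <- function(Y, A, X, thresh = 0.1) {
  score <- sapply(X, function(x) max(abs(cor(Y, x)), abs(cor(Y, A * x))))
  keep <- names(score)[score > thresh]
  if (length(keep) == 0) names(X) else keep  # fall back to all covariates
}

# Step 2 (ple): arm-specific linear models; PLE = predicted difference.
toy_ple <- function(Y, A, X) {
  dat <- data.frame(Y = Y, X)
  fit1 <- lm(Y ~ ., data = dat[A == 1, , drop = FALSE])
  fit0 <- lm(Y ~ ., data = dat[A == 0, , drop = FALSE])
  unname(predict(fit1, newdata = dat) - predict(fit0, newdata = dat))
}

# Step 3 (submod): split patients at the median PLE.
toy_submod <- function(ple) ifelse(ple > median(ple), "high", "low")

# Step 4 (param): difference in means with a two-sided (1 - alpha) CI,
# for the overall population and for each subgroup.
toy_param <- function(Y, A, Subgrps, alpha = 0.05) {
  labels <- c("overall", sort(unique(Subgrps)))
  rows <- lapply(labels, function(g) {
    idx <- if (g == "overall") rep(TRUE, length(Y)) else Subgrps == g
    tt <- t.test(Y[idx & A == 1], Y[idx & A == 0], conf.level = 1 - alpha)
    data.frame(Subgrps = g, N = sum(idx),
               est = unname(tt$estimate[1] - tt$estimate[2]),
               LCL = tt$conf.int[1], UCL = tt$conf.int[2])
  })
  do.call(rbind, rows)
}

keep    <- toy_filter(Y, A, X)
ple     <- toy_ple(Y, A, X[, keep, drop = FALSE])
Subgrps <- toy_submod(ple)
res     <- toy_param(Y, A, Subgrps, alpha = 0.05)
print(res)
```

Each step consumes only the output of the previous one, which is what lets a single step be swapped for a different model without touching the others.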
PRISM(Y, A, X, Xtest = NULL, family = "gaussian",
      filter = "filter_glmnet", ple = NULL, submod = NULL,
      param = NULL, alpha_ovrl = 0.05, alpha_s = 0.05,
      filter.hyper = NULL, ple.hyper = NULL, submod.hyper = NULL,
      param.hyper = NULL, prefilter_resamp = FALSE, resample = NULL,
      stratify = TRUE, R = 100, filter.resamp = NULL,
      ple.resamp = NULL, submod.resamp = NULL, verbose = TRUE,
      verbose.resamp = FALSE)
Y - The outcome variable. Must be numeric or survival (ex: Surv(time, cens)).
A - Treatment variable (ex: a=1,...,A or a="control","new").
X - Covariate space. Variable types (ex: numeric, factor, ordinal) should be set to align with the subgroup model (submod argument). For example, for lmtree, binary variables coded as numeric (ex: 0, 1) are treated differently than the corresponding factor version (ex: "A", "B"). Filter and PLE models provided in the StratifiedMedicine package can accommodate all variable types.
Xtest - Test set. Default is NULL, which uses X (training set). Variable types should match X.
family - Outcome type. Options include "gaussian" (default), "binomial", and "survival".
filter - Maps (Y,A,X) => (Y,A,X.star), where X.star has potentially fewer covariates than X. Default is "filter_glmnet"; NULL uses no filter.
ple - PLE (Patient-Level Estimate) function. Maps the observed data to PLEs: (Y,A,X) ==> PLE(X). Default for "gaussian"/"binomial" is "ple_ranger" (treatment-specific random forest models). Default for "survival" is "ple_glmnet" (elastic net Cox regression via glmnet).
submod - Subgroup identification model function. Maps the observed data and/or PLEs to subgroups. Default for "gaussian"/"binomial" is "submod_lmtree" (MOB with OLS loss). Default for "survival" is "submod_weibull" (MOB with Weibull loss).
param - Parameter estimation and inference function. Based on the discovered subgroups, performs inference through the input function (by name). Default for "gaussian"/"binomial" is "param_PLE"; default for "survival" is "param_cox".
alpha_ovrl - Two-sided alpha level for the overall population. Default=0.05
alpha_s - Two-sided alpha level at the subgroup level. Default=0.05
filter.hyper - Hyper-parameters for the Filter function (must be list). Default is NULL.
ple.hyper - Hyper-parameters for the PLE function (must be list). Default is NULL.
submod.hyper - Hyper-parameters for the SubMod function (must be list). Default is NULL.
param.hyper - Hyper-parameters for the Param function (must be list). Default is NULL.
prefilter_resamp - Option to filter the covariate space (based on the filter model) prior to resampling. Default=FALSE.
resample - Resampling method for resample-based estimates and variability metrics. Options include "Bootstrap" and "Permutation". Default=NULL (no resampling).
stratify - Stratified resampling (default=TRUE)
R - Number of resamples (default=100)
filter.resamp - Filter function during resampling, default=NULL (use original Filter)
ple.resamp - PLE function during resampling, default=NULL (use original PLE)
submod.resamp - SubMod function during resampling, default=NULL (use original SubMod)
verbose - Detail progress of PRISM? Default=TRUE
verbose.resamp - Output iterations during resampling? Default=FALSE
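Because the subgroup model can treat a numeric 0/1 column differently from the equivalent factor, it may help to set covariate types explicitly before calling PRISM. The following base-R sketch shows one way to do that; the data frame and its column names (`age`, `sex`, `grade`) and the factor labels are made up for illustration.

```r
# Hypothetical covariate data; column names and labels are illustrative only.
X <- data.frame(
  age   = c(54, 61, 47, 70),
  sex   = c(0, 1, 1, 0),              # binary variable stored as numeric
  grade = c("I", "III", "II", "II"),  # ordered categories stored as text
  stringsAsFactors = FALSE
)

# Recode the binary variable as an unordered factor.
X$sex <- factor(X$sex, levels = c(0, 1), labels = c("F", "M"))

# Recode the grade as an ordered factor with an explicit level order.
X$grade <- factor(X$grade, levels = c("I", "II", "III"), ordered = TRUE)

# Inspect the resulting type of each column.
sapply(X, function(col) class(col)[1])
```

Whether the numeric or the factor coding is appropriate depends on how the chosen subgroup model should split on that variable, so this is a modelling decision and not just a formatting step.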
Trained PRISM object. Includes filter, ple, submod, and param outputs.
filter.mod - Filter model
filter.vars - Variables remaining after filtering
ple.fit - Fitted ple model (model fit, other fit outputs)
mu_train - Patient-level estimates (train)
mu_test - Patient-level estimates (test)
submod.fit - Fitted submod model (model fit, other fit outputs)
out.train - Training data-set with identified subgroups
out.test - Test data-set with identified subgroups
Rules - Subgroup rules / definitions
param.dat - Parameter estimates and variability metrics (depends on param)
resamp.dist - Resampling distributions (NULL if no resampling is done)
Jemielita and Mehrotra (2019 in progress)
# NOT RUN {
## Load libraries ##
library(StratifiedMedicine)
library(ggplot2) # provides geom_vline(), used with the resampling plots below
##### Examples: Continuous Outcome ###########
dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A
# Run Default: filter_glmnet, ple_ranger, submod_lmtree, param_PLE #
res0 = PRISM(Y=Y, A=A, X=X)
res0$filter.vars # variables that pass the filter
plot(res0, type="PLE:density") # distribution of PLEs
plot(res0, type="PLE:waterfall") # PLE waterfall plot
plot(res0$submod.fit$mod) # Plot of subgroup model
res0$param.dat # overall/subgroup specific parameter estimates/inference
plot(res0) # Forest plot: overall/subgroup specific parameter estimates (CIs)
# Without filtering #
res1 = PRISM(Y=Y, A=A, X=X, filter=NULL)
plot(res1$submod.fit$mod)
plot(res1)
# }
# NOT RUN {
## With bootstrap (default filter) ##
res_boot = PRISM(Y=Y, A=A, X=X, resample = "Bootstrap", R=50, verbose.resamp = TRUE)
# Plot of distributions and P(est>0) #
plot(res_boot, type="resample")+geom_vline(xintercept = 0)
aggregate(I(est>0)~Subgrps, data=res_boot$resamp.dist, FUN="mean")
# }
# NOT RUN {
## Survival Data ##
library(survival)
require(TH.data); require(coin)
data("GBSG2", package = "TH.data")
surv.dat = GBSG2
# Design Matrices ###
Y = with(surv.dat, Surv(time, cens))
X = surv.dat[,!(colnames(surv.dat) %in% c("time", "cens")) ]
set.seed(513)
A = rbinom( n = dim(X)[1], size=1, prob=0.5 )
# Default: PRISM: glmnet ==> MOB (Weibull) ==> Cox; bootstrapping posterior prob/inference #
res_weibull1 = PRISM(Y=Y, A=A, X=X, ple=NULL, resample="Bootstrap", R=100,
                     verbose.resamp = TRUE)
plot(res_weibull1$submod.fit$mod)
plot(res_weibull1)
plot(res_weibull1, type="resample")+geom_vline(xintercept = 1)
aggregate(I(est<1)~Subgrps, data=res_weibull1$resamp.dist, FUN="mean")
# PRISM: ENET ==> CTREE ==> Cox; bootstrapping for posterior prob/inference #
res_ctree1 = PRISM(Y=Y, A=A, X=X, ple=NULL, submod = "submod_ctree",
                   resample="Bootstrap", R=100, verbose.resamp = TRUE)
plot(res_ctree1$submod.fit$mod)
plot(res_ctree1)
plot(res_ctree1, type="resample")+geom_vline(xintercept = 1)
aggregate(I(est<1)~Subgrps, data=res_ctree1$resamp.dist, FUN="mean")
# }
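The `aggregate()` calls in the examples above summarize the resampling distribution as the proportion of resamples in which each subgroup's estimate falls on one side of a reference value. The sketch below reproduces that calculation on a small hand-made data frame. The columns `Subgrps` and `est` follow the names used in the examples; the values are invented, and the helper column `gt0` is introduced here only for illustration.

```r
# Hand-made stand-in for a resampling distribution (values are invented).
toy_dist <- data.frame(
  Subgrps = rep(c(1, 2, 3), each = 4),
  est     = c( 0.5,  0.2, -0.1,  0.4,   # subgroup 1: 3 of 4 estimates > 0
              -0.3, -0.2,  0.1, -0.5,   # subgroup 2: 1 of 4 estimates > 0
               1.0,  0.8,  1.2,  0.9)   # subgroup 3: 4 of 4 estimates > 0
)

# Flag each resample, then average the flags within each subgroup.
toy_dist$gt0 <- toy_dist$est > 0
prob <- aggregate(gt0 ~ Subgrps, data = toy_dist, FUN = mean)
print(prob)
```

For hazard ratios, as in the survival examples, the reference value would be 1 and the comparison `est < 1` instead of `est > 0`.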