ple_train: Patient-level Estimates: Train Model

Description

Wrapper function to train a patient-level estimate (ple) model. Used directly in PRISM and can be used to directly fit a ple model by name.

Usage

ple_train(
  Y,
  A,
  X,
  Xtest = NULL,
  family = "gaussian",
  propensity = FALSE,
  ple = "ranger",
  meta = ifelse(family == "survival", "T-learner", "X-learner"),
  hyper = NULL,
  tau = NULL,
  ...
)

Arguments

The outcome variable. Must be numeric or survival (ex; Surv(time,cens) )

Treatment variable. (Default supports binary treatment, either numeric or factor). "ple_train" accomodates >2 along with binary treatments.

Covariate space.

Xtest

Test set. Default is NULL (no test predictions). Variable types should match X.

family

Outcome type. Options include "gaussion" (default), "binomial", and "survival".

propensity

Propensity score estimation, P(A=a|X). Default=FALSE which use the marginal estimates, P(A=a) (applicable for RCT data). If TRUE, will use the "ple" base learner to estimate P(A=a|X).

ple

Base-learner used to estimate patient-level equantities, such as the conditional average treatment effect (CATE), E(Y|A=1,X)-E(Y|A=0, X) = CATE(X). Default is random based based through "ranger". "None" uses no ple. See below for details on estimating the treatment contrasts.

Value

Trained ple models and patient-level estimates for train/test sets.

mod - trained model(s)
mu_train - Patient-level estimates (training set)
mu_test - Patient-level estimates (test set)

Details

ple_train uses base-learners along with a meta-learner to obtain patient-level estimates under different treatment exposures (see Kunzel et al). For family="gaussian" or "binomial", output estimates of \(\mu(a,x)=E(Y|x,a)\) and treatment differences (average treatment effect or risk difference). For survival, either logHR based estimates or RMST based estimates can be obtained. Current base-learner ("ple") options include:

1. linear: Uses either linear regression (family="gaussian"), logistic regression (family="binomial"), or cox regression (family="survival"). No hyper-parameters.

2. ranger: Uses random forest ("ranger" R package). The default hyper-parameters are: hyper = list(mtry=NULL, min.node.pct=0.10)

where mtry is number of randomly selected variables (default=NULL; sqrt(dim(X))) and min.node.pct is the minimum node size as a function of the total data size (ex: min.node.pct=10% requires at least 10

3. glmnet: Uses elastic net ("glmnet" R package). The default hyper-parameters are: hyper = list(lambda="lambda.min")

where lambda controls the penalty parameter for predictions. lambda="lambda.1se" will likely result in a less complex model.

4. bart: Uses bayesian additive regression trees (Chipman et al 2010; BART R package). Default hyper-parameters are:

hyper = list(sparse=FALSE)

where sparse controls whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform.

References

Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. 10.18637/jss.v077.i01

Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010 Vol. 33(1), 1-22 Feb 2010.

Chipman, H., George, E., and McCulloch R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4,1, 266-298

Kunzel S, Sekhon JS, Bickel PJ, Yu B. Meta-learners for Estimating Hetergeneous Treatment Effects using Machine Learning. 2019.

Examples

Run this code

# NOT RUN {
library(StratifiedMedicine)
## Continuous ##
dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A


# X-Learner (With ranger based learners)
mod1 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="ranger", method="X-learner")
summary(mod1$mu_train)

# T-Learner (Treatment specific)
mod2 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="ranger", method="T-learner")
summary(mod2$mu_train)


mod3 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="bart", method="X-learner")
summary(mod3$mu_train)
# }
# NOT RUN {

# }

Run the code above in your browser using DataLab