
stremr (version 0.4)

fitSeqGcomp: Fit sequential GCOMP and TMLE for survival

Description

Interventions on up to 3 nodes are allowed: CENS, TRT and MONITOR. TMLE adjustment will be based on the inverse of the propensity score fits for the observed likelihood (g0.C, g0.A, g0.N), multiplied by the indicator of not being censored and the probability of each intervention in intervened_TRT and intervened_MONITOR. Requires column name(s) that specify the counterfactual node values or the counterfactual probabilities of each node being 1 (for stochastic interventions).
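As a rough illustration of the weighting described above, the sketch below builds the time-specific and cumulative weights for a single subject under a static "always treat" rule. All numbers are made up, and the names `uncensored`, `gstar.A`, `gstar.N`, `wt_t` and `cum_wt` are hypothetical helpers introduced only for this sketch; they are not stremr objects. Monitoring is left as observed here, so its counterfactual probability is assumed equal to the fitted one and the two cancel.

```r
# Toy quantities for ONE subject followed over t = 0, 1, 2 (hypothetical values).
g0.C <- c(0.95, 0.90, 0.92)   # fitted probability of remaining uncensored
g0.A <- c(0.60, 0.70, 0.80)   # fitted probability of the observed treatment
g0.N <- c(0.50, 0.50, 0.50)   # fitted probability of the observed monitoring

uncensored <- c(1, 1, 1)      # indicator of not being censored
gstar.A    <- c(1, 1, 1)      # probability of the observed treatment under the rule
gstar.N    <- g0.N            # assumption: no intervention on monitoring

# Time-specific weight: indicator times intervention probabilities,
# divided by the fitted observed-likelihood probabilities.
wt_t <- uncensored * gstar.A * gstar.N / (g0.C * g0.A * g0.N)

# Cumulative weight up to each time point.
cum_wt <- cumprod(wt_t)
print(round(cum_wt, 4))
```

A subject who deviates from the rule at some time point would have `gstar.A` equal to 0 there, which sets the weight to 0 from that point onward.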

Usage

fitSeqGcomp(OData, t_periods, Qforms, Qstratify = NULL, intervened_TRT = NULL, intervened_MONITOR = NULL, useonly_t_TRT = NULL, useonly_t_MONITOR = NULL, rule_name = paste0(c(intervened_TRT, intervened_MONITOR), collapse = ""), stratifyQ_by_rule = FALSE, TMLE = FALSE, iterTMLE = FALSE, IPWeights = NULL, stabilize = FALSE, trunc_weights = 10^6, params_Q = list(), weights = NULL, max_iter = 15, adapt_stop = TRUE, adapt_stop_factor = 10, tol_eps = 0.001, parallel = FALSE, verbose = getOption("stremr.verbose"))

Arguments

OData
Input data object created by importData function.
t_periods
Vector of time points at which the survival function (and risk) should be estimated.
Qforms
Regression formulas, one formula per Q. Only main-terms are allowed.
Qstratify
Placeholder for future user-defined model stratification for all Qs (CURRENTLY NOT FUNCTIONAL, WILL RESULT IN ERROR)
intervened_TRT
Column name in the input data with the probabilities (or indicators) of counterfactual treatment nodes being equal to 1 at each time point. Leave the argument unspecified (NULL) when not intervening on treatment node(s).
intervened_MONITOR
Column name in the input data with probabilities (or indicators) of counterfactual monitoring nodes being equal to 1 at each time point. Leave the argument unspecified (NULL) when not intervening on the monitoring node(s).
useonly_t_TRT
Use for intervening only on some subset of observation and time-specific treatment nodes. Should be a character string with a logical expression that defines the subset of intervention observations. For example, using TRT==0 will intervene only at observations with the value of TRT being equal to zero. The expression can contain any variable name that was defined in the input dataset. Leave as NULL when intervening on all observations/time-points.
useonly_t_MONITOR
Same as useonly_t_TRT, but for monitoring nodes.
rule_name
Optional name for the treatment/monitoring regimen.
stratifyQ_by_rule
Set to TRUE to stratify, fitting the outcome model (Q-learning) among rule-followers only. Set to FALSE to fit the outcome model (Q-learning) across all observations (pooled regression).
TMLE
Set to TRUE to run the usual longitudinal TMLE algorithm (with a separate TMLE update of Q for every sequential regression).
iterTMLE
Set to TRUE to run the iterative univariate TMLE instead of the usual longitudinal TMLE. When set to TRUE, the standard sequential Gcomp is also provided as part of the output. TMLE must be set to FALSE when this is set to TRUE.
IPWeights
(Optional) Result of calling the function getIPWeights, used for running TMLE (evaluated automatically when missing).
stabilize
Set to TRUE to use stabilized weights for the TMLE.
trunc_weights
Specify the numeric weight truncation value. All final weights exceeding the value in trunc_weights will be truncated.
params_Q
Optional parameters to be passed to the specific fitting algorithm for Q-learning.
weights
Optional data.table with additional observation-time-specific weights. Must contain columns ID, t and weight. The column named weight is merged back into the original data according to (ID, t).
max_iter
For iterative TMLE only: Integer giving the maximum number of iterations for the iterative TMLE algorithm.
adapt_stop
For iterative TMLE only: Chooses between two stopping criteria for the iterative TMLE. The default, TRUE, stops the algorithm adaptively: the iterations stop when the mean estimate of the efficient influence curve is less than or equal to 1 / (adapt_stop_factor*sqrt(N)), where N is the total number of unique subjects in the data and adapt_stop_factor is set to 10 by default. When TRUE, the argument tol_eps is ignored and TMLE stops when either max_iter has been reached or this criterion has been satisfied. When FALSE, the stopping criterion is determined by the values of max_iter and tol_eps.
adapt_stop_factor
For iterative TMLE only: The factor that sets the adaptive stopping threshold for the iterative TMLE when adapt_stop is set to TRUE. Default is 10. TMLE will keep iterating until the mean estimate of the efficient influence curve is less than 1 / (adapt_stop_factor*sqrt(N)) or until the number of iterations reaches max_iter.
tol_eps
For iterative TMLE only: Numeric error tolerance for the iterative TMLE update. The iterative TMLE algorithm will stop when the absolute value of the TMLE intercept update is below tol_eps.
parallel
Set to TRUE to run the sequential Gcomp or TMLE in parallel (uses foreach with %dopar% and requires a previously defined parallel back-end cluster).
verbose
...
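
The adaptive stopping rule described for adapt_stop and adapt_stop_factor can be sketched numerically as follows. This is only an illustration of the documented threshold, not stremr's internal code: the sample size, the sequence `mean_EIC` and the helper names `threshold`, `hit` and `stop_iter` are hypothetical, and taking the absolute value of the mean is an assumption of this sketch.

```r
# Hypothetical inputs; none of these values come from an actual stremr fit.
N                 <- 1000   # number of unique subjects
adapt_stop_factor <- 10     # documented default
max_iter          <- 15     # documented default

# Documented threshold on the mean of the estimated efficient influence curve (EIC).
threshold <- 1 / (adapt_stop_factor * sqrt(N))

# Made-up mean EIC values observed over successive iterations.
mean_EIC <- c(0.05, 0.02, 0.006, 0.002, 0.0005)

# Stop at the first iteration meeting the threshold, or at max_iter otherwise.
hit       <- which(abs(mean_EIC) <= threshold)
stop_iter <- if (length(hit) > 0) min(hit[1], max_iter) else max_iter

print(threshold)
print(stop_iter)
```

Because the threshold shrinks at rate 1/sqrt(N), larger samples require the mean of the efficient influence curve to be driven closer to zero before the iterations stop.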

Value

...

See Also

stremr-package for the general overview of the package.

Examples

options(stremr.verbose = TRUE)
require("data.table")
set_all_stremr_options(fit.package = "speedglm", fit.algorithm = "glm")

# ----------------------------------------------------------------------
# Simulated Data
# ----------------------------------------------------------------------
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key = c("ID", "t"))

# define lagged N, first value is always 1 (always monitored at the first time point):
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]
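
# Illustration only (toy data, not part of the stremr workflow): what the lag above does.
# `toy_DT` and its values are made up for this sketch.
toy_DT <- data.table(ID = c(1L, 1L, 1L, 2L, 2L),
                     t  = c(0L, 1L, 2L, 0L, 1L),
                     N  = c(0L, 1L, 0L, 1L, 1L))
# Within each ID, shift N down by one row; the first row of each subject gets fill = 1L.
toy_DT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
toy_DT[]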

# ----------------------------------------------------------------------
# Define intervention (always treated):
# ----------------------------------------------------------------------
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]

# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
                    CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")

# ----------------------------------------------------------------------
# Model the Propensity Scores
# ----------------------------------------------------------------------
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT <- "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))
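
# Illustration only (toy data; not stremr's internal code): each string in stratify_CENS
# is a logical expression over the input columns. This sketch assumes that the rows where
# an expression is TRUE form one stratum. `toy_t` is a made-up stand-in for the t column.
toy_t <- data.frame(t = 0:16)
toy_strata <- c("t < 16", "t == 16")
# Count the rows that fall into each stratum.
toy_counts <- vapply(toy_strata,
                     function(expr) sum(eval(parse(text = expr), envir = toy_t)),
                     integer(1))
toy_counts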

# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# ----------------------------------------------------------------------
# IPW Adjusted KM or Saturated MSM
# ----------------------------------------------------------------------
require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
             survNPMSM(OData) %$%
             IPW_estimates
AKME.St.1

# ----------------------------------------------------------------------
# Bounded IPW
# ----------------------------------------------------------------------
IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
             survDirectIPW(OData)
IPW.St.1[]

# ----------------------------------------------------------------------
# IPW-MSM for hazard
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData, intervened_TRT = "TI.set1", rule_name = "TI1")
wts.DT.0 <- getIPWeights(OData = OData, intervened_TRT = "TI.set0", rule_name = "TI0")
survMSM_res <- survMSM(list(wts.DT.1, wts.DT.0), OData, t_breaks = c(1:8,12,16)-1)
survMSM_res$St

# ----------------------------------------------------------------------
# Sequential G-COMP
# ----------------------------------------------------------------------
t.surv <- c(0:15)
Qforms <- rep.int("Q.kplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params <- list(fit.package = "speedglm", fit.algorithm = "glm")

## Not run: 
# gcomp_est <- fitSeqGcomp(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
#                           Qforms = Qforms, params_Q = params, stratifyQ_by_rule = FALSE)
# gcomp_est[]
# ## End(Not run)
# ----------------------------------------------------------------------
# TMLE
# ----------------------------------------------------------------------
## Not run: 
# tmle_est <- fitTMLE(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
#                     Qforms = Qforms, params_Q = params, stratifyQ_by_rule = TRUE)
# tmle_est[]
# ## End(Not run)

# ----------------------------------------------------------------------
# Running IPW-Adjusted KM with optional user-specified weights:
# ----------------------------------------------------------------------
addedWts_DT <- OdataDT[, c("ID", "t"), with = FALSE]
# the weights table must contain the columns ID, t and weight
addedWts_DT[, weight := sample.int(10, nrow(OdataDT), replace = TRUE)/10]
survNP_res_addedWts <- survNPMSM(wts.DT.1, OData, weights = addedWts_DT)

# ----------------------------------------------------------------------
# Multivariate Propensity Score Regressions
# ----------------------------------------------------------------------
gform_CENS <- "C + TI + N ~ highA1c + lastNat1"
OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR)

# ----------------------------------------------------------------------
# Fitting Propensity scores with Random Forests:
# ----------------------------------------------------------------------
## Not run: 
# set_all_stremr_options(fit.package = "h2o", fit.algorithm = "randomForest")
# require("h2o")
# h2o::h2o.init(nthreads = -1)
# gform_CENS <- "C ~ highA1c + lastNat1"
# OData <- fitPropensity(OData, gform_CENS = gform_CENS,
#                         gform_TRT = gform_TRT,
#                         gform_MONITOR = gform_MONITOR,
#                         stratify_CENS = stratify_CENS)
# 
# # For Gradient Boosting machines:
# set_all_stremr_options(fit.package = "h2o", fit.algorithm = "gbm")
# # Use `H2O-3` distributed implementation of GLM
# set_all_stremr_options(fit.package = "h2o", fit.algorithm = "glm")
# # Use Deep Neural Nets:
# set_all_stremr_options(fit.package = "h2o", fit.algorithm = "deeplearning")
# ## End(Not run)

# ----------------------------------------------------------------------
# Fitting different models with different algorithms
# Fine tuning modeling with optional tuning parameters.
# ----------------------------------------------------------------------
## Not run: 
# params_TRT = list(fit.package = "h2o", fit.algorithm = "gbm", ntrees = 50,
#     learn_rate = 0.05, sample_rate = 0.8, col_sample_rate = 0.8,
#     balance_classes = TRUE)
# params_CENS = list(fit.package = "speedglm", fit.algorithm = "glm")
# params_MONITOR = list(fit.package = "speedglm", fit.algorithm = "glm")
# OData <- fitPropensity(OData,
#             gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, params_CENS = params_CENS,
#             gform_TRT = gform_TRT, params_TRT = params_TRT,
#             gform_MONITOR = gform_MONITOR, params_MONITOR = params_MONITOR)
# ## End(Not run)

# ----------------------------------------------------------------------
# Running TMLE based on the previous fit of the propensity scores.
# Also applying Random Forest to estimate the sequential outcome model
# ----------------------------------------------------------------------
## Not run: 
# t.surv <- c(0:5)
# Qforms <- rep.int("Q.kplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
# params_Q = list(fit.package = "h2o", fit.algorithm = "randomForest",
#                 ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
#                 col_sample_rate = 0.8, balance_classes = TRUE)
# tmle_est <- fitTMLE(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
#             Qforms = Qforms, params_Q = params_Q,
#             stratifyQ_by_rule = TRUE)
# ## End(Not run)

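# ----------------------------------------------------------------------
# Same as above, but fitting the sequential outcome model pooled across
# all observations (stratifyQ_by_rule = FALSE)
# ----------------------------------------------------------------------
## Not run: 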
# t.surv <- c(0:5)
# Qforms <- rep.int("Q.kplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
# params_Q = list(fit.package = "h2o", fit.algorithm = "randomForest",
#                 ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
#                 col_sample_rate = 0.8, balance_classes = TRUE)
# tmle_est <- fitTMLE(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
#             Qforms = Qforms, params_Q = params_Q,
#             stratifyQ_by_rule = FALSE)
# ## End(Not run)
