ltmle: Longitudinal Targeted Maximum Likelihood Estimation

Description

ltmle is Targeted Maximum Likelihood Estimation (TMLE) of treatment/censoring specific mean outcome for point-treatment and longitudinal data. ltmleMSM adds Marginal Structural Models. Both always provide Inverse Probability of Treatment/Censoring Weighted estimate (IPTW) as well. Maximum likelihood based G-computation estimate (G-comp) can be obtained instead of TMLE. ltmle can be used to calculate additive treatment effect, risk ratio, and odds ratio.

Usage

ltmle(data, Anodes, Cnodes=NULL, Lnodes=NULL, Ynodes, survivalOutcome=NULL, Qform=NULL, 
 gform=NULL, abar, rule=NULL, gbounds=c(0.01, 1), Yrange=NULL, 
 deterministic.g.function=NULL, stratify=FALSE, SL.library=NULL, 
 estimate.time=nrow(data) > 50, gcomp=FALSE, mhte.iptw=FALSE, iptw.only=FALSE, 
 deterministic.Q.function=NULL)
ltmleMSM(data, Anodes, Cnodes=NULL, Lnodes=NULL, Ynodes, survivalOutcome=NULL, Qform=NULL,
 gform=NULL, gbounds=c(0.01, 1), Yrange=NULL, deterministic.g.function=NULL, 
 SL.library=NULL, regimes, working.msm, summary.measures, 
 summary.baseline.covariates=NULL, final.Ynodes=NULL, pooledMSM=TRUE, stratify=FALSE, 
 weight.msm=TRUE, estimate.time=nrow(data) > 50, gcomp=FALSE, mhte.iptw=FALSE, 
 iptw.only=FALSE, deterministic.Q.function=NULL, memoize=TRUE)

Arguments

data

data frame following the time-ordering of the nodes. See 'Details'.

Anodes

column names or indicies in data of treatment nodes

Cnodes

column names or indicies in data of censoring nodes

Lnodes

column names or indicies in data of time-dependent covariate nodes

Ynodes

column names or indicies in data of outcome nodes

survivalOutcome

If TRUE, then Y nodes are indicators of an event, and if Y at some time point is 1, then all following should be 1. Required to be TRUE or FALSE if outcomes are binary and there are multiple Ynodes.

Qform

character vector of regression formulas for $Q$. See 'Details'.

gform

character vector of regression formulas for $g$. See 'Details'.

abar

binary vector (numAnodes x 1) or matrix (n x numAnodes) of counterfactual treatment

rule

a function to be applied to each row (a named vector) of data that returns a numeric vector of length numAnodes

gbounds

lower and upper bounds on estimated probabilities for g-factors. Vector of length 2, order unimportant.

Yrange

NULL or a numerical vector where the min and max of Yrange specify the range of all Y nodes. See 'Details'.

deterministic.g.function

optional information on A and C nodes that are given deterministically. See 'Details'. Default NULL indicates no deterministic links.

stratify

if TRUE stratify on following abar when estimating Q and g. If FALSE, pool over abar.

SL.library

optional character vector of libraries to pass to SuperLearner. NULL indicates glm should be called instead of

estimate.time

if TRUE, run an initial estimate using only 50 observations and use this to print a very rough estimate of the total time to completion. No action if there are fewer than 50 observations.

gcomp

if TRUE, run the maximum likelihood based G-computation estimate instead of TMLE

regimes

binary array: n x numAnodes x numRegimes of counterfactual treatment or a list of 'rule' functions

working.msm

character formula for the working marginal structural model

summary.measures

array: num.regimes x num.summary.measures x num.final.Ynodes - measures summarizing the regimes that will be used on the right hand side of working.msm

summary.baseline.covariates

NOT FULLY IMPLEMENTED YET - leave as NULL (default)

final.Ynodes

vector subset of Ynodes - used in MSM to pool over a set of outcome nodes

pooledMSM

if TRUE, the TMLE targeted step will pool across regimes

weight.msm

if TRUE, the working.msm will be weighted by the empirical probability of each regime [in the future more flexible weighting may be possible]

mhte.iptw

if TRUE, IPTW is calculated using the modified Horvitz-Thompson estimator (normalizes by sum of the inverse weights)

iptw.only

by default (iptw.only = FALSE), both TMLE and IPTW are run in ltmle and ltmleMSM(pooledMSM=FALSE). If iptw.only = TRUE, only IPTW is run, which is faster. This parameter is not used in ltmleMSM(poo

deterministic.Q.function

optional information on Q given deterministically. See 'Details'. Default NULL indicates no deterministic links.

memoize

If TRUE, glm regressions will be memoized. It is recommended to leave this as TRUE (the default), especially if there are multiple final.Ynodes, because the code is not written as efficiently as it should be and will

Value

ltmle returns an object of class "ltmle" The function summary (i.e. summary.ltmle) can be used to obtain or print a summary of the results. An object of class "ltmle" is a list containing the following components:
estimates
a named vector of length 4 with elements, each an estimate of $E[Y_{bar{a}}]$:
- tmle- Targeted Maximum Likelihood Estimate [NULL ifgcompisTRUE]
- iptw- Inverse Probability of Treatment/Censoring Weighted estimate
- gcomp- maximum likelihood based G-computation estimate [NULL ifgcompisFALSE]
- naive- naive estimate$E[Y|a=\bar{a}]$
ICa list with the following components of Influence Curve values
- tmle- vector of influence curve values for Targeted Maximum Likelihood Estimate [NULL ifgcompisTRUE]
- iptw- vector of influence curve values for Inverse Probability of Treatment/Censoring Weighted estimate
- gcomp- vector of influence curve values for Targeted Maximum Likelihood Estimate without updating [NULL ifgcompisFALSE]
cum.gmatrix, n x numACnodes - cumulative g, after bounding
callthe matched call
gcompthe gcomp input
formulasa list with elements Qform and gform
fita list with the following components
- g- list of length numACnodes -glmorSuperLearnerreturn objects from fitting g regressions
- Q- list of length numLYnodes -glmorSuperLearnerreturn objects from fitting Q regressions
- Qstar- list of length numLYnodes -glm(or numerical optimization ifglmfails to solve the score equation) return objects from updating the Q fit
ltmleMSM returns an object of class "ltmleMSM" The function summary (i.e. summary.ltmleMSM) can be used to obtain or print a summary of the results. An object of class "ltmleMSM" is a list containing the following components:
betaparameter estimates for working.msm using TMLE (GCOMP if gcomp input is TRUE)
beta.iptwparameter estimates for working.msm using IPTW (NULL if pooledMSM is TRUE)
ICmatrix, n x numBetas - influence curve values for TMLE (without updating if gcomp input is TRUE)
IC.iptwmatrix, n x numBetas - influence curve values for IPTW (NULL if pooledMSM is TRUE)
msmobject of class glm - the result of fitting the working.msm
cum.garray, n x numACnodes x numRegimes - cumulative g, after bounding
callthe matched call
gcompthe gcomp input
formulasa list with elements Qform and gform
fita list with the following components
- g- list of length numRegimes of list of length numACnodes -glmorSuperLearnerreturn objects from fitting g regressions
- Q- list of length numLYnodes -glmorSuperLearnerreturn objects from fitting Q regressions
- Qstar- list of length numLYnodes -glm(or numerical optimization ifglmfails to solve the score equation) return objects from updating the Q fit

Details

The estimates returned by ltmle are of a treatment specific mean, $E[Y_{\bar{a}}]$, the mean of the final treatment node, where all treatment nodes, $A$, are set to $\bar{a}$ (abar) and all censoring nodes $C$ are set to 1 (uncensored). The estimates returned by ltmleMSM are similar but are the parameters in a working marginal structural model. By calling ltmle twice, using two different values of abar, additive treatment effect, risk ratio, and odds ratio can be computed using summary.ltmle. data should be a data frame where the order of the columns corresponds to the time-ordering of the model.

in censoring columns (Cnodes): factor with two levels: "censored" and "uncensored". The helper functionCensoringToBinarycan be used to create these factors.
in treatment columns (Anodes): 1 = treated, 0 = untreated (must be binary)
in event columns (Ynodes): IfsurvivalOutcomeisTRUE, then Y nodes are treated as indicators of a one-time event. See details forsurvivalOutocme. IfsurvivalOutcomeisFALSE, Y nodes are treated as binary if all values are 0 or 1, and are treated as continuous otherwise. If Y nodes are continuous, they may be automatically scaled. See details forYrange.
time-dependent covariate columns (Lnodes): can be any numeric data
Data inCnodes,Anodes,LnodesandYnodesare not used after (to the right of) censoring (or an event whensurvivalOutcome==TRUE) and may be coded asNAor any other value.
Columns indatathat are before (to the left of) the first ofCnodesorAnodesare treated as baseline variables, even if they are specified asLnodes.
After the first ofCnodes,Anodes,Ynodes, orLnodes, every column must be in one ofCnodes,Anodes,Ynodes, orLnodes.

If survivalOutcome is TRUE, all Y values are indicators of an event (e.g. death) at or before the current time, where 1 = event and 0 = no event. The events in Ynodes must be of the form where once Y jumps to 1, Y remains 1 at subsequent nodes. For continuous outcomes, (survivalOutcome==FALSE and some Y nodes are not 0 or 1,) Y values are truncated at the minimum and maximum of Yrange if specified, and then transformed and scaled to be in [0,1]. That is, transformed to (Y-min(Yrange))/(max(Yrange)-min(Yrange)). If Yrange is NULL, it is set to the range of all Y nodes. In that case, Y nodes are only scaled if any values fall outside of [0,1]. For intervention specific means (ltmle), parameter estimates are transformed back based Yrange. Qform should be NULL, in which case all parent nodes of each L and Y node will be used as regressors, or a named character vector that can be coerced to class "formula". The length of Qform must be equal to length(Lnodes) + length(Ynodes)** and the names and order of the formulas must be the same as the names and order of the L and Y nodes in data. The left hand side of each formula should be "Q.kplus1". If SL.library is NULL, glm will be called using the elements of Qform. If SL.library is specified, SuperLearner will be called and all variables appearing on the right hand side of a formula will be passed to SuperLearner. The actual functional form of the formula is unimportant if SuperLearner is called. ** If there is a "block" of L and Y nodes not separated by A or C nodes, only one regression is required at the first L/Y node in a block. You can pass regression formulas for the other L/Y nodes, but they will be ignored (with a message). See example 5. gform should be NULL, in which case all parent nodes of each L and Y node will be used as regressors, or a character vector that can be coerced to class "formula". The length of gform must be equal to length(Anodes) + length(Cnodes) and the order of the formulas must be the same as the order the A and C nodes appear in data. The left hand side of each formula should be the name of the Anode or Cnode. If SL.library is NULL, glm will be called using the elements of gform. If SL.library is specified, SuperLearner will be called and all variables appearing on the right hand side of a formula will be passed to SuperLearner. The actual functional form of the formula is unimportant if SuperLearner is called. abar specifies the counterfactual values of the Anodes, using the order they appear in data and should have the same length (if abar is a vector) or number of columns (if abar is a matrix) as Anodes.

rule can be used to specify a dynamic treatment rule. rule is a function applied to each row of data which returns the a numeric vector of the same length as Anodes.

regimes can be a binary array: n x numAnodes x numRegimes of counterfactual treatment or a list of 'rule' functions as described above for the rule parameter for the ltmle function deterministic.g.function can be a function used to specify model knowledge about value of Anodes and/or Cnodes that are set deterministically. For example, it may be the case that once a patient starts treatment, they always stay on treatment. For details on the form of the function and examples, see deterministic.g.function_template

deterministic.Q.function can be a function used to specify model knowledge about the final event state. For example, it may be the case that a patient can complete the study at some intermediate time point, in which case the probability of death is 0 (assuming they have not died already). For details on the form of the function and examples, see deterministic.Q.function_template

SL.library may be a character vector of libraries (or NULL or 'default'), in which case these libraries are used to estimate both $Q$ and $g$ OR a list with two components, Q and g, where each is a character vector of libraries (or NULL or 'default'). NULL indicates glm should be called instead of SuperLearner If SL.library is the string 'default', SL.library is set to list("SL.glm", "SL.stepAIC", "SL.bayesglm", c("SL.glm", "screen.corP"), c("SL.step", "screen.corP"), c("SL.step.forward", "screen.corP"), c("SL.stepAIC", "screen.corP"), c("SL.step.interaction", "screen.corP"), c("SL.bayesglm", "screen.corP"). Note that the default set of libraries consists of main terms models. It may be advisable to include squared terms, interaction terms, etc in data or include libraries that consider non-linear terms.

The print method for ltmle objects only prints the tmle estimates.

References

Lendle, Samuel, Schwab, Joshua, Petersen, Maya and and van der Laan, Mark J "ltmle: An R Package Implementing Targeted Minimum Loss-based Estimation for Longitudinal Data", Forthcoming Petersen, Maya, Schwab, Joshua and van der Laan, Mark J, "Targeted Maximum Likelihood Estimation of Marginal Structural Working Models for Dynamic Treatments Time-Dependent Outcomes", Forthcoming

van der Laan, Mark J. and Gruber, Susan, "Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome" (August 2011). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 290. http://biostats.bepress.com/ucbbiostat/paper290

van der Laan, Mark J. and Rose, Sherri, "Targeted Learning: Causal Inference for Observational and Experimental Data" New York: Springer, 2011.

Examples

Run this code

rexpit <- function(x) rbinom(n=length(x), size=1, prob=plogis(x))
                 
# Example 1: Single time point example. True value of E[Y_1] (expected value of
#   Y setting A to 1) is approximately 0.5939.
set.seed(2)
n <- 1000
W1 <- rnorm(n)
W2 <- rbinom(n, size=1, prob=0.3)   
W3 <- rnorm(n)
A <- rexpit(-1 + 2 * W1^2)
Y <- rexpit(-0.5 + 2 * W1^2 + 0.5 * W2 - 0.5 * A + 0.2 * W3 * A 
       - 1.1 * W3 + 0.2 * rnorm(n))

data <- data.frame(W1, W2, W3, A, Y)


# The SuperLearner examples are a little slow.
library(SuperLearner)

#SuperLearner semiparametric estimation using all parents as regressors 
result1 <- ltmle(data, Anodes="A", Lnodes=NULL, Ynodes="Y", abar=1, 
  SL.library=c("SL.glm", "SL.step", "SL.mean"))
summary(result1)
summary(result1, estimator="iptw")

#SuperLearner semiparametric estimation using (incorrectly) specified regressors
#note: The functional form for Qform and gform is unimportant if 
# using SuperLearner - see 'Details'
result1a <- ltmle(data, Anodes="A", Lnodes=NULL, Ynodes="Y", 
 Qform=c(Y="Q.kplus1 ~ W1 + W3 + A"), gform="A ~ W1", abar=1, 
 SL.library=c("SL.glm", "SL.step", "SL.mean"))
summary(result1a)

#glm using correctly specified Qform and gform
result.abar1 <- ltmle(data, Anodes="A", Lnodes=NULL, Ynodes="Y", 
 Qform=c(Y="Q.kplus1 ~ I(W1^2) + W2 + W3*A"), gform="A ~ I(W1^2)", 
 abar=1, SL.library=NULL)

#Get summary measures (additive treatment effect, odds ratio, relative risk) 
#  for abar=1 vs abar=0
result.abar0 <- ltmle(data, Anodes="A", Lnodes=NULL, Ynodes="Y", 
 Qform=c(Y="Q.kplus1 ~ I(W1^2) + W2 + W3*A"), gform="A ~ I(W1^2)", 
 abar=0, SL.library=NULL)
summary(result.abar1, result.abar0)


# Example 2: Longitudinal example. Includes informative censoring and treatment. 
# Time ordering of data is W, C1, L1, A1, Y1, C2, L2, A2, Y2
# True value of E[Y_(1,1,1,1)] (expected value of Y setting C1, A1, C2, A2 all to 1)
#  is approximately 0.413.
# A1 is known to always be 1 if L1 < -2, and is 1 with probability 0.1 if L1 > 2 
# A2 is known to always be 1 if A1 is 1 
# We incorporate this knowledge using deterministic.g.function

# Generate data:
ua <- rep(TRUE, n)   #ua = uncensored and alive
L1 <- A1 <- Y1 <- C2.binary <- L2 <- A2 <- Y2 <- as.numeric(rep(NA, n))
W <- rnorm(n)
C1 <- BinaryToCensoring(is.uncensored=rexpit(2 + W))
ua <- ua & C1 == "uncensored"
L1[ua] <- rnorm(n)[ua] + W[ua]
A1[ua] <- rexpit(L1[ua])
A1[ua & L1 < -2] <- 1
A1[ua & L1 >  2] <- rbinom(n, size=1, prob=0.1)[ua & L1 >  2]
Y1[ua] <- rexpit((W + L1 - A1)[ua])
ua <- ua & !Y1
C2.binary[ua] <- rexpit((1 + 0.7 * L1 - A1)[ua])
C2 <- BinaryToCensoring(is.uncensored=C2.binary)
ua <- ua & C2 == "uncensored"
L2[ua] <- (0.5 * L1 - 0.9 * A1 + rnorm(n))[ua]
A2[ua] <- rexpit((0.5 * L1 + 0.8 * L2)[ua]) | A1[ua]
Y2[ua] <- rexpit((0.7 * L1 + L2 - 0.8 * A1 - A2)[ua])
Y2[Y1 == 1] <- 1  # if a patient dies at time 1, record death at time 2 as well
data <- data.frame(W, C1, L1, A1, Y1, C2, L2, A2, Y2)

deterministic.g.function <- function(data, current.node, nodes) {
  if (names(data)[current.node] == "A1") {
    det <- (data$L1 < -2 | data$L1 > 2) & !is.na(data$L1)
    prob1 <- ((data$L1 < -2) * 1 + (data$L1 > 2) * 0.1)[det]
  } else if (names(data)[current.node] == "A2") {
    det <- data$A1 == 1 & !is.na(data$A1)
    prob1 <- 1
  } else if (names(data[current.node]) %in% c("C1", "C2")){
    return(NULL)  #this returns the default of no deterministic links 
    #note that it is not necessary to specify that prior censoring indicates future censoring
  } else {
    stop("unexpected current.node")
  }
  return(list(is.deterministic=det, prob1=prob1))  
}

result2 <- ltmle(data, Anodes=c("A1","A2"), Cnodes=c("C1", "C2"), 
                Lnodes=c("L1", "L2"), Ynodes=c("Y1", "Y2"), abar=c(1, 1), 
                deterministic.g.function=deterministic.g.function, survivalOutcome=TRUE)
summary(result2) 
 
# Example 3: Dynamic treatment
# W -> A1 -> L -> A2 -> Y
# Treatment regime of interest is: Always treat at time 1 (A1 = 1), 
#   treat at at time 2 (A2 = 1), iff L > 0
# True value of E[Y_d] is approximately 0.346

n <- 1000
W <- rnorm(n)
A1 <- rexpit(W)
L <- 0.3 * W + 0.2 * A1 + rnorm(n)
A2 <- rexpit(W + A1 + L)
Y <- rexpit(W - 0.6 * A1 + L - 0.8 * A2)
data <- data.frame(W, A1, L, A2, Y)

abar <- matrix(nrow=n, ncol=2)
abar[, 1] <- 1
abar[, 2] <- L > 0

result3 <- ltmle(data, Anodes=c("A1", "A2"), Lnodes="L", Ynodes="Y", 
  survivalOutcome=TRUE, abar=abar)
summary(result3)

# Example 3.1: The regime can also be specified as a rule function

rule <- function(row) c(1, row["L"] > 0)

result.rule <- ltmle(data, Anodes=c("A1", "A2"), Lnodes="L", Ynodes="Y", 
  survivalOutcome=TRUE, rule=rule)
# This should be the same as the above result
summary(result.rule)

# Example 4: Deterministic Q function
# W -> A1 -> Y1 -> L2 -> A2 -> Y2
n <- 200
L2 <- A2 <- Y2 <- as.numeric(rep(NA, n))
W <- rnorm(n)
A1 <- rexpit(W)
Y1 <- rexpit(W - A1)
alive <- Y1 == 0
L2[alive] <- (0.5 * W - 0.9 * A1 + rnorm(n))[alive]
completed.study <- alive & L2 > 0

#Specify that Q is deterministically 0 when L2 is in the history of the 
# current Q regression and L2 > 0
#Note 1: det.Q.fun doesn't condition on called.from.estimate.g so g will also be set 
#        deterministically after L2 > 0 
#Note 2: It is not necessary to specify that Q is deterministically 1 if Y1 is 1; this is automatic
det.Q.fun.4a <- function(data, current.node, nodes, called.from.estimate.g) {
  L2.index <- which(names(data) == "L2")
  stopifnot(length(L2.index) == 1)
  L2.in.history <- L2.index < current.node
  if (! L2.in.history) return(NULL)
  
  is.deterministic <- data$L2 > 0 & !is.na(data$L2)
  return(list(is.deterministic=is.deterministic, Q.value=0))
}

#patients don't change treatment after leaving study; leave their A2 as NA
A2[alive & !completed.study] <- rexpit((0.5 * W + 0.8 * L2)[alive & !completed.study])

Y2[alive & !completed.study] <- rexpit((L2 - 0.8 * A1 - A2)[alive & !completed.study])
Y2[alive & completed.study] <- 0
Y2[!alive] <- 1  # if a patient dies at time 1, record death at time 2 as well
data <- data.frame(W, A1, Y1, L2, A2, Y2)

result4a <- ltmle(data, Anodes=c("A1","A2"), Lnodes="L2", Ynodes=c("Y1", "Y2"), abar=c(1, 1), 
 SL.library=NULL, estimate.time=FALSE, deterministic.Q.function=det.Q.fun.4a, survivalOutcome=TRUE)
#note: You will get the same result if you pass Lnodes=NULL (see next example)
summary(result4a)

#In this variation, suppose that treatment can still change after a patient leaves the study

det.Q.fun.4b <- function(data, current.node, nodes, called.from.estimate.g) {
  #there is no deterministic information when calculating g - treatment may still change
  if (called.from.estimate.g) return(NULL)  
  
  L2.index <- which(names(data) == "L2")
  stopifnot(length(L2.index) == 1)
  L2.in.history <- L2.index < current.node
  if (! L2.in.history) return(NULL)
  
  is.deterministic <- data$L2 > 0 & !is.na(data$L2)
  return(list(is.deterministic=is.deterministic, Q.value=0))
}

A2[alive] <- rexpit((0.5 * W + 0.8 * L2)[alive])  #patients can change treatment after leaving study
Y2[alive & !completed.study] <- rexpit((L2 - 0.8 * A1 - A2)[alive & !completed.study])
Y2[alive & completed.study] <- 0
Y2[!alive] <- 1  # if a patient dies at time 1, record death at time 2 as well
data <- data.frame(W, A1, Y1, L2, A2, Y2)

result4b <- ltmle(data, Anodes=c("A1","A2"), Lnodes="L2", Ynodes=c("Y1", "Y2"), abar=c(1, 1), 
 SL.library=NULL, estimate.time=FALSE, deterministic.Q.function=det.Q.fun.4b, survivalOutcome=TRUE) 
summary(result4b)

# Example 5: Multiple time-dependent covariates and treatments at each time point, 
#            continuous Y values
# age -> gender -> A1 -> L1a -> L1b -> Y1 -> A2 -> L2a -> L2b -> Y2
n <- 100
age <- rbinom(n, 1, 0.5)
gender <- rbinom(n, 1, 0.5)
A1 <- rexpit(age + gender)
L1a <- 2*age - 3*gender + 2*A1 + rnorm(n)
L1b <- rexpit(age + 1.5*gender - A1)
Y1 <- plogis(age - gender + L1a + 0.7*L1b + A1 + rnorm(n))
A2 <- rexpit(age + gender + A1 - L1a - L1b)
L2a <- 2*age - 3*gender + 2*A1 + A2 + rnorm(n)
L2b <- rexpit(age + 1.5*gender - A1 - A2)
Y2 <- plogis(age - gender + L1a + L1b + A1 + 1.8*A2 + rnorm(n))
data <- data.frame(age, gender, A1, L1a, L1b, Y1, A2, L2a, L2b, Y2)

#also show some different ways of specifying the nodes:
result5 <- ltmle(data, Anodes=c(3, 7), Lnodes=c("L1a", "L1b", "L2a", "L2b"), Ynodes=
 grep("^Y", names(data)), abar=c(1, 0), SL.library=NULL, estimate.time=FALSE, survivalOutcome=FALSE)
summary(result5)

result5a <- ltmle(data, Anodes=c(3, 7), Lnodes=c("L1a", "L1b", "L2a", "L2b"), 
 Ynodes=grep("^Y", names(data)), abar=c(1, 0), SL.library=NULL, estimate.time=FALSE, 
 survivalOutcome=FALSE, gform=c("A1 ~ gender", "A2 ~ age"))
summary(result5a)

#Usually you would specify a Qform for all of the Lnodes and Ynodes but in this case 
# L1a, L1b, Y1 is a "block" of L/Y nodes not separated by Anodes or Cnodes (the same is true for 
# L2a, L2b, Y2). Only one regression is required at the first L/Y node in a block. You can pass 
# regression formulas for the other L/Y nodes, but they'll be ignored.
result5b <- ltmle(data, Anodes=c(3, 7), Lnodes=c("L1a", "L1b", "L2a", "L2b"), 
 Ynodes=grep("^Y", names(data)), abar=c(1, 0), estimate.time=FALSE, survivalOutcome=FALSE, 
 gform=c("A1 ~ gender", "A2 ~ age"), Qform=c(L1a="Q.kplus1 ~ 1", L2a="Q.kplus1 ~ 1"))
summary(result5b)


#Gives the same result but prints a message saying some regression formulas will be dropped:
result5c <- ltmle(data, Anodes=c(3, 7), Lnodes=c("L1a", "L1b", "L2a", "L2b"), 
 Ynodes=grep("^Y", names(data)), abar=c(1, 0), estimate.time=FALSE, survivalOutcome=FALSE, 
 gform=c("A1 ~ gender", "A2 ~ age"), Qform=c(L1a="Q.kplus1 ~ 1", L1b="Q.klus1~A1", 
 Y1="Q.kplus1~L1a", L2a="Q.kplus1 ~ 1", L2b="Q.klus1~A1", Y2="Q.kplus1~A2 + gender"))
summary(result5c)

#If there were a Anode or Cnode between L1b and Y1, Y1 would also need a Q regression formula


# Example 6: MSM
# Given data over 3 time points where A switches to 1 once and then stays 1. We want to know
# how death varies as a function of time and an indicator of whether a patient's intended
# regime was to switch before time.
data(sampleDataForLtmleMSM)
Anodes <- grep("^A", names(sampleDataForLtmleMSM$data))
Lnodes <- c("CD4_1", "CD4_2")
Ynodes <- grep("^Y", names(sampleDataForLtmleMSM$data))

result6 <- ltmleMSM(sampleDataForLtmleMSM$data, Anodes=Anodes, Lnodes=Lnodes, Ynodes=Ynodes, 
                   survivalOutcome=TRUE,
                   regimes=sampleDataForLtmleMSM$regimes, 
                   summary.measures=sampleDataForLtmleMSM$summary.measures, final.Ynodes=Ynodes, 
                   working.msm="Y ~ time + I(pmax(time - switch.time, 0))", estimate.time=FALSE)
print(summary(result6))


# Example 6.1: regimes can also be specified as a list of rule functions

regimesList <- list(function(row) c(1,1,1),
                     function(row) c(0,1,1),
                     function(row) c(0,0,1),
                     function(row) c(0,0,0))
result.regList <- ltmleMSM(sampleDataForLtmleMSM$data, Anodes=Anodes, Lnodes=Lnodes, Ynodes=Ynodes, 
                   survivalOutcome=TRUE, regimes=regimesList, 
                   summary.measures=sampleDataForLtmleMSM$summary.measures, final.Ynodes=Ynodes, 
                   working.msm="Y ~ time + I(pmax(time - switch.time, 0))", estimate.time=FALSE)
# This should be the same as the above result
print(summary(result.regList))

Run the code above in your browser using DataLab