simcausal (version 0.1)

node.depr: Create Node Object(s) (Deprecated)

Description

This function provides a convenient way to define a node and its distribution in a time-varying format without unnecessary code repetition. The node distribution is allowed to vary as a function of time (t), with past nodes referenced by subsetting via NodeName[t]. It is intended for use with set.DAG, a DAG object constructor, and add.action, an action (intervention) constructor.

Usage

node.depr(name, t, distr, prob, mean, sd, probs, unifmin, unifmax, EFU, order)

Arguments

name
Character node name. For time-dependent nodes, the name is automatically expanded to the scheme "name_t" for each value of t specified.
t
Node time-point(s). Allows specification of several time-points when t is a vector of positive integers, in which case the output will consist of a named list of length(t) nodes, corresponding to each value in t.
distr
Character name of the node distribution, currently supporting: "const" - constant value, "Bern" - Bernoulli 0/1, "unif" - continuous uniform, "cat" - categorical and "norm" - normal.
prob
R expression for specifying the probability of success for Bernoulli.
mean
R expression for specifying the mean of the normal distribution. The expression can reference any node names with order less than the current node order; for referencing time-dependent nodes, use the syntax TDVar[t]. When distr="Bern", mean is the pro
sd
R expression for the standard deviation of the normal distribution. Evaluated under the same rules as argument mean above.
probs
A series of R expressions, each representing the probability for a value of a categorical node; the expressions are evaluated only during simulation. The expressions (probabilities) have to be separated by semicolons (;) and the entire argument has to
unifmin
Minimum value for the uniform random variable. Evaluated under the same rules as argument mean above.
unifmax
Maximum value for the uniform random variable. Evaluated under the same rules as argument mean above.
EFU
End-Of-Followup; applies only to Bernoulli nodes. When TRUE, this node becomes an indicator for the end of follow-up (censoring, end of study, death, etc). When a simulated variable with this node distribution evaluates to 1, subsequent nodes with higher
order
Integer, a required argument when specifying new DAG nodes (but not when specifying actions on existing nodes). The value of order has to start at 1 and be unique for each new node. It can be specified as a range/vector and has to be of the same length as th

Value

  • A list containing node object(s) (expanded to several nodes if t is an integer vector of length > 1)

Details

The combination of the generic node name (argument name) and time point t must be unique, in the sense that no other user-specified input node can result in the same combination. In other words, name and t together must uniquely identify each node in the DAG. The user should use the same name to identify measurements of the same attribute (e.g. 'A1c') at various time points.
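
The uniqueness requirement can be checked by hand before building a DAG. The base R sketch below assumes only the "name_t" expansion scheme described for the name argument; the helper expand_names is hypothetical and is not part of simcausal.

```r
# Hypothetical helper (not part of simcausal): expand a generic node name
# over time points using the "name_t" scheme.
expand_names <- function(name, t) paste0(name, "_", t)

# Two node definitions sharing the generic name "L2", with disjoint time points.
nodes <- c(expand_names("L2", 0), expand_names("L2", 1:3))
print(nodes)

# anyDuplicated() returns 0 when every (name, t) combination is unique.
is_unique <- anyDuplicated(nodes) == 0

# Overlapping time points for the same name would violate the requirement.
clash <- c(expand_names("L2", 0:1), expand_names("L2", 1:3))
has_clash <- anyDuplicated(clash) > 0
```

Here is_unique is TRUE for the first pair of definitions, while has_clash flags the second pair because both definitions would produce "L2_1".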

All nodes indexed by the same time point t must have consecutive order values. These order values must be 1) strictly greater than the order values of all nodes indexed by a strictly lower t value and 2) strictly lower than the order values of all nodes indexed by a strictly higher t value. All nodes of a DAG must have consecutive order values starting at one. The unique t values across all nodes of a DAG must be consecutive values starting at 0.
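
These ordering rules can be verified with plain arithmetic before any node is created. The sketch below uses the order expressions from Example 2 (five nodes at t = 0, then four nodes per later time point) and checks them in base R; it does not call simcausal, and the variable names are illustrative only.

```r
t_end <- 16

# Order values per node definition, copied from Example 2.
orders <- list(
  t0 = 1:5,                        # L2, L1, A1, A2, Y at t = 0
  L2 = 6 + 4 * (0:(t_end - 1)),    # L2 at t = 1..t_end
  A1 = 7 + 4 * (0:(t_end - 1)),
  A2 = 8 + 4 * (0:(t_end - 1)),
  Y  = 9 + 4 * (0:(t_end - 1))
)
all_orders <- unlist(orders, use.names = FALSE)

# Time point attached to each order value, in the same sequence as all_orders.
times <- c(rep(0, 5), rep(1:t_end, times = 4))

# Rule: order values are consecutive and start at one.
consecutive <- all(sort(all_orders) == seq_along(all_orders))

# Rule: walking through nodes by increasing order never moves back in time.
time_monotone <- !is.unsorted(times[order(all_orders)])

# Rule: the unique t values are consecutive and start at 0.
t_consecutive <- all(sort(unique(times)) == 0:t_end)
```

The stride of 4 in the order expressions matches the four node definitions per time point after t = 0, which is what keeps each time point's order values in one contiguous block.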

All node calls that share the same generic name (argument name) must also share the same EFU value (if any is specified in at least one of them). A value of TRUE for EFU indicates that if a simulated value for a measurement of the attribute represented by the node is 1, then all the following nodes with that measurement (in terms of higher t values) in the DAG will be unobserved (i.e., their simulated value will be set to NA).
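
The effect of EFU=TRUE on a single attribute can be pictured with a small base R sketch. This is an illustration of the rule as stated above, not simcausal's implementation; the helper censor_after_event and the input values are hypothetical.

```r
# Hypothetical helper (not simcausal code): set every value after the first 1 to NA.
censor_after_event <- function(y) {
  first_event <- match(1, y)             # position of the first 1, or NA if none
  if (!is.na(first_event) && first_event < length(y)) {
    y[(first_event + 1):length(y)] <- NA
  }
  y
}

# Hypothetical raw draws for one subject's Y at t = 0..4.
y_raw <- c(0, 0, 1, 0, 1)
y_obs <- censor_after_event(y_raw)
print(y_obs)

# A subject who never reaches the end of follow-up is left unchanged.
no_event <- censor_after_event(c(0, 0, 0))
```

For y_raw the event first occurs at t = 2, so the measurements at t = 3 and t = 4 become NA.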

Each formula of an input node is an evaluable R expression. Evaluation of all formulas is delayed until simulation time. Formulas can refer to standard or user-specified R functions that must only apply to the values of parent nodes, i.e. a subset of the node(s) with an order value strictly lower than that of the node characterized by the formula. Formulas must reference the parent nodes with unique name identifiers, employing the square bracket vector subsetting name[t] for referencing a parent node at a particular time point t (if any time-points were specified). The square bracket notation is used to index a generic name with the relevant time point as illustrated in the examples. When an input node is used to define several nodes (i.e., several measurements of the same attribute, t=0:5), the formula(s) specified in that node can apply to each node indexed by a given time point denoted by t. This generic expression t can then be referenced within a formula to simultaneously identify a different set of parent nodes for each time point as illustrated below. Note that the parents of each node represented by a given node object are implicitly defined by the nodes referenced in formulas of that node call.
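
The idea of one stored formula being evaluated once per time point can be sketched in base R with quote() and eval(). This is only an analogy: the accessor at() is a hypothetical stand-in for the name[t] syntax, the list sim holds made-up values, and none of this reflects how simcausal evaluates formulas internally.

```r
# Hypothetical simulated values at t = 0, stored under expanded "name_t" names.
sim <- list(A1_0 = 0, L2_0 = 1)

# Hypothetical accessor standing in for the name[t] syntax.
at <- function(name, t) sim[[paste0(name, "_", t)]]

# The formula is stored unevaluated; t is still a free symbol at this point.
node_formula <- quote(ifelse(at("A1", t - 1) == 1, 0.1,
                      ifelse(at("L2", t - 1) == 1, 0.9, 0.5)))

# At "simulation time" the formula is evaluated with t bound to a time point,
# so the same expression picks out the parents A1_0 and L2_0 when t = 1.
prob_t1 <- eval(node_formula, list(t = 1))
print(prob_t1)
```

With A1_0 equal to 0 and L2_0 equal to 1, the second branch applies and prob_t1 is 0.9.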

Distribution parameters (mean, probs, sd, unifmin and unifmax) are passed down with delayed evaluation. To force immediate evaluation of any variable inside these expressions, wrap the variable in the .() function; see Example 2 for .(t_end).
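
The difference between delayed and forced evaluation can be seen with base R's bquote(), which also uses .() to substitute a value immediately. The sketch below is an analogy only and assumes nothing about how simcausal implements .(); it reuses the expression from Example 2.

```r
t_end <- 16

# Delayed: t_end stays a symbol and is looked up when the expression is evaluated.
delayed <- quote(min(1, 0.1 + t / t_end))

# Forced: .(t_end) is replaced by the current value of t_end (16) right away.
forced <- bquote(min(1, 0.1 + t / .(t_end)))
print(forced)

# Changing t_end afterwards affects only the delayed expression.
t_end <- 4
p_delayed <- eval(delayed, list(t = 2))   # uses t_end = 4
p_forced  <- eval(forced,  list(t = 2))   # uses the captured value 16
```

Forcing evaluation protects a node definition from later changes to variables in the calling environment, which matters when the same variable is reused between defining the DAG and simulating from it.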

Examples

#---------------------------------------------------------------------------------------
# EXAMPLE 1A: Define some Bernoulli nodes W1, W2, treatment A and outcome Y, and put
# them together in a DAG
#---------------------------------------------------------------------------------------
W1 <- node.depr(name="W1", distr="Bern", prob=plogis(-0.5), order=1)
W2 <- node.depr(name="W2", distr="Bern", prob=plogis(-0.5 + 0.5*W1), order=2)
A <- node.depr(name="A", distr="Bern", prob=plogis(-0.5 - 0.3*W1 - 0.3*W2), order=3)
Y <- node.depr(name="Y", distr="Bern", prob=plogis(-0.1 + 1.2*A + 0.3*W1 + 0.3*W2), order=4,
              EFU=TRUE)
D1A <- set.DAG(c(W1,W2,A,Y))

#---------------------------------------------------------------------------------------
# EXAMPLE 1B: Same as 1A with alternative "+" syntax
#---------------------------------------------------------------------------------------
D1B <- DAG.empty()
D1B <- D1B + node.depr(name="W1", distr="Bern", prob=plogis(-0.5), order=1)
D1B <- D1B + node.depr(name="W2", distr="Bern", prob=plogis(-0.5 + 0.5*W1), order=2)
D1B <- D1B + node.depr(name="A", distr="Bern", prob=plogis(-0.5 - 0.3*W1 - 0.3*W2), order=3)
D1B <- D1B + node.depr(name="Y", distr="Bern", prob=plogis(-0.1 + 1.2*A + 0.3*W1 + 0.3*W2),
                       order=4, EFU=TRUE)
D1B <- set.DAG(D1B)

#---------------------------------------------------------------------------------------
# EXAMPLE 1C: Add a uniformly distributed node and redefine outcome Y as categorical
#---------------------------------------------------------------------------------------
D_unif <- DAG.empty()
D_unif <- D_unif + node.depr("W1", distr="Bern", prob=plogis(-0.5), order=1)
D_unif <- D_unif + node.depr("W2", distr="Bern", prob=plogis(-0.5 + 0.5*W1), order=2)
D_unif <- D_unif + node.depr("W3", distr="unif", unifmin=plogis(-0.5 + 0.7*W1 + 0.3*W2),
                             unifmax=10, order=3)
D_unif <- D_unif + node.depr("Anode", distr="Bern",
                             prob=plogis(-0.5 - 0.3*W1 - 0.3*W2 - 0.2*sin(W3)), order=4)
# Categorical syntax 1 (probabilities as values)
D_cat_1 <- D_unif + node.depr("Y", distr="cat", probs={0.3;0.4}, order=5)
# Categorical syntax 2 (probabilities as formulas)
D_cat_2 <- D_unif + node.depr("Y", distr="cat", probs={plogis(1.2*Anode + 0.5*cos(W3));
                             plogis(-0.5 + 0.7*W1)}, order=5)
D_cat_1 <- set.DAG(D_cat_1)
D_cat_2 <- set.DAG(D_cat_2)

#---------------------------------------------------------------------------------------
# EXAMPLE 2: Define time varying nodes for time points 0 to 16
#---------------------------------------------------------------------------------------
t_end <- 16
DTV <- DAG.empty()
DTV <- DTV + node.depr(name="L2", t=0, distr="Bern", prob=0.05, order=1)
DTV <- DTV + node.depr(name="L1", t=0, distr="Bern", prob=ifelse(L2[0]==1,0.5,0.1), order=2)
DTV <- DTV + node.depr(name="A1", t=0, distr="Bern",
                       prob=ifelse(L1[0]==1 & L2[0]==0, 0.5,
                            ifelse(L1[0]==0 & L2[0]==0, 0.1,
                            ifelse(L1[0]==1 & L2[0]==1, 0.9, 0.5))),
                       order=3)
DTV <- DTV + node.depr(name="A2", t=0, distr="Bern", prob=0, order=4)
DTV <- DTV + node.depr(name="Y", t=0, distr="Bern",
                       prob=plogis(-6.5 + L1[0] + 4*L2[0] + 0.05*I(L2[0]==0)),
                       order=5, EFU=TRUE)
# Distribution parameters (prob argument for bernoulli) are passed by delayed evaluation,
# to force immediate evaluation of t_end or any other variable inside the node formula
# put it inside .(), e.g., .(t_end):
DTV <- DTV + node.depr(name="L2", t=1:t_end, distr="Bern",
                       prob=ifelse(A1[t-1]==1, 0.1,
                            ifelse(L2[t-1]==1, 0.9, min(1, 0.1 + t/.(t_end)))),
                       order=6+4*(0:(t_end-1)))
DTV <- DTV + node.depr(name="A1", t=1:t_end, distr="Bern",
                       prob=ifelse(A1[t-1]==1, 1,
                            ifelse(L1[0]==1 & L2[0]==0, 0.3,
                            ifelse(L1[0]==0 & L2[0]==0, 0.1,
                            ifelse(L1[0]==1 & L2[0]==1, 0.7, 0.5)))),
                       order=7+4*(0:(t_end-1)))
DTV <- DTV + node.depr(name="A2", t=1:t_end, distr="Bern", prob=0,
                       order=8+4*(0:(t_end-1)))
DTV <- DTV + node.depr(name="Y", t=1:t_end, distr="Bern",
                       prob=plogis(-6.5 + L1[0] + 4*L2[t] +
                                   0.05*sum(I(L2[0:t]==rep(0,(t+1))))),
                       order=9+4*(0:(t_end-1)), EFU=TRUE)
DTV <- set.DAG(DTV)
