node: Create Node Object(s)

Description

This function provides a convenient way to define a node and its distribution in a time-varying format without unnecessary code repetition. The node distribution is allowed to vary as a function of time (t), with subsetting of the past nodes accomplished via NodeName[t]. Intended for use in conjunction with functions set.DAG, a DAG object constructor, and add.action, an action (intervention) constructor.

Usage

node(name, t, distr, EFU, order, ..., params = list())

Arguments

name

Character node name, for time-dependent nodes the names will be automatically expanded to a scheme "name_t" for each t provided specified

Node time-point(s). Allows specification of several time-points when t is a vector of positive integers, in which case the output will consist of a named list of length(t) nodes, corresponding to each value in t.

distr

Character name of the node distribution, can be a standard distribution R function, s.a. rnorm, rbinom, runif or user defined. The function must accept a named argument "n" to specify the total sample size. Distributional parameters (arguments) must be pa

EFU

End-Of-Follow up, only applies to Bernoulli nodes, when TRUE this node becomes an indicator for the end of follow-up (censoring, end of study, death, etc). When simulated variable with this node distribution evaluates to 1, subsequent nodes with higher

order

An optional integer parameter specifying the order in which these nodes will be sampled. The value of order has to start at 1 and be unique for each new node, can be specified as a range / vector and has to be of the same length as the argument t

...

Named arguments specifying distribution parameters that are accepted by the distr function. The parameters can be R expressions that are themselves formulas of the past node names.

params

A list of additional named parameters to be passed on to the distr function. The parameters have to be either constants or character strings of R expressions of the past node names.

Value

A list containing node object(s) (expanded to several nodes if t is an integer vector of length > 1)

Details

The combination of a generic name name and time point t must be unique in the sense that no other user-specified input node can result in the same combination of name and time point t. In other words, the combination of name and t must uniquely identify each node in the DAG. The user should use the same name to identify measurements of the same attribute (e.g. 'A1c') at various time points.

All nodes indexed by the same time point t value must have consecutive order values. The order values of all nodes indexed by the same t value must have their order values: 1) strictly greater than the order values of all nodes indexed by a strictly lower t value and 2) strictly lower than the order values of all nodes indexed by a strictly higher t value. All nodes of a DAG must have consecutive order values starting at one. The collection of unique t values of all nodes of a DAG must be consecutive values starting at 0.

All node calls that share the same generic name name must also share the same EFU value (if any is specified in at least one of them). A value of TRUE for the EFU indicates that if a simulated value for a measurement of the attribute represented by node is 1 then all the following nodes with that measurement (in terms of higher t values) in the DAG will be unobserved (i.e., their simulated value will be set to NA).

Each formula of an input node is an evaluable R expression. All formulas are delayed in the evaluation until the simulation time. Formulas can refer to standard or user-specified R functions that must only apply to the values of parent nodes, i.e. a subset of the node(s) with an order value strictly lower than that of the node characterized by the formula. Formulas must reference the parent nodes with unique name identifiers, employing the square bracket vector subsetting name[t] for referencing a parent node at a particular time point t (if any time-points were specified). The square bracket notation is used to index a generic name with the relevant time point as illustrated in the examples. When an input node is used to define several nodes (i.e., several measurement of the same attribute, t=0:5), the formula(s) specified in that node can apply to each node indexed by a given time point denoted by t. This generic expression t can then be referenced within a formula to simultaneously identify a different set of parent nodes for each time point as illustrated below. Note that the parents of each node represented by a given node object are implicitly defined by the nodes referenced in formulas of that node call.

Distribution parameters (mean, probs, sd, unifmin and unifmax) are passed down with delayed evaluation, to force immediate evaluation of any variable inside these expressions wrap the variable with .() function, see Example 2 for .(t_end).

Examples

Run this code

#---------------------------------------------------------------------------------------
# EXAMPLE 1A: Define some Bernoulli nodes, survival outcome Y and put it together in a 
# DAG object
#---------------------------------------------------------------------------------------
W1 <- node(name = "W1", distr = "rbern", 
	prob = plogis(-0.5), order = 1)
W2 <- node(name = "W2", distr = "rbern", 
	prob = plogis(-0.5 + 0.5 * W1), order = 2)
A <- node(name = "A", distr = "rbern", 
	prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2), order = 3)
Y <- node(name = "Y", distr = "rbern", 
	prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2), order = 4)
D1A <- set.DAG(c(W1,W2,A,Y))

#---------------------------------------------------------------------------------------
# EXAMPLE 1B: Same as 1A using +node interface and no order argument
#---------------------------------------------------------------------------------------
D1B <- DAG.empty()
D1B <- D1B + node(name = "W1", distr = "rbern", 
	prob = plogis(-0.5))
D1B <- D1B + node(name = "W2", distr = "rbern", 
	prob = plogis(-0.5 + 0.5 * W1))
D1B <- D1B + node(name = "A", distr = "rbern", 
	prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2))
D1B <- D1B + node(name = "Y", distr = "rbern", 
	prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2))
D1B <- set.DAG(D1B)

#---------------------------------------------------------------------------------------
# EXAMPLE 1C: Same as 1A and 1B using add.nodes interface and no order argument
#---------------------------------------------------------------------------------------
D1C <- DAG.empty()
D1C <- add.nodes(D1C, node(name = "W1", distr = "rbern", 
	prob = plogis(-0.5)))
D1C <- add.nodes(D1C, node(name = "W2", distr = "rbern", 
	prob = plogis(-0.5 + 0.5 * W1)))
D1C <- add.nodes(D1C, node(name = "A", distr = "rbern", 
	prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2)))
D1C <- add.nodes(D1C, node(name = "Y", distr = "rbern", 
	prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2)))
D1C <- set.DAG(D1C)

#---------------------------------------------------------------------------------------
# EXAMPLE 1D: Add a uniformly distributed node and redefine outcome Y as categorical
#---------------------------------------------------------------------------------------
D_unif <- DAG.empty()
D_unif <- D_unif + 
node("W1", distr = "rbern", prob = plogis(-0.5)) + 
node("W2", distr = "rbern", prob = plogis(-0.5 + 0.5 * W1)) + 
node("W3", distr = "runif", min = plogis(-0.5 + 0.7 * W1 + 0.3 * W2), max = 10) + 
node("An", distr = "rbern", prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2 - 0.2 * sin(W3)))
# Categorical syntax 1 (probabilities as values):
D_cat_1 <- D_unif + node("Y", distr = "rcategor", probs = {0.3; 0.4})
D_cat_1 <- set.DAG(D_cat_1)
# Categorical syntax 2 (probabilities as formulas):
D_cat_2 <- D_unif + 
node("Y", distr = "rcategor", 
	probs={plogis(-0.1 + 1.2 * An + 0.3 * W1 + 0.3 * W2 + 0.2 * cos(W3)); 
			plogis(-0.5 + 0.7 * W1)})
D_cat_2 <- set.DAG(D_cat_2)

#---------------------------------------------------------------------------------------
# EXAMPLE 2A: Define Bernoulli nodes using R rbinom() function, defining prob argument
# for L2 as a function of node L1
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + 
node("L1", t = 0, distr = "rbinom", 
	prob = 0.05, size = 1) + 
node("L2", t = 0, distr = "rbinom", 
	prob = ifelse(L1[0] == 1, 0.5, 0.1), size = 1)
D <- set.DAG(D)

#---------------------------------------------------------------------------------------
# EXAMPLE 2B: Equivalent to 2A, passing argument size to rbinom inside a named list
# params
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + 
node("L1", t = 0, distr = "rbinom", 
	prob = 0.05, params = list(size = 1)) + 
node("L2", t = 0, distr = "rbinom", 
	prob = ifelse(L1[0] == 1,0.5,0.1), params = list(size = 1))
D <- set.DAG(D)

#---------------------------------------------------------------------------------------
# EXAMPLE 2C: Equivalent to 2A and 2B, define Bernoulli nodes using a wrapper "rbern"
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + 
node("L1", t = 0, distr = "rbern", prob = 0.05) + 
node("L2", t = 0, distr = "rbern", prob = ifelse(L1[0] == 1, 0.5, 0.1))
D <- set.DAG(D)

#---------------------------------------------------------------------------------------
# EXAMPLE 3: Define node with normal distribution using rnorm() R function
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("L2", t = 0, distr = "rnorm", mean = 10, sd = 5)
D <- set.DAG(D)

#---------------------------------------------------------------------------------------
# EXAMPLE 4: Define 34 Bernoulli nodes, or 2 Bernoulli nodes over 17 time points,
# prob argument contains .() expression that is immediately evaluated in the calling 
# environment (.(t_end) will evaluate to 16)
#---------------------------------------------------------------------------------------
t_end <- 16
D <- DAG.empty()
D <- D + 
node("L2", t = 0:t_end, distr = "rbinom", 
	prob = ifelse(t == .(t_end), 0.5, 0.1), size = 1) + 
node("L1", t = 0:t_end, distr = "rbinom", 
	prob = ifelse(L2[0] == 1, 0.5, 0.1), size = 1)
D <- set.DAG(D)

#---------------------------------------------------------------------------------------
# EXAMPLE 5: Defining new distribution function 'rbern', defining and passing a custom 
# vectorized node function 'customfun'
#---------------------------------------------------------------------------------------
rbern <- function(n, prob) { # defining a bernoulli wrapper based on R rbinom function
  rbinom(n = n, prob = prob, size = 1)
}
customfun <- function(arg, lambda) {
  res <- ifelse(arg == 1, lambda, 0.1)
  res
}
D <- DAG.empty()
D <- D + 
node("W1", distr = "rbern", prob = 0.05) + 
node("W2", distr = "rbern", prob = customfun(W1, 0.5)) + 
node("W3", distr = "rbern", prob = ifelse(W1 == 1, 0.5, 0.1))
D1d <- set.DAG(D, vecfun = c("customfun"))
sim1d <- simobs(D1d, n = 200, rndseed = 1)

Run the code above in your browser using DataLab