Using the general linear regression equation, the expected value given the model is calculated for every observation in the data generated so far. Stopping here would produce a perfect fit for the node, which is unrealistic. Instead, an error term is drawn for each observation from a normal distribution with mean zero and standard deviation error. This error term is then added to the predicted mean.
Formal Description:
Formally, the data generation can be described as:
$$Y \sim \texttt{intercept} + \texttt{parents}_1 \cdot \texttt{betas}_1 + ... + \texttt{parents}_n \cdot \texttt{betas}_n+ N(0, \texttt{error}),$$
where \(N(0, \texttt{error})\) denotes the normal distribution with mean 0 and a standard deviation of error and \(n\) is the number of parents (length(parents)).
For example, given intercept=-15, parents=c("A", "B"), betas=c(0.2, 1.3) and error=2 the data generation process is defined as:
$$Y \sim -15 + A \cdot 0.2 + B \cdot 1.3 + N(0, 2).$$
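The example above can be reproduced by hand in base R. This is only a minimal sketch of the described procedure, not the package's internal code; the distributions used for the parent nodes A and B are assumptions made purely for illustration.

```r
set.seed(42)
n <- 1000

# Hypothetical parent nodes (assumed distributions, for illustration only);
# in a DAG these would already have been generated
A <- rnorm(n, mean = 10, sd = 3)
B <- rbinom(n, size = 1, prob = 0.4)

intercept <- -15
betas <- c(0.2, 1.3)
error <- 2

# Expected value of Y for every observation
linear_predictor <- intercept + A * betas[1] + B * betas[2]

# Add one draw from N(0, error) per observation
Y <- linear_predictor + rnorm(n, mean = 0, sd = error)

# Refitting the model should roughly recover intercept and betas
fit <- lm(Y ~ A + B)
coef(fit)
```

Without the call to rnorm(), the refitted model would reproduce the coefficients exactly and have zero residual variance, which is the perfect fit mentioned above.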
When using a link other than "identity", the procedure is equivalent, except that the inverse of the link function is applied to the linear predictor before the random error term is added. For example, when using link="log", \(\exp(-15 + A \cdot 0.2 + B \cdot 1.3) + N(0, 2)\) is used instead.
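The link="log" case can be sketched in base R in the same way. Again, this only illustrates the formula given above and is not the package's internal code; the parent distributions are assumed for illustration.

```r
set.seed(42)
n <- 1000

# Hypothetical parent nodes (assumed distributions, for illustration only)
A <- rnorm(n, mean = 10, sd = 3)
B <- rbinom(n, size = 1, prob = 0.4)

linear_predictor <- -15 + A * 0.2 + B * 1.3

# With link = "log", the linear predictor is exponentiated first ...
mu <- exp(linear_predictor)

# ... and the normally distributed error is added afterwards
Y_log <- mu + rnorm(n, mean = 0, sd = 2)

summary(Y_log)
```

Because the error is added on the response scale, the generated values can be negative even though exp() itself is always positive.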
Random Effects and Random Slopes:
This function also allows users to include an arbitrary number of random effects and random slopes using the formula argument. If this is done, the formula and data arguments are passed to the arguments of the same name in the makeLmer function of the simr package. The fixef argument of that function is passed the numeric vector c(intercept, betas) and the VarCorr argument receives the var_corr argument as input. If used as a node type in a DAG, all of this is taken care of behind the scenes. Users can simply use the regular enhanced formula interface of the node function to define these formula terms, as shown in detail in the formula vignette (vignette(topic="v_using_formulas", package="simDAG")). Please consult that vignette for examples. Also, please note that including random effects or random slopes usually results in considerably longer computation times.
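To give a rough idea of the argument mapping described above, the following sketch calls simr directly. It is not the package's internal code. The grouping variable, the variance value given to VarCorr, and the use of sigma=2 for the residual standard deviation are assumptions made for illustration, and the block only runs if simr is installed.

```r
if (requireNamespace("simr", quietly = TRUE)) {
  set.seed(42)

  # Hypothetical data: one parent A and a grouping variable (both assumed)
  dat <- data.frame(
    A = rnorm(60),
    group = rep(letters[1:6], each = 10)
  )

  intercept <- -15
  betas <- c(0.2)

  # fixef receives c(intercept, betas); VarCorr receives the random
  # intercept variance (0.5 is an arbitrary illustrative value)
  model <- simr::makeLmer(
    Y ~ A + (1 | group),
    fixef = c(intercept, betas),
    VarCorr = 0.5,
    sigma = 2,   # assumed here to play the role of the error argument
    data = dat
  )

  # Simulate one response vector from the specified model
  dat$Y <- simr::doSim(model)
  head(dat)
}
```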