Uses the output from questionnaire_gen to generate linear regression coefficients.
beta_gen(
data,
MC = FALSE,
MC_replications = 100,
CI = c(0.005, 0.995),
output_cov = FALSE,
rename_to_q = FALSE,
verbose = TRUE
)
By default, this function will output a vector of the regression
coefficients, including the intercept. If MC = TRUE, the output will
instead be a matrix comparing the true regression coefficients obtained
from the covariance matrix with expected values obtained from a Monte Carlo
simulation, complete with a confidence interval (99% under the default CI).
If output_cov = TRUE, the output will be a list with two elements:
the first one, betas, will contain the same output described in the
previous paragraph. The second one, called vcov_YXW, contains
the covariance matrix of the regression coefficients.
data: output from the questionnaire_gen function with
full_output = TRUE and theta = TRUE
MC: if TRUE, performs Monte Carlo simulation to estimate
regression coefficients
MC_replications: for MC = TRUE, this represents the number of
Monte Carlo subsamples calculated
CI: confidence interval for Monte Carlo simulations
output_cov: if TRUE, will also output the covariance matrix of
YXW
rename_to_q: if TRUE, renames the variables from "x" and "w" to
"q"
verbose: if `FALSE`, output messages will be suppressed (useful for simulations). Defaults to `TRUE`
This function was primarily conceived as a subfunction of
questionnaire_gen, when family = "gaussian", theta =
TRUE, and full_output = TRUE. However, it can also be directly
called by the user so they can perform further analysis.
This function primarily calculates the true regression coefficients (\(\beta\)) for the linear influence of the background questionnaire variables on \(\theta\). From a statistical perspective, this relationship can be modeled as follows, where \(E(\theta | \boldsymbol{X}, \boldsymbol{W})\) is the expectation of \(\theta\) given \(\boldsymbol{X} = \{X_1, \ldots, X_P\}\) and \(\boldsymbol{W} = \{W_1, \ldots, W_Q\}\):
$$E(\theta | \boldsymbol{X}, \boldsymbol{W}) = \beta_0 + \sum_{p = 1}^P \beta_p X_p + \sum_{q = 1}^Q \beta_{P + q} W_q$$
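As a rough illustration of how coefficients of this form follow from a covariance matrix, the sketch below solves the population normal equations in base R. The covariance matrix, the means, and every object name are hypothetical values chosen for the example; this is not the internal code of beta_gen, and it covers only continuous predictors.

```r
# Hypothetical covariance matrix of (theta, X1, X2); illustrative values only
vars <- c("theta", "X1", "X2")
sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
mu <- c(theta = 0, X1 = 1, X2 = 2)  # hypothetical means

# Slopes: solve Sigma_XX %*% beta = Sigma_X,theta
preds <- c("X1", "X2")
slopes <- solve(sigma[preds, preds], sigma[preds, "theta"])

# Intercept: mean of theta minus the slopes applied to the predictor means
intercept <- mu["theta"] - sum(slopes * mu[preds])

betas <- c(intercept, slopes)
names(betas) <- c("(Intercept)", preds)
betas
```

Because the coefficients depend only on the covariance matrix and the means, they do not change from one generated sample to the next.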
The regression coefficients are calculated using the true covariance matrix
either provided by the user when calling questionnaire_gen or
randomly generated by that function if none was provided. In any case, that
matrix is not sample-dependent, though it should be similar to the one
observed in the generated data (especially for larger samples). One
convenient way to check for this similarity is by running the function with
MC = TRUE, which will generate a numeric estimate; the
MC_replications argument can then be increased to improve the
estimates at an often-noticeable cost in processing time. If MC =
FALSE, MC_replications will have no effect on the results. In
any case, each subsample will always have the same size as the original
sample.
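The idea behind this check can be sketched in base R as follows. This is only an approximation under stated assumptions: the covariance matrix is hypothetical, all means are zero, samples are drawn afresh from a multivariate normal distribution through a Cholesky factor, and the interval is taken from the quantiles of the replicated estimates. The way beta_gen builds its subsamples and interval may differ.

```r
set.seed(1)

# Hypothetical covariance matrix of (theta, X1, X2); illustrative values only
vars <- c("theta", "X1", "X2")
sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
n <- 100                 # size of each simulated sample
replications <- 100      # plays the role of MC_replications
CI <- c(0.005, 0.995)    # same default as in the usage above

# True coefficients from the covariance matrix (zero means, so intercept is 0)
preds <- c("X1", "X2")
true_betas <- c(0, solve(sigma[preds, preds], sigma[preds, "theta"]))

# Draw one sample of size n from N(0, sigma) and fit ordinary least squares
draw_betas <- function() {
  z <- matrix(rnorm(n * length(vars)), nrow = n)
  d <- as.data.frame(z %*% chol(sigma))
  names(d) <- vars
  coef(lm(theta ~ X1 + X2, data = d))
}

mc <- replicate(replications, draw_betas())  # one column per replication
comparison <- cbind(true = true_betas,
                    MC = rowMeans(mc),
                    t(apply(mc, 1, quantile, probs = CI)))
comparison
```

Raising replications narrows the gap between the MC column and the true column, at the cost of fitting more regressions.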
If the background questionnaire contains categorical variables (\(W\)), the original covariance matrix cannot be used because it contains the covariances involving \(Z \sim N(0, 1)\), which is the random variable that gets categorized into \(W\). The case where \(W\) is always binomial is trivial, but if at least one \(W\) has more than two categories, the structure of the covariance matrix changes drastically. In this case, this function recalculates all covariances between \(\theta\), \(X\) and each category of \(W\) using some auxiliary internal functions which rely on the appropriate distribution (either multivariate normal or truncated normal). To avoid multicollinearity, the first category of each \(W\) is dropped before the regression coefficients are calculated.
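The two ideas in this paragraph can be illustrated with a small base R sketch. It assumes a single latent \(Z\) correlated with \(\theta\) at a hypothetical value rho, hypothetical cut points, and unit variances; it uses the standard result that, for a standard bivariate normal pair, the covariance between \(\theta\) and the indicator of \(a < Z \le b\) is \(\rho(\phi(a) - \phi(b))\). It is not a reproduction of the auxiliary functions used by beta_gen.

```r
set.seed(1)
n <- 10000
rho <- 0.6                                      # hypothetical cor(theta, Z)
z <- rnorm(n)                                   # latent Z ~ N(0, 1)
theta <- rho * z + sqrt(1 - rho^2) * rnorm(n)   # theta ~ N(0, 1)

# Hypothetical thresholds that categorize Z into a four-category W
breaks <- c(-Inf, -0.5, 0.3, 1, Inf)
w <- cut(z, breaks = breaks, labels = 1:4)

# Covariance between theta and each category indicator of W:
# closed form from the normal density versus the sample value
theoretical <- rho * (dnorm(breaks[-5]) - dnorm(breaks[-1]))
empirical <- sapply(levels(w), function(l) cov(theta, as.numeric(w == l)))
round(rbind(theoretical, empirical), 3)

# An intercept plus all four indicators is rank-deficient,
# which is why one category has to be dropped
full <- cbind(1, model.matrix(~ w - 1))
rank_full <- qr(full)$rank

# Default treatment coding drops the first category
dummies <- model.matrix(~ w)[, -1]
colnames(dummies)
```

Here rank_full is one less than the number of columns in full, while the three remaining dummy columns can be used alongside an intercept without redundancy.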
questionnaire_gen
library(lsasim)  # package providing questionnaire_gen and beta_gen
data <- questionnaire_gen(100, family = "gaussian", theta = TRUE,
                          full_output = TRUE, n_X = 2, n_W = list(2, 2, 4))
beta_gen(data, MC = TRUE)