Estimates a network structure through node-wise regression models, where each regression is selected via an information-criterion–based stepwise procedure. The selected regression coefficients are subsequently combined into partial correlations to form the final network.
neighborhood_net(
  data = NULL,
  ns = NULL,
  mat = NULL,
  n_calc = "individual",
  ic_type = "bic",
  ordered = FALSE,
  pcor_merge_rule = "and",
  missing_handling = "two-step-em",
  nimp = 20,
  imp_method = "pmm",
  ...
)
A list with the following elements:
Partial correlation matrix estimated from the node-wise regressions.
Matrix of regression coefficients from the final regression models.
Sample sizes used for each variable in the node-wise regressions.
List of settings used in the network estimation.
data: Optional raw data matrix or data frame containing the variables
to be included in the network. May include missing values. If data is not
provided (NULL), a covariance or correlation matrix must be supplied in mat.
ns: Optional numeric sample size specification. Can be a single value
(same sample size is used for all regressions) or a vector (e.g., variable-wise sample
sizes). When data is provided and ns is NULL, sample sizes are derived
automatically from data. When mat is supplied instead of raw data,
ns must be provided and should reflect the sample size underlying mat.
mat: Optional covariance or correlation matrix for the variables to be
included in the network. Used only when data is NULL. If both data and
mat are supplied, mat is ignored. When mat is used, ns must also be
provided.
n_calc: Character string specifying how per-variable sample sizes for
node-wise regression models are computed when ns is not supplied. If ns
is provided, its values are used directly and n_calc is ignored. Possible
values are:
"individual": For each variable, uses the number of non-missing observations for that variable.
"average": Computes the average number of non-missing observations across all variables and uses this average as the sample size for every variable.
"max": Computes the maximum number of non-missing observations across all variables and uses this maximum as the sample size for every variable.
"total": Uses the total number of rows in data as the sample size for every variable.
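The four rules can be illustrated with a small base R sketch. The helper `n_by_rule()` is hypothetical and not part of the package; it only mirrors the descriptions above, and details such as rounding inside the actual function may differ.

```r
# Hypothetical helper (not part of mantar): per-variable sample sizes
# derived from the missingness pattern, following the four n_calc rules.
n_by_rule <- function(data, n_calc = c("individual", "average", "max", "total")) {
  n_calc <- match.arg(n_calc)
  n_obs <- colSums(!is.na(data))  # non-missing observations per variable
  p <- ncol(data)
  out <- switch(n_calc,
    individual = n_obs,
    average    = rep(mean(n_obs), p),
    max        = rep(max(n_obs), p),
    total      = rep(nrow(data), p)
  )
  stats::setNames(as.numeric(out), colnames(data))
}

# Toy data: a has 3 observed values, b has 2, c has 4
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(1, 2, NA, NA),
  c = c(1, 2, 3, 4)
)

n_individual <- n_by_rule(toy, "individual")
n_average    <- n_by_rule(toy, "average")
n_max        <- n_by_rule(toy, "max")
n_total      <- n_by_rule(toy, "total")
```

With this toy data, "individual" keeps the variable-specific counts, whereas the other three rules assign one common value to all variables.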
ic_type: Type of information criterion to compute for model selection in
the node-wise regression models. Options are "bic" (default), "aic", and "aicc".
ordered: Logical vector indicating whether each variable in data
should be treated as ordered categorical. Only used when data is provided.
If a single logical value is supplied, it is recycled to all variables.
pcor_merge_rule: Character string specifying how regression weights from the node-wise models are merged into partial correlations. Possible values are:
"and": Estimates a partial correlation only if the regression weights in both directions (e.g., from node 1 to 2 and from node 2 to 1) are non-zero in the final models.
"or": Uses the available regression weight from one direction as the partial correlation if the corresponding regression in the other direction is not included in the final model.
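The two merging rules can be sketched in base R as follows. The function `merge_betas()` is hypothetical and not the package's implementation. It assumes standardized regression weights stored in a matrix `B`, where `B[i, j]` is the weight of predictor `j` in the regression for node `i`, and it uses the standard identity that a partial correlation equals the signed geometric mean of the two regression weights. Setting pairs with opposite signs to zero is a further assumption of this sketch.

```r
# Hypothetical helper (not part of mantar): merge node-wise regression
# weights into a symmetric partial correlation matrix.
merge_betas <- function(B, rule = c("and", "or")) {
  rule <- match.arg(rule)
  p <- nrow(B)
  P <- matrix(0, p, p, dimnames = dimnames(B))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      b_ij <- B[i, j]
      b_ji <- B[j, i]
      if (b_ij != 0 && b_ji != 0) {
        # Both directions selected: signed geometric mean
        # (assumption: pairs with opposite signs are set to zero)
        val <- if (sign(b_ij) == sign(b_ji)) sign(b_ij) * sqrt(b_ij * b_ji) else 0
      } else if (rule == "or") {
        # Only one direction selected: use the available weight
        val <- b_ij + b_ji
      } else {
        val <- 0
      }
      P[i, j] <- P[j, i] <- val
    }
  }
  P
}

# Toy weights: 1 <-> 2 selected in both directions, 1 <- 3 in one direction only
B <- matrix(c(0,    0.40, 0.30,
              0.25, 0,    0,
              0,    0,    0),
            nrow = 3, byrow = TRUE)

P_and <- merge_betas(B, "and")
P_or  <- merge_betas(B, "or")
```

In this toy case the edge between nodes 1 and 3 appears only under the "or" rule, while the edge between nodes 1 and 2 is identical under both rules.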
missing_handling: Character string specifying how correlations are
estimated from the data input in the presence of missing values. Possible
values are:
"two-step-em": Uses a classical EM algorithm to estimate the correlation matrix from data.
"stacked-mi": Uses stacked multiple imputation to estimate the correlation matrix from data.
"pairwise": Uses pairwise deletion to compute correlations from data.
"listwise": Uses listwise deletion to compute correlations from data.
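The difference between the two deletion-based options can be shown with base R's `cor()`. This is only an illustration with made-up toy data. How the package computes these correlations internally is not documented here, and the EM and multiple imputation options are not covered by this sketch.

```r
# Toy data with one missing value in each variable (made-up numbers)
dat <- data.frame(
  a = c(1.0, 2.1, NA, 4.2, 5.1, 6.3),
  b = c(2.0, NA, 3.1, 3.9, 5.2, 5.8),
  c = c(0.5, 1.9, 2.2, NA, 4.1, 7.0)
)

# Pairwise deletion: each correlation uses all rows observed on that pair
r_pairwise <- cor(dat, use = "pairwise.complete.obs")

# Listwise deletion: only rows observed on every variable are used
r_listwise <- cor(dat, use = "complete.obs")

# Number of rows contributing to each estimate
obs <- 1 - is.na(dat)                    # 1 = observed, 0 = missing
n_pairwise <- crossprod(obs)             # pair-specific counts
n_listwise <- sum(complete.cases(dat))   # one common count
```

Pairwise deletion retains more information per correlation, but each entry of the matrix then rests on a different subset of rows, which is one reason the choice of sample size (see `n_calc`) matters.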
nimp: Number of imputations (default: 20) to be used when
missing_handling = "stacked-mi".
imp_method: Character string specifying the imputation method to be
used when missing_handling = "stacked-mi" (default: "pmm", predictive
mean matching).
...: Further arguments passed to internal functions.
This function estimates a network structure using neighborhood selection guided by information criteria.
Simulations by Williams et al. (2019) indicated that using the "and" rule for merging regression weights tends to yield more accurate partial correlation estimates than the "or" rule.
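The general idea of neighborhood selection guided by an information criterion can be sketched with base R's `step()`. This is not the package's internal code. It works on simulated raw data, uses backward elimination with the BIC penalty `k = log(n)`, and the helper `select_neighbors()` is hypothetical. The search direction and other details of the actual procedure may differ.

```r
# Simulated toy data: x1 and x2 are related, as are x3 and x4
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
x4 <- 0.5 * x3 + rnorm(n)
dat <- as.data.frame(scale(data.frame(x1, x2, x3, x4)))

# Hypothetical helper: regress one node on all others and let step()
# drop predictors as long as this lowers the BIC (penalty k = log(n)).
select_neighbors <- function(node, dat) {
  n    <- nrow(dat)
  full <- lm(as.formula(paste(node, "~ .")), data = dat)
  fit  <- step(full, direction = "backward", k = log(n), trace = 0)
  setdiff(names(coef(fit)), "(Intercept)")
}

# One regression per node; the retained predictors form its neighborhood
nodes     <- setNames(names(dat), names(dat))
neighbors <- lapply(nodes, select_neighbors, dat = dat)
neighbors
```

The neighborhoods from the individual regressions are then reconciled with the "and" or "or" rule to obtain a single undirected network.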
The argument ic_type specifies which information criterion is computed.
All criteria are computed based on the log-likelihood of the maximum
likelihood estimated regression model, where the residual variance
determines the likelihood. The following options are available:
"aic": Akaike Information Criterion (Akaike, 1974); defined as \(\mathrm{AIC} = -2\ell + 2k\), where \(\ell\) is the log-likelihood of the model and \(k\) is the number of estimated parameters (including the intercept).
"bic": Bayesian Information Criterion (Schwarz, 1978); defined as \(\mathrm{BIC} = -2\ell + k \log(n)\), where \(\ell\) is the log-likelihood of the model, \(k\) is the number of estimated parameters (including the intercept) and \(n\) is the sample size.
"aicc": Corrected Akaike Information Criterion (Hurvich & Tsai, 1989); particularly useful in small samples where AIC tends to be biased. Defined as \(\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}\), where \(k\) is the number of estimated parameters (including the intercept) and \(n\) is the sample size.
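As a worked example, the three criteria can be computed by hand for a single regression in base R. The helper `ic_from_fit()` is hypothetical, and taking \(k\) as the number of regression coefficients (intercept plus slopes) is an assumption of this sketch; the package may count parameters differently, for example by also including the residual variance.

```r
# Hypothetical helper: the three criteria from log-likelihood, k and n
ic_from_fit <- function(loglik, k, n) {
  aic  <- -2 * loglik + 2 * k
  bic  <- -2 * loglik + k * log(n)
  aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)
  c(aic = aic, bic = bic, aicc = aicc)
}

# Simulated toy regression
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

# Gaussian log-likelihood evaluated at the ML residual variance (RSS / n)
rss       <- sum(resid(fit)^2)
sigma2_ml <- rss / n
loglik    <- -n / 2 * (log(2 * pi * sigma2_ml) + 1)

# Assumption: k counts the regression coefficients only
k  <- length(coef(fit))
ic <- ic_from_fit(loglik, k, n)
ic
```

The hand-computed log-likelihood agrees with `logLik(fit)`. Because \(\log(n) > 2\) for \(n = 100\), the BIC penalizes each parameter more heavily than the AIC here, which generally leads to sparser models.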
Missing Handling
Besides pairwise and listwise deletion, the function offers two approaches for handling missing data: a two-step expectation-maximization (EM) algorithm and stacked multiple imputation. According to simulations by Nehler and Schultze (2024), stacked multiple imputation performs reliably across a range of sample sizes. In contrast, the two-step EM algorithm provides accurate results primarily when the sample size is large relative to the amount of missingness and the complexity of the network. In such cases it may be preferred because of its much faster runtime.
Currently, the function only supports variables that are directly included in the network analysis; auxiliary variables for missing handling are not yet supported. During imputation, all variables are imputed by default using predictive mean matching (see, e.g., van Buuren, 2018), with all other variables in the data set serving as predictors.
# Load the package that provides neighborhood_net() and the example data
library(mantar)

# Estimate network from full data set
# Using Akaike information criterion
result <- neighborhood_net(data = mantar_dummy_full_cont,
                           ic_type = "aic")
# View estimated partial correlations
result$pcor
# Estimate network for data set with missing values
# Using Bayesian Information Criterion, individual sample sizes, and two-step EM
result_mis <- neighborhood_net(data = mantar_dummy_mis_cont,
                               n_calc = "individual",
                               missing_handling = "two-step-em",
                               ic_type = "bic")
# View estimated partial correlations
result_mis$pcor