uncertainty_fun: Uncertainty and sensitivity analysis of node and path risk

Description

Runs a full variance-based uncertainty and sensitivity analysis of node and path risk scores, using the results returned by all_paths_fun() and the functions provided by the sensobol package (Puy et al. 2022).

Usage

uncertainty_fun(
  all_paths_out,
  N,
  order,
  risk_form = c("additive", "power_mean")
)

Value

A named list with:

nodes: A data.table of node results.
paths: A data.table of path results.

Arguments

all_paths_out: A list produced by all_paths_fun() with elements nodes and paths. nodes must contain columns name, cyclomatic_complexity, indeg, btw; paths must contain path_id, path_nodes, path_str, and hops.
N: Integer. Base sample size used for Sobol' matrices.
order: Character. Highest order of Sobol' indices to compute (e.g., "first" or "second"). Passed unchanged to sensobol::sobol_matrices() and sensobol::sobol_indices().
risk_form: Character. Risk definition used in the uncertainty/sensitivity analysis. One of "additive" or "power_mean". Default "additive".

Details

To assess how sensitive the risk scores are to the choice of weighting parameters, this function explores many alternative combinations of weights and (optionally) a power-mean parameter, and examines how resulting risk scores vary.

When risk_form = "additive", uncertainty is induced by sampling weight triplets $(\alpha, \beta, \gamma)$ under the constraint $\alpha + \beta + \gamma = 1$, representing different plausible balances between complexity, connectivity and centrality.
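The additive case can be sketched in base R as follows. This is only an illustration: the node metrics are made-up values, the additive form is taken to be a weighted sum of the three metrics, and the weights are forced onto the constraint by dividing uniform draws by their row sums, which is an assumption rather than the sampling scheme the function necessarily uses.

```r
# Hypothetical node metrics, already scaled to [0, 1] (illustrative values only)
node <- c(complexity = 0.8, indeg = 0.4, btw = 0.1)

set.seed(1)
n_draws <- 1000

# Draw raw weights, then rescale each row so alpha + beta + gamma = 1
# (assumed sampling scheme, not necessarily the one used by the package)
raw <- matrix(runif(n_draws * 3), ncol = 3,
              dimnames = list(NULL, c("alpha", "beta", "gamma")))
w <- raw / rowSums(raw)

# Additive risk (assumed form): weighted sum of the metrics, one value per draw
risk_additive <- as.vector(w %*% node)

# Spread of the node risk score induced by the choice of weights
summary(risk_additive)
```

Because the weights sum to 1, every draw is a weighted average of the three metrics and therefore stays between the smallest and largest metric of the node.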

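When risk_form = "power_mean", uncertainty is induced by sampling both the weights $(\alpha, \beta, \gamma)$ (renormalized to sum to 1) and a power parameter $p$ used in the node-risk definition: $$r = \left(\alpha\,\tilde{C}^{p} + \beta\,(\tilde{d}^{\mathrm{in}})^{p} + \gamma\,\tilde{b}^{p}\right)^{1/p}\,,$$ where $\tilde{C}$, $\tilde{d}^{\mathrm{in}}$ and $\tilde{b}$ are the complexity, connectivity and centrality terms of the node (columns cyclomatic_complexity, indeg and btw of nodes).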
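A minimal base-R sketch of the power-mean formula is given below. The helper power_mean_risk() and the metric values are hypothetical and only mirror the equation above; they are not part of the package.

```r
# Weighted power mean of node metrics x with weights w and exponent p.
# Hypothetical helper written for illustration; p = 0 is not handled.
power_mean_risk <- function(x, w, p) {
  stopifnot(length(x) == length(w), p != 0)
  w <- w / sum(w)            # renormalize the weights to sum to 1
  sum(w * x^p)^(1 / p)
}

node <- c(0.8, 0.4, 0.1)     # illustrative complexity, in-degree, betweenness
w    <- c(0.6, 0.3, 0.1)     # alpha, beta, gamma

r_p1 <- power_mean_risk(node, w, p = 1)  # p = 1 reduces to the weighted sum
r_p2 <- power_mean_risk(node, w, p = 2)
r_p5 <- power_mean_risk(node, w, p = 5)

c(p1 = r_p1, p2 = r_p2, p5 = r_p5)
```

For fixed weights the power mean does not decrease as $p$ grows, so larger values of $p$ pull the score towards the largest of the three metrics.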

For each node, risk scores are repeatedly recalculated using the sampled parameter combinations, producing a distribution of possible outcomes. This distribution is then used to quantify uncertainty in the risk scores and compute Sobol' sensitivity indices for each sampled parameter.

Path-level uncertainty is obtained by propagating node-level uncertainty draws through the path aggregation function: $$P_k = 1 - \prod_{i=1}^{n_k} (1 - r_{k(v_i)})\,,$$ where $r_{k(v_i)}$ is the risk of the $i$-th node on path $k$ and $n_k$ is the number of nodes on that path.
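The propagation step can be illustrated with a few lines of base R. The helper path_risk() and the matrix of node draws are hypothetical; the sketch assumes one row per uncertainty draw and one column per node on the path.

```r
# Path risk for a single draw: 1 minus the product of (1 - node risk)
# Hypothetical helper mirroring the aggregation formula above
path_risk <- function(r) 1 - prod(1 - r)

set.seed(2)
# Illustrative node-risk draws: 5 draws (rows) for a path with 3 nodes (columns)
node_draws <- matrix(runif(5 * 3, min = 0, max = 0.5), nrow = 5)

# Apply the aggregation draw by draw to obtain the path-level draws
path_draws <- apply(node_draws, 1, path_risk)
path_draws
```

Each path-level draw is at least as large as the largest node risk in the same draw, since every additional node can only increase $P_k$.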

All uncertainty metrics are computed from the first N Sobol' draws (matrix A), while sensitivity indices use the full Sobol' design.
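The split between matrix A and the full design can be sketched with sensobol directly. This is an illustration under stated assumptions rather than the internals of uncertainty_fun(): the node metrics are made up, the risk is taken to be additive, and the sampled columns are turned into weights by dividing by their row sums.

```r
library(sensobol)

N <- 2^6                                 # base sample size (small, for speed)
params <- c("alpha", "beta", "gamma")
node <- c(0.8, 0.4, 0.1)                 # illustrative node metrics

# Full Sobol' design: matrices A, B and the AB matrices stacked by rows
mat <- sobol_matrices(N = N, params = params, order = "first")

# Assumed model: rescale each row to sum to 1, then take the weighted sum
w <- mat / rowSums(mat)
y <- as.vector(w %*% node)

# Uncertainty analysis: only the first N model runs (matrix A)
ua_draws <- y[1:N]
quantile(ua_draws, probs = c(0.025, 0.5, 0.975))

# Sensitivity analysis: all model runs of the design
ind <- sobol_indices(Y = y, N = N, params = params, order = "first")
ind$results
```

With three parameters and order = "first", the design has N * (3 + 2) rows, so the sensitivity indices use five times as many model runs as the uncertainty metrics.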

For more information about the uncertainty and sensitivity analysis and the output of this function, see the sensobol package (Puy et al. 2022).

The returned node table includes the following columns:

name: name of the node.
uncertainty_analysis: numeric vector giving the uncertainty draws in the node risk score.
sensitivity_analysis: object returned by sensobol::sobol_indices() (per-node).

The returned paths table includes:

path_id: path identifier.
path_str: sequence of function calls for each path.
hops: number of edges.
uncertainty_analysis: numeric vector giving the uncertainty draws in the path risk score.
gini_index: numeric vector giving the uncertainty draws in the Gini index.
risk_trend: numeric vector giving the uncertainty draws in the risk trend.

References

Puy, A., Lo Piano, S., Saltelli, A., and Levin, S. A. (2022). sensobol: An R Package to Compute Variance-Based Sensitivity Indices. Journal of Statistical Software, 102(5), 1--37. doi:10.18637/jss.v102.i05

Examples

# \donttest{
data(synthetic_graph)
out <- all_paths_fun(graph = synthetic_graph, alpha = 0.6, beta = 0.3,
                     gamma = 0.1, complexity_col = "cyclo")

# Additive risk (increase N to at least 2^10 for a proper UA/SA)
results1 <- uncertainty_fun(all_paths_out = out, N = 2^2, order = "first",
                            risk_form = "additive")

# Power-mean risk (increase N to at least 2^10 for a proper UA/SA)
results2 <- uncertainty_fun(all_paths_out = out, N = 2^2, order = "first",
                            risk_form = "power_mean")

results1$nodes
results1$paths
# }
