
You can install BGGM from GitHub as follows.
# install.packages("devtools")
devtools::install_github("donaldRwilliams/BGGM")
Gaussian graphical models (GGMs) allow for learning conditional
(in)dependence structures that are encoded by partial correlations.
Whereas there are several R
packages for classical methods (see
Kuismin and Sillanpää 2017, Table 1), there are only two that implement
a Bayesian approach (Leday and Richardson 2018; Mohammadi and Wit 2015).
These are exclusively focused on identifying the graphical structure.
The package BGGM not only contains novel Bayesian methods for this
purpose, but it also includes Bayesian methodology for extending
inference beyond identifying non-zero relations.
BGGM is built around two approaches to Bayesian inference: estimation and hypothesis testing. This distinction is arbitrary (see Rouder, Haaf, and Vandekerckhove 2018), but is used to organize this work. The former focuses on the posterior distribution and includes extensions to assess predictability (Haslbeck and Waldorp 2018), as well as methodology to compare partial correlations. The latter includes methods for Bayesian hypothesis testing, in both exploratory and confirmatory contexts, with the novel matrix-F prior distribution (Mulder and Pericchi 2018). This allows for testing the null hypothesis of conditional independence, as well as inequality- and equality-constrained hypotheses. Further, there are several approaches for comparing GGMs across any number of groups. The package also includes a suite of options for model checking. Together, BGGM is a comprehensive toolbox for Gaussian graphical modeling in R.
There are two possibilities for estimating GGMs. The first is an analytic solution and the second samples from the posterior distribution (described in Williams 2018). Sampling is recommended because the samples are required for various functions in BGGM (e.g., posterior predictive checks and prediction).
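Both are available through the analytic argument of estimate(). As a minimal sketch, the analytic variant looks as follows (this fit_analytic object is the one passed to select() in the structure learning example below):
# analytic solution (no posterior samples are stored)
fit_analytic <- estimate(Y = BGGM::bfi[, 1:5],
                         analytic = TRUE)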
Structure learning refers to determining which partial correlations are non-zero. This is implemented with:
# data (p = 5 for demonstrative purposes)
Y <- BGGM::bfi[,1:5]
# fit model
fit <- estimate(Y = Y,
analytic = FALSE,
iter = 1000)
# select graph (from the analytic fit above)
E <- select(fit_analytic, ci_width = 0.95)
# summary
summary(E)
# output
BGGM: Bayesian Gaussian Graphical Models
---
Type: Selected Graph (Analytic Solution)
Credible Interval: 95 %
Connectivity: 80 %
---
Call:
select.estimate(x = fit_analytic, ci_width = 0.95)
---
Selected:
Partial correlations
1 2 3 4 5
1 0.00 -0.24 -0.11 0.00 0.00
2 -0.24 0.00 0.29 0.16 0.16
3 -0.11 0.29 0.00 0.18 0.36
4 0.00 0.16 0.18 0.00 0.12
5 0.00 0.16 0.36 0.12 0.00
---
Adjacency
1 2 3 4 5
1 0 1 1 0 0
2 1 0 1 1 1
3 1 1 0 1 1
4 0 1 1 0 1
5 0 1 1 1 0
---
It is customary to plot the estimated structure. The implementation for plotting E is described below, as the same call is also used for the hypothesis testing methodology.
The partial correlations are plotted with:
# summarize the posterior distributions
fit_summary <- summary(fit, cred = 0.95)
# plot summary
fig_1a <- plot(fit_summary)
The object fit_summary includes the partial correlations, summarized with the posterior mean, standard deviation, and a given credible interval. The object fig_1a is a ggplot (Wickham 2016), which allows for further customization. This is possible with all plot() functions in BGGM. An example is provided below.
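Because fig_1a is an ordinary ggplot object, standard ggplot2 layers can be added directly; the title and theme below are arbitrary choices:
# further customization with standard ggplot2 functions
library(ggplot2)
fig_1a +
  ggtitle("Partial Correlations") +
  theme_minimal()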
A central aspect of BGGM is to extend inference beyond the individual partial correlations. Assessing predictability provides a measure of “self-determination” (Haslbeck and Waldorp 2018), for example, how much variance is explained by the variables included in the model. To this end, BGGM provides several options to assess predictability. In this example, we compute Bayesian R^2 (Gelman et al. 2019):
# bayes r2
r2 <- bayes_R2(fit)
fig_1b <- plot(r2,
type = "ridgeline")
# output
BGGM: Bayesian Gaussian Graphical Models
---
Metric: Bayes R2
Type: post.pred
Credible Interval: 0.95
---
Estimates:
Node Post.mean Post.sd Cred.lb Cred.ub
1 0.13 0.01 0.10 0.15
2 0.33 0.02 0.30 0.36
3 0.38 0.02 0.35 0.41
4 0.18 0.01 0.15 0.20
5 0.29 0.02 0.26 0.32
The object fig_1b is made with the help of the ggridges package (Wilke 2018).
There is additional methodology that allows for comparing GGMs (described in Williams et al. 2019). For the estimation based methods, there are three possibilities, including ggm_compare_estimate(), assess_predictability(), and ggm_compare_ppc(). We encourage users to explore all of these functions. The following is based on the posterior predictive distribution. In this example, we use data from a resilience survey to compare GGMs between males and females (Briganti and Linkowski 2019).
# data
Ym <- subset(rsa, gender == "M",
select = - gender)
Yf <- subset(rsa, gender == "F",
select = - gender)
# predictive check
ppc <- ggm_compare_ppc(Ym, Yf)
fig_1c <- plot(ppc)
# summary
summary(ppc)
# output
BGGM: Bayesian Gaussian Graphical Models
---
Type: GGM Comparison (Global Predictive Check)
Posterior Samples: 5000
Group 1: 278
Group 2: 397
Variables (p): 33
Edges: 528
---
Call:
ggm_compare_ppc(Ym, Yf)
---
Estimates:
contrast KLD p_value
Y_g1 vs Y_g2 2.512792 0
---
note:
p_value = p(T(Y_rep) > T(Y)|Y)
KLD = (symmetric) Kullback-Leibler divergence
Note that this method can be used to compare any number of groups.
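For example, a third group could simply be added to the call. In this sketch, Yo stands for a hypothetical third data matrix with the same 33 variables:
# predictive check across three groups (Yo is a hypothetical third group)
ppc_3 <- ggm_compare_ppc(Ym, Yf, Yo)
summary(ppc_3)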
The following methods were described in Williams and Mulder (2019). Note that each function has summary(), print(), and plot() functions. The implementation is similar to the estimation based methods, and thus not included here.
The Bayes factor methods allow for gaining evidence for null effects, as well as for one-sided hypothesis testing.
# data
Y <- bfi[,1:5]
# fit model
fit <- explore(Y)
# select graph
E <- select(fit,
BF_cut = 3,
alternative = "greater")
The object E includes the selected graphs for which there was evidence for a positive effect and a null effect. This can be plotted with plot(E). An example is provided below.
GGMs are typically data driven and thus inherently exploratory. Another key contribution of BGGM is extending hypothesis testing in GGMs beyond exploratory and to confirmatory contexts (Williams and Mulder 2019). The former essentially amounts to feeding the data to the functions in BGGM and seeing what comes back; there are no specific, hypothesized models under consideration. Confirmatory hypothesis testing, on the other hand, allows for comparing theoretical models or (actual) predictions. A researcher may expect, for example, that one set of partial correlations is larger than another set. This is tested with:
# define hypothesis
hypothesis <- c("(A1--A2, A1--A3) >
(A1--A4, A1--A5)")
# test inequality constraint
test_order <- confirm(Y = Y,
hypothesis = hypothesis,
prior_sd = 0.5, iter = 5000,
cores = 2)
# output
BGGM: Bayesian Gaussian Graphical Models
Type: Confirmatory Hypothesis Testing
---
Call:
confirm(Y = Y, hypothesis = hypothesis, prior_sd = 0.5, iter = 5000,
cores = 2)
---
Hypotheses:
H1 (A1--A2,A1--A3)>(A1--A4,A1--A5)
Hc 'not H1'
---
Posterior prob:
p(H1|Y) = 0
p(Hc|Y) = 1
---
Bayes factor matrix:
H1 Hc
H1 1 Inf
Hc 0 1
---
note: equal hypothesis prior probabilities
Note that A1--A2 denotes the partial correlation between variables A1 and A2. Any number of hypotheses can be tested. They just need to be separated by a semi-colon, e.g., hypothesis = c("A1--A2 > 0; A1--A2 = 0"), which also demonstrates that it is possible to simultaneously test both inequality (> or <) and equality (=) restrictions.
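Following that syntax, a sketch of such a call might look like this (reusing the arguments from the example above):
# test an inequality and an equality constraint simultaneously
test_mixed <- confirm(Y = Y,
                      hypothesis = c("A1--A2 > 0; A1--A2 = 0"),
                      prior_sd = 0.5, iter = 5000)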
The Bayes factor approach can also be used to compare GGMs (see Williams et al. 2019).
# compare
bf_comp <- ggm_compare_bf(Ym, Yf,
prior_sd = 0.20)
# select
sel <- select(bf_comp,
BF_cut = 3)
# figure 2
fig_2 <- plot(sel)
The results are provided in Figure 2. Note that confirmatory hypothesis testing is also possible. This uses the same arguments as those provided above.
Plotting both the estimate and hypothesis testing graphs uses the same
arguments. The following is an example of plotting a graph estimated
with explore
.
# resilience scale
Y <- subset(rsa, select = - gender)
# fit model
fit <- explore(Y)
# select graph
E <- select(fit,
BF_cut = 3,
alternative = "greater")
# plot graph
fig_3 <- plot(E, layout = "mds",
node_labels_color = "black",
node_groups = BGGM:::rsa_labels,
txt_size = 4, node_outer_size = 8,
node_inner_size = 6,
edge_multiplier = 5,
alpha = 0.3)$plt +
# remove legend name and set palette
scale_color_brewer(name = NULL,
palette = "Set1") +
# add title
ggtitle("Conditional Dependence Structure") +
# make title larger and add box around legend
theme(plot.title = element_text(size = 14),
legend.background = element_rect(color = "black"))
The methods in BGGM use custom samplers to estimate the partial correlations. There are several functions to monitor convergence. Additionally, there are a variety of methods to check the adequacy of the fitted model.
GGMs have a direct correspondence to multiple regression (Stephens 1998; Kwan 2014). Hence regression diagnostics can be used to evaluate the fitted model.
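To make this correspondence concrete, the regression coefficients for any node can be read off the inverse covariance (precision) matrix. A minimal sketch in base R, using the exact identity beta = -Theta[1, -1] / Theta[1, 1] for the coefficients of node 1 regressed on the rest:
# correspondence between the precision matrix and multiple regression
Y <- scale(na.omit(BGGM::bfi[, 1:5]))
Theta <- solve(cov(Y))
# coefficients for predicting node 1 from the remaining nodes
beta_ggm <- -Theta[1, -1] / Theta[1, 1]
# the same coefficients from least squares (no intercept; data are centered)
beta_lm <- coef(lm(Y[, 1] ~ Y[, -1] - 1))
round(cbind(beta_ggm, beta_lm), 3)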
# data
Y <- bfi[,1:5]
# fit model
fit <- estimate(Y)
# plot
fig_4a <- diagnostics(fit, iter = 100)
The object fig_4a contains one plot for each variable in the model (in this case five). One of those plots is included in Figure 4 (panel A).
In BGGM, posterior predictive checks are carried out with the R package bayesplot (Gabry et al. 2019). Some internal code was borrowed from brms to achieve consistency between packages.
fig_4b <- pp_check(fit,
type = "stat",
stat = "sd")
The object fig_4b also contains one plot for each variable in the model. One of those plots is included in Figure 4 (panel B). Note that this implementation is based on the internals of the pp_check() function from brms (Bürkner 2018), but adapted for GGMs. This ensures consistency between packages that use bayesplot.
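Other test statistics can presumably be checked in the same way, mirroring the bayesplot interface; stat = "mean" in this sketch is an assumption, not shown in the original:
# posterior predictive check with a different test statistic
fig_4b_mean <- pp_check(fit,
                        type = "stat",
                        stat = "mean")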
BGGM uses Gibbs samplers to estimate GGMs. The models are not terribly difficult to estimate, so convergence should typically not be an issue. To verify convergence, however, BGGM provides both trace and acf plots. An example of the former is implemented with:
fig_4c <- convergence(fit,
type = "trace",
param = "1--2")
This plot is in Figure 4 (panel C). Note that it is possible to check the convergence of several parameters at once (e.g., param = c("1--2", "1--3")).
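The acf variant presumably follows the same pattern, swapping the type argument:
# autocorrelation plot for the same parameter
fig_4d <- convergence(fit,
                      type = "acf",
                      param = "1--2")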
There are several additional functions in BGGM not discussed in this work. For example, there are a variety of options for predictability (e.g., mean squared error) and plotting capabilities for confirm(), among others. We encourage users to explore the package documentation. We are committed to further developing BGGM. In the next version, all of the methods will accommodate ordinal, binary, and mixed data. Currently, there is one shiny application freely available online that implements the confirm() method (link) (Chang et al. 2019). In the near future, all of the methods in BGGM will be implemented in a shinydashboard (Chang and Borges Ribeiro 2018).
DRW was supported by a National Science Foundation Graduate Research Fellowship under Grant No. 1650042 and JM was supported by an ERC Starting Grant (758791).
Briganti, G, and P Linkowski. 2019. “Item and domain network structures of the Resilience Scale for Adults in 675 university students.” Epidemiology and Psychiatric Sciences, 1–9. https://doi.org/10.1017/S2045796019000222.
Bürkner, Paul-Christian. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2019. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” J. R. Stat. Soc. A 182 (2): 389–402. https://doi.org/10.1111/rssa.12378.
Gelman, Andrew, Ben Goodrich, Jonah Gabry, and Aki Vehtari. 2019. “R-squared for Bayesian Regression Models.” The American Statistician 73 (3): 307–9. https://doi.org/10.1080/00031305.2018.1549100.
Haslbeck, Jonas M B, and Lourens J. Waldorp. 2018. “How well do network models predict observations? On the importance of predictability in network models.” Behavior Research Methods 50 (2): 853–61. https://doi.org/10.3758/s13428-017-0910-x.
Kuismin, Markku, and Mikko Sillanpää. 2017. “Estimation of covariance and precision matrix, network structure, and a view toward systems biology.” Wiley Interdisciplinary Reviews: Computational Statistics 9 (6): 1–13. https://doi.org/10.1002/wics.1415.
Kwan, C. C. Y. 2014. “A Regression-Based Interpretation of the Inverse of the Sample Covariance Matrix.” Spreadsheets in Education 7 (1).
Leday, Gwenaël G. R., and Sylvia Richardson. 2018. “Fast Bayesian inference in large Gaussian graphical models,” 1–26. http://arxiv.org/abs/1803.08155.
Mohammadi, A., and E. C. Wit. 2015. “Bayesian structure learning in sparse Gaussian graphical models.” Bayesian Analysis 10 (1): 109–38. https://doi.org/10.1214/14-BA889.
Mulder, Joris, and Luis Pericchi. 2018. “The Matrix-F Prior for Estimating and Testing Covariance Matrices.” Bayesian Analysis, no. 4: 1–22. https://doi.org/10.1214/17-BA1092.
Rouder, Jeffrey N., Julia M. Haaf, and Joachim Vandekerckhove. 2018. “Bayesian inference for psychology, part IV: parameter estimation and Bayes factors.” Psychonomic Bulletin and Review 25 (1): 102–13. https://doi.org/10.3758/s13423-017-1420-7.
Stephens, Guy. 1998. “On the Inverse of the Covariance Matrix in Portfolio Analysis.” The Journal of Finance 53 (5): 1821–7.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilke, Claus O. 2018. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.
Williams, Donald R. 2018. “Bayesian Inference for Gaussian Graphical Models: Structure Learning, Explanation, and Prediction.” https://doi.org/10.31234/OSF.IO/X8DPR.
Williams, Donald R, and Joris Mulder. 2019. “Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.” PsyArXiv, January. https://doi.org/10.31234/osf.io/ypxd8.
Williams, Donald R, Philippe Rast, Luis R Pericchi, and Joris Mulder. 2019. “Comparing Gaussian Graphical Models with the Posterior Predictive Distribution and Bayesian Model Selection.” https://doi.org/10.31234/osf.io/yt386.