
bnlearn (version 1.6)

bnlearn-package: Bayesian network structure learning.

Description

Bayesian network learning via constraint-based and score-based algorithms.

Arguments

Available constraint-based learning algorithms

  • Grow-Shrink (gs): based on the Grow-Shrink Markov Blanket, the first (and simplest) Markov blanket detection algorithm (Margaritis, 2003) used in a structure learning algorithm.
  • Incremental Association (iamb): based on the Markov blanket detection algorithm of the same name (Tsamardinos et al., 2003), which is based on a two-phase selection scheme (a forward selection followed by an attempt to remove false positives).
  • Fast Incremental Association (fast.iamb): a variant of IAMB which uses speculative stepwise forward selection to reduce the number of conditional independence tests (Yaramakala and Margaritis, 2005).
  • Interleaved Incremental Association (inter.iamb): another variant of IAMB which uses forward stepwise selection (Tsamardinos et al., 2003) to avoid false positives in the Markov blanket detection phase.
  • Max-Min Parents and Children (mmpc): a forward selection technique for neighbourhood detection based on the maximization of the minimum association measure observed with any subset of the nodes selected in the previous iterations (Tsamardinos, Brown and Aliferis, 2006). It learns the underlying structure of the Bayesian network (all the arcs are undirected, and no attempt is made to detect their orientation).

This package includes three implementations of each algorithm:

  • an optimized implementation (used when the optimized parameter is set to TRUE), which uses backtracking to roughly halve the number of independence tests.
  • an unoptimized implementation (used when the optimized parameter is set to FALSE), which is better at uncovering possible erratic behaviour of the statistical tests.
  • a cluster-aware implementation, which requires a running cluster set up with the makeCluster function from the snow package. See snow integration for a sample usage.
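The cluster-aware variant can be sketched as follows. This is a minimal sketch, not canonical usage: it assumes a local two-node socket cluster is acceptable and that the learning functions accept the cluster argument described in the bnlearn documentation.

```r
library(bnlearn)
library(snow)
data(learning.test)

# set up a small local socket cluster ...
cl <- makeCluster(2, type = "SOCK")
# ... hand it to the learning algorithm, which distributes the
# Markov blanket / neighbourhood detection across the nodes ...
res <- gs(learning.test, cluster = cl)
# ... and always shut the cluster down when done.
stopCluster(cl)
```

The learned structure is the same as in the single-process case; only the independence tests are parallelized.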

The computational complexity of these algorithms is polynomial in the number of tests, usually $O(N^2)$ ($O(N^4)$ in the worst case scenario). Execution time scales linearly with the size of the data set.

Available score-based learning algorithms

  • Hill-Climbing (hc): a hill climbing greedy search on the space of the directed graphs. The optimized implementation uses score caching, score decomposability and score equivalence to reduce the number of duplicated tests. Random restart with a configurable number of perturbing operations is implemented.
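The random restarts mentioned above can be exercised through the restart and perturb arguments of hc; a brief sketch, assuming those argument names as documented for hc in bnlearn:

```r
library(bnlearn)
data(learning.test)

# greedy hill-climbing with 5 random restarts; before each restart
# the current network is perturbed by 2 random arc operations
# (additions, removals or reversals) to escape local maxima.
res <- hc(learning.test, restart = 5, perturb = 2)
```

More restarts and perturbations make the search less likely to stop at a local maximum, at the cost of scoring more candidate networks.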

Available hybrid learning algorithms

  • Max-Min Hill-Climbing (mmhc): a hybrid algorithm which combines the Max-Min Parents and Children algorithm (to restrict the search space) and the Hill-Climbing algorithm (to find the optimal network structure in the restricted space).
  • Restricted Hill-Climbing (rshc): a more general implementation of Max-Min Hill-Climbing, which can use any combination of constraint-based and score-based algorithms.

Available (conditional) independence tests

In practice, the conditional independence tests used in constraint-based algorithms are statistical tests on the data set. Available tests (and the respective labels) are:

  • discrete case (multinomial distribution)
    • mutual information: an information-theoretic distance measure. It's proportional to the log-likelihood ratio (they differ by a $2n$ factor) and is related to the deviance of the tested models. Both the asymptotic $\chi^2$ test (mi) and the Monte Carlo permutation test (mc-mi) are implemented.
    • fast mutual information (fmi): a variant of the mutual information which is set to zero when there aren't at least five data points per parameter.
    • Pearson's $X^2$: the classical Pearson's $X^2$ test for contingency tables. Both the asymptotic $\chi^2$ test (x2) and the Monte Carlo permutation test (mc-x2) are implemented.
    • Akaike Information Criterion (aict): an experimental AIC-based independence test, computed by comparing the mutual information and the expected information gain.
  • continuous case (multivariate normal distribution)
    • linear correlation: Pearson's linear correlation coefficient. Both the asymptotic Student's t test (cor) and the Monte Carlo permutation test (mc-cor) are implemented.
    • Fisher's Z: a transformation of the linear correlation with an asymptotic normal distribution. Used by commercial software (such as TETRAD II) for the PC algorithm (an R implementation is present in the pcalg package on CRAN). Both the asymptotic normal test (zf) and the Monte Carlo permutation test (mc-zf) are implemented.
    • mutual information: an information-theoretic distance measure. Again it's proportional to the log-likelihood ratio (they differ by a $2n$ factor). Both the asymptotic $\chi^2$ test (mi-g) and the Monte Carlo permutation test (mc-mi-g) are implemented.
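The labels above can be passed to the constraint-based algorithms through their test argument; a short sketch using the data sets bundled with the package:

```r
library(bnlearn)
data(learning.test)
data(gaussian.test)

# discrete data: use the Monte Carlo permutation version of the
# mutual information test instead of the default asymptotic one.
res.d <- gs(learning.test, test = "mc-mi")

# continuous data: use Fisher's Z test.
res.g <- gs(gaussian.test, test = "zf")
```

Permutation tests are slower but do not rely on asymptotic distributions, so they are safer with small samples.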

Available network scores

Available scores (and the respective labels) are:

  • discrete case (multinomial distribution)
    • the multinomial log-likelihood (loglik) score, which is equivalent to the entropy measure used in Weka.
    • the Akaike Information Criterion score (aic).
    • the Bayesian Information Criterion score (bic), which is equivalent to the Minimum Description Length (MDL) and is also known as the Schwarz Information Criterion.
    • the logarithm of the Bayesian Dirichlet equivalent score (bde), a score equivalent Dirichlet posterior density.
    • the logarithm of the K2 score (k2), a Dirichlet posterior density (not score equivalent).
  • continuous case (multivariate normal distribution)
    • the multivariate Gaussian log-likelihood (loglik-g) score.
    • the corresponding Akaike Information Criterion score (aic-g).
    • the corresponding Bayesian Information Criterion score (bic-g).
    • a score equivalent Gaussian posterior density (bge).
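These labels can be used to drive the score-based search, or to evaluate a learned network after the fact; a sketch assuming the score argument of hc and the type argument of the score function:

```r
library(bnlearn)
data(learning.test)

# learn a structure with hill-climbing, scoring candidates with BIC ...
res <- hc(learning.test, score = "bic")

# ... then compare the learned network under two different scores.
bic.value <- score(res, learning.test, type = "bic")
aic.value <- score(res, learning.test, type = "aic")
```

Both values are log-scale goodness-of-fit measures, so higher (less negative) is better; they are comparable only across networks fitted to the same data.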

Whitelist and blacklist support

All learning algorithms support arc whitelisting and blacklisting:

  • blacklisted arcs are never present in the graph.
  • arcs whitelisted in one direction only (i.e. $A \rightarrow B$ is whitelisted but $B \rightarrow A$ is not) have the respective reverse arcs blacklisted, and are always present in the graph.
  • arcs whitelisted in both directions (i.e. both $A \rightarrow B$ and $B \rightarrow A$ are whitelisted) are present in the graph, but their direction is set by the learning algorithm.

Any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.

Error detection and correction: the strict mode

Optimized implementations of constraint-based algorithms rely heavily on backtracking to reduce the number of tests needed by the learning procedure. This approach may hide errors either in the Markov blanket or the neighbourhood detection phase in some particular cases, such as when hidden variables are present or there are external (logical) constraints on the interactions between the variables.

On the other hand, in the unoptimized implementations the Markov blanket and neighbourhood detection of each node is completely independent from the rest of the learning process. Thus it may happen that the Markov blankets or the neighbourhoods are not symmetric (i.e. A is in the Markov blanket of B but not vice versa), or that some arc directions conflict with each other.

The strict parameter enables some measure of error correction, which may help to retrieve a good model even when the learning process would otherwise fail:

  • if strict is set to TRUE, every error stops the learning process and results in an error message.
  • if strict is set to FALSE:
    1. v-structures are applied to the network structure in lowest-p.value order; if any arc is already oriented in the opposite direction, the v-structure is discarded.
    2. nodes which cause asymmetries in any Markov blanket are removed from that Markov blanket; they are treated as false positives.
    3. nodes which cause asymmetries in any neighbourhood are removed from that neighbourhood; again they are treated as false positives (see Tsamardinos, Brown and Aliferis, 2006).

Details

Package: bnlearn
Type: Package
Version: 1.6
Date: 2009-08-24
License: GPLv2 or later

This package implements some algorithms for learning the structure of Bayesian networks.

Constraint-based algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991). These algorithms use conditional independence tests to detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network.

Score-based learning algorithms are general purpose heuristic optimization algorithms which rank network structures with respect to a goodness-of-fit score.

References

(a BibTeX file with all the references cited throughout this manual is present in the bibtex directory of this package)

Agresti A (2002). Categorical Data Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, 2nd edition.

Korb K, Nicholson AE (2003). Bayesian Artificial Intelligence. Chapman & Hall/CRC.

Margaritis D (2003). Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-03-153.

Pearl J (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann.

Tsamardinos I, Aliferis CF, Statnikov A (2003). "Algorithms for Large Scale Markov Blanket Discovery". In "Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference", pp. 376-381. AAAI Press.

Tsamardinos I, Brown LE, Aliferis CF (2006). "The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm". Machine Learning, 65(1), 31-78.

Yaramakala S, Margaritis D (2005). "Speculative Markov Blanket Discovery for Optimal Feature Selection". In "ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining", pp. 809-812. IEEE Computer Society.

Examples

library(bnlearn)
data(learning.test)

## Simple learning
# first try the Grow-Shrink algorithm
res = gs(learning.test)
# plot the network structure.
plot(res)
# now try the Incremental Association algorithm.
res2 = iamb(learning.test)
# plot the new network structure.
plot(res2)
# the network structures seem to be identical, don't they?
compare(res, res2)
# [1] TRUE
# how many tests did each of the two algorithms use?
res$learning$ntests
# [1] 41
res2$learning$ntests
# [1] 50
# and the unoptimized implementation of these algorithms?
gs(learning.test, optimized = FALSE)$learning$ntests
# [1] 90
iamb(learning.test, optimized = FALSE)$learning$ntests
# [1] 116

## Greedy search
res = hc(learning.test)
plot(res)

## Another simple example (Gaussian data)
data(gaussian.test)
# first try the Grow-Shrink algorithm
res = gs(gaussian.test)
plot(res)

## Blacklist and whitelist use
# suppose the arc B - F should not be there: blacklist it in both directions.
blacklist = data.frame(from = c("B", "F"), to = c("F", "B"))
blacklist
#   from to
# 1    B  F
# 2    F  B
res3 = gs(learning.test, blacklist = blacklist)
plot(res3)
# force E - F direction (E -> F).
whitelist = data.frame(from = c("E"), to = c("F"))
whitelist
#   from to
# 1    E  F
res4 = gs(learning.test, whitelist = whitelist)
plot(res4)
# use both blacklist and whitelist.
res5 = gs(learning.test, whitelist = whitelist, blacklist = blacklist)
plot(res5)

## Debugging
# use the debugging mode to see the learning algorithms
# in action.
res = gs(learning.test, debug = TRUE)
res = hc(learning.test, debug = TRUE)
# log the learning process for future reference.
sink(file = "learning-log.txt")
res = gs(learning.test, debug = TRUE)
sink()
# if something seems wrong, try the unoptimized version
# in strict mode (inconsistencies trigger errors):
res = gs(learning.test, optimized = FALSE, strict = TRUE, debug = TRUE)
# or disable strict mode to let the algorithm fix errors on the fly:
res = gs(learning.test, optimized = FALSE, strict = FALSE, debug = TRUE)
