score: Score of the Bayesian network

Description

Compute the score of the Bayesian network.

Usage

# S4 method for bn
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
# S4 method for bn.naive
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
# S4 method for bn.tan
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
# S3 method for bn
logLik(object, data, ...)
# S3 method for bn
AIC(object, data, ..., k = 1)
# S3 method for bn
BIC(object, data, ...)

Value

For score() with by.node = TRUE, a vector of numeric values, the individual node contributions to the score of the Bayesian network. Otherwise, a single numeric value, the score of the Bayesian network.
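The relationship between the two return shapes can be checked directly. The following is a minimal sketch, assuming the bnlearn package is installed and using the bundled learning.test data; the variable names node.terms and total are illustrative only.

```r
library(bnlearn)

data(learning.test)
dag <- hc(learning.test)

# by.node = TRUE: one BIC term per node.
node.terms <- score(dag, learning.test, type = "bic", by.node = TRUE)

# by.node = FALSE (the default): a single number, the overall score.
total <- score(dag, learning.test, type = "bic")

print(node.terms)

# BIC is decomposable, so the per-node terms should add up to the overall
# score, up to floating-point error.
print(all.equal(sum(node.terms), total))
```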

Arguments

x, object: an object of class bn.
data: a data frame containing the data that will be used to compute the score of the Bayesian network.
type: a character string, the label of a network score. If none is specified, the default score is the Bayesian Information Criterion for both discrete and continuous data sets. See network scores for details.
by.node: a boolean value. If TRUE and the score is decomposable, the function returns the score terms corresponding to each node; otherwise it returns their sum (the overall score of x).
debug: a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.
...: extra arguments from the generic method (for the AIC and logLik functions, currently ignored) or additional tuning parameters (for the score function).
k: a numeric value, the penalty coefficient to be used; the default k = 1 gives the expression used to compute the AIC in the context of scoring Bayesian networks.
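To make the role of k concrete, the sketch below compares logLik(), AIC() and BIC() on the same network. It assumes that the penalised scores have the form "log-likelihood minus k times the number of parameters", with the parameter count taken from bnlearn's nparams(); treat that decomposition as an assumption to verify on your own installation rather than a statement about the implementation.

```r
library(bnlearn)

data(learning.test)
dag <- hc(learning.test)
n <- nrow(learning.test)

ll  <- as.numeric(logLik(dag, learning.test))
aic <- as.numeric(AIC(dag, learning.test))   # default k = 1
bic <- as.numeric(BIC(dag, learning.test))

# Number of parameters of the network fitted on this data set.
p <- nparams(dag, learning.test)

# Assumed form of the penalised scores: loglik - k * p.
print(all.equal(aic, ll - 1 * p))
print(all.equal(bic, ll - log(n) / 2 * p))

# Passing the BIC penalty as k to AIC() should reproduce BIC().
aic.as.bic <- as.numeric(AIC(dag, learning.test, k = log(n) / 2))
print(all.equal(aic.as.bic, bic))
```

Note that on this scale larger values are better, unlike the classic definitions of AIC and BIC, which multiply the log-likelihood by -2.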

Author

Marco Scutari

Details

Additional arguments of the score() function:

iss: the imaginary sample size used by the Bayesian Dirichlet scores (bde, mbde, bds, bdj). It is also known as “equivalent sample size”. The default value is equal to 1.
iss.mu: the imaginary sample size for the normal component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is 1.
iss.w: the imaginary sample size for the Wishart component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is ncol(data) + 2.
nu: the mean vector of the normal component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is equal to colMeans(data).
l: the number of scores to average in the locally averaged Bayesian Dirichlet score (bdla). The default value is 5.
exp: a list of indexes of experimental observations (those that have been artificially manipulated). Each element of the list must be named after one of the nodes, and must contain a numeric vector with indexes of the observations whose value has been manipulated for that node.
k: the penalty coefficient to be used by the AIC, BIC and penalized node-average log-likelihood scores. The default value is 1 for AIC, log(nrow(data)) / 2 for BIC and 1 / nnodes(x) * nrow(data) ^ -0.25 for the node-average log-likelihood scores.
gamma: the additional penalty in the extended BIC scores. The default value is 0.5.
prior: the prior distribution to be used with the various Bayesian Dirichlet scores (bde, mbde, bds, bdj, bdla) and the Bayesian Gaussian score (bge). Possible values are:
- uniform (the default).
- vsp: the Bayesian variable selection prior, which puts a probability of inclusion on parents.
- marginal: an independent marginal uniform for each arc.
- cs: the Castelo & Siebes prior, which puts an independent prior probability on each arc and direction.
beta: the parameter associated with prior.
- If prior is uniform, beta is ignored.
- If prior is vsp, beta is the probability of inclusion of an additional parent. The default is 1/ncol(data).
- If prior is marginal, beta is the probability of inclusion of an arc. Each direction has a probability of inclusion of beta / 2 and the probability that the arc is not included is therefore 1 - beta. The default value is 0.5, so that arc inclusion and arc exclusion have the same probability.
- If prior is cs, beta is a data frame with columns from, to and prob specifying the prior probability for a set of arcs. A uniform probability distribution is assumed for the remaining arcs.
newdata: the test set whose predictive likelihood will be computed by pred-loglik, pred-loglik-g or pred-loglik-cg. It should be a data frame with the same variables as data.
fun: the function that computes the score component for a single node in the custom score. fun must have the signature function(node, parents, data, args), with the arguments in this order. node will contain the label of the node to be scored (a character string); parents will contain the labels of its parents (a character vector); data will contain the complete data set, with all the variables (a data frame); and args will be a list containing any additional arguments to the score.
args: a list containing the optional arguments to fun, for tuning custom score functions.
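As an illustration of how the tuning parameters above are passed through the ... argument of score(), here is a short sketch that varies iss and prior for the BDe score. It assumes bnlearn is installed; the specific values (iss = 10, beta = 0.1) are arbitrary choices for demonstration, not recommendations.

```r
library(bnlearn)

data(learning.test)
dag <- hc(learning.test)

# BDe with the default imaginary sample size and with a larger one.
bde.iss1  <- score(dag, learning.test, type = "bde", iss = 1)
bde.iss10 <- score(dag, learning.test, type = "bde", iss = 10)

# BDe with the variable selection prior: beta is the probability of
# including an additional parent.
bde.vsp <- score(dag, learning.test, type = "bde", prior = "vsp", beta = 0.1)

# BDe with the marginal uniform prior: beta is the probability of
# including an arc (0.5 is the documented default).
bde.marginal <- score(dag, learning.test, type = "bde",
                      prior = "marginal", beta = 0.5)

print(c(iss1 = bde.iss1, iss10 = bde.iss10,
        vsp = bde.vsp, marginal = bde.marginal))
```

Scores computed with different iss values or different priors are not on a common footing, so they are best compared across networks under the same settings rather than across settings for the same network.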

Examples

data(learning.test)
dag = hc(learning.test)
score(dag, learning.test, type = "bde")

## let's see score equivalence in action!
dag2 = set.arc(dag, "B", "A")
score(dag2, learning.test, type = "bde")

## K2 score on the other hand is not score equivalent.
score(dag, learning.test, type = "k2")
score(dag2, learning.test, type = "k2")

## BDe with a prior.
beta = data.frame(from = c("A", "D"), to = c("B", "F"),
         prob = c(0.2, 0.5), stringsAsFactors = FALSE)
score(dag, learning.test, type = "bde", prior = "cs", beta = beta)

## equivalent to logLik(dag, learning.test)
score(dag, learning.test, type = "loglik")

## equivalent to AIC(dag, learning.test)
score(dag, learning.test, type = "aic")

## custom score, computing BIC manually.
my.bic = function(node, parents, data, args) {

  n = nrow(data)

  if (length(parents) == 0) {

    counts = table(data[, node])
    nparams = dim(counts) - 1
    sum(counts * log(counts / n)) - nparams * log(n) / 2

  }#THEN
  else {

    counts = table(data[, node], configs(data[, parents, drop = FALSE]))
    nparams = ncol(counts) * (nrow(counts) - 1)
    sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2

  }#ELSE

}#MY.BIC
score(dag, learning.test, type = "custom", fun = my.bic, by.node = TRUE)
score(dag, learning.test, type = "bic", by.node = TRUE)
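The my.bic example above does not use args. The sketch below shows one way a node-level function could read a tuning value from args, using a hypothetical function pen.loglik and a hypothetical list element args$k. To stay independent of the score label accepted by a given bnlearn version, it calls the function directly on each node with the documented signature instead of going through score(), and compares the sum with the built-in AIC.

```r
library(bnlearn)

data(learning.test)
dag <- hc(learning.test)

# Hypothetical node-level score for discrete data: log-likelihood minus
# args$k times the number of parameters of the node.
pen.loglik <- function(node, parents, data, args) {
  if (length(parents) == 0) {
    counts <- table(data[, node])
    probs <- counts / sum(counts)
    nparams <- length(counts) - 1
  } else {
    counts <- table(data[, node], configs(data[, parents, drop = FALSE]))
    probs <- prop.table(counts, 2)
    nparams <- ncol(counts) * (nrow(counts) - 1)
  }
  # Skip empty cells so that 0 * log(0) does not produce NaN.
  keep <- counts > 0
  sum(counts[keep] * log(probs[keep])) - args$k * nparams
}

# Call the function once per node, following the documented signature
# function(node, parents, data, args).
terms <- vapply(nodes(dag), function(nd) {
  pen.loglik(nd, parents(dag, nd), learning.test, list(k = 1))
}, numeric(1))

# With k = 1 the sum should agree with the built-in AIC.
builtin.aic <- score(dag, learning.test, type = "aic")
print(all.equal(sum(terms), builtin.aic))
```

The same function and list are what one would pass as fun = pen.loglik and args = list(k = 1) to score() with the custom score type.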