filtering: Perform filtering inference in a Gaussian mixture dynamic Bayesian network

Description

This function performs filtering inference in a Gaussian mixture dynamic Bayesian network. For a sequence of \(T\) time slices, filtering consists of estimating the state of the system at each time slice \(t\) (for \(1 \le t \le T\)) given all the data (the evidence) collected up to \(t\). The function can also perform fixed-lag smoothing inference: given a time lag \(l\), at each time slice \(t\) (for \(l + 1 \le t \le T\)), the state at \(t - l\) is estimated given the evidence collected up to \(t\) (Murphy, 2002). Both tasks are performed by sequential importance resampling, a particle-based approximate method (Koller and Friedman, 2009).
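
To make the idea of sequential importance resampling concrete, here is a minimal base R sketch of a bootstrap particle filter. It is not the package's implementation: the one-dimensional model (coefficient 0.9, noise standard deviations 1 and 0.5), the function name sir_filter and its internals are assumptions chosen only for illustration. Only the argument names n_part and min_ess mirror those of filtering.

```r
# Illustrative bootstrap particle filter for a hypothetical 1-D model:
#   x_k = 0.9 * x_{k-1} + N(0, 1),   y_k = x_k + N(0, 0.5^2)
sir_filter <- function(y, n_part = 1000, min_ess = 1) {
  n_steps <- length(y)
  x <- rnorm(n_part)                 # initial particles drawn from the prior
  w <- rep(1 / n_part, n_part)       # uniform initial weights
  est <- numeric(n_steps)
  for (k in seq_len(n_steps)) {
    if (k > 1) x <- 0.9 * x + rnorm(n_part)      # propagate through the dynamics
    if (!is.na(y[k])) {                          # missing evidence: weights unchanged
      w <- w * dnorm(y[k], mean = x, sd = 0.5)   # reweight by the likelihood
      w <- w / sum(w)
    }
    est[k] <- sum(w * x)                         # filtered mean at slice k
    if (1 / sum(w^2) < min_ess * n_part) {       # ESS below the threshold: resample
      idx <- sample.int(n_part, n_part, replace = TRUE, prob = w)
      x <- x[idx]
      w <- rep(1 / n_part, n_part)
    }
  }
  est
}

# Simulate a short sequence from the same hypothetical model and filter it.
set.seed(1)
n_steps <- 50
x_true <- numeric(n_steps)
x_true[1] <- rnorm(1)
for (k in 2:n_steps) x_true[k] <- 0.9 * x_true[k - 1] + rnorm(1)
y <- x_true + rnorm(n_steps, sd = 0.5)
y[c(10, 20)] <- NA                   # two missing observations
est <- sir_filter(y, n_part = 1000)
```

Each estimate est[k] uses only the observations up to slice k, which is what distinguishes filtering from smoothing. Fixed-lag smoothing is not covered by this sketch.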

Usage

filtering(
  gmdbn,
  evid,
  nodes = names(gmdbn$b_1),
  col_seq = NULL,
  lag = 0,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  verbose = FALSE
)

Value

If lag has one element, a data frame (tibble) with a structure similar to evid containing the estimated values of the inferred nodes and their observation sequences (if col_seq is not NULL). If lag has two or more elements, a list of data frames (tibbles) containing these values for each time lag.

Arguments

gmdbn: An object of class gmdbn.
evid: A data frame containing the evidence. Its columns must explicitly be named after nodes of gmdbn and can contain missing values (columns with no value can be removed).
nodes: A character vector containing the inferred nodes (by default all the nodes of gmdbn).
col_seq: A character vector containing the column names of evid that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. The observations of the same sequence must be ordered such that the \(t\)th one is related to time slice \(t\) (note that the sequences can have different lengths).
lag: A non-negative integer vector containing the time lags for which fixed-lag smoothing inference is performed. If 0 (the default), filtering inference is performed.
n_part: A positive integer corresponding to the number of particles generated for each observation sequence.
max_part_sim: An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously. This argument is used to prevent memory overflow, dividing evid into smaller subsets that are handled sequentially.
min_ess: A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS), expressed as a proportion of n_part, below which the renewal step of sequential importance resampling is performed. If 1 (the default), this step is performed at each time slice.
verbose: A logical value indicating whether subsets of evid and time slices in progress are displayed.
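
As an illustration of how min_ess relates to n_part, the sketch below computes the ESS with the usual estimator, the inverse of the sum of squared normalised weights. Whether the package uses exactly this estimator is an assumption. The helper name ess and the weight vectors are hypothetical.

```r
# Effective sample size of a vector of (possibly unnormalised) importance weights.
# Standard estimator, assumed here for illustration.
ess <- function(w) {
  w <- w / sum(w)
  1 / sum(w^2)
}

n_part <- 1000
w_uniform <- rep(1, n_part)                         # all particles equally weighted
w_skewed <- c(rep(1, 10), rep(1e-6, n_part - 10))   # ten particles dominate

prop_uniform <- ess(w_uniform) / n_part   # approximately 1
prop_skewed <- ess(w_skewed) / n_part     # approximately 0.01

# With a hypothetical threshold of 0.5, only the skewed weights trigger renewal.
min_ess <- 0.5
renew_uniform <- prop_uniform < min_ess
renew_skewed <- prop_skewed < min_ess
```

A lower min_ess makes the renewal step less frequent. With the default of 1, the ESS is essentially never above the threshold, which is consistent with the step being performed at each time slice.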

References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley.

Examples

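set.seed(0)
data(gmdbn_air, data_air)

# Build the evidence by masking 1536 randomly chosen entries
# (among indices 1 to 7680) in each variable
evid <- data_air
evid$NO2[sample.int(7680, 1536)] <- NA
evid$O3[sample.int(7680, 1536)] <- NA
evid$TEMP[sample.int(7680, 1536)] <- NA
evid$WIND[sample.int(7680, 1536)] <- NA

# Filtering (lag 0) and fixed-lag smoothing (lag 1), with the
# observation sequences identified by the DATE column
filt <- filtering(gmdbn_air, evid, col_seq = "DATE", lag = c(0, 1),
                  verbose = TRUE)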
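
The layout expected for evid can also be sketched without the packaged data. The data frame below is purely hypothetical: the node names A and B, the sequence column seq_id and the object my_gmdbn are assumptions, not objects provided by the package. It shows two sequences of different lengths, missing values, and the row order defining the time slice within each sequence.

```r
# Hypothetical evidence for a network whose nodes are named "A" and "B".
evid_toy <- data.frame(
  seq_id = c(rep("s1", 4), rep("s2", 3)),   # two sequences of different lengths
  A = c(0.2, NA, 0.5, 0.7, 1.1, 1.3, NA),   # NA marks a missing observation
  B = c(3.1, 2.9, NA, 3.4, 2.2, NA, 2.5)
)

# Within each sequence, the row order defines the time slice.
# (This helper column is only for inspection and is not passed to filtering.)
evid_toy$slice <- ave(seq_len(nrow(evid_toy)), evid_toy$seq_id, FUN = seq_along)

# With a fitted model `my_gmdbn` (hypothetical), the call would look like:
# filtering(my_gmdbn, evid_toy[, c("seq_id", "A", "B")], col_seq = "seq_id")
```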