This function performs filtering inference in a Gaussian mixture dynamic Bayesian network. For a sequence of \(T\) time slices, filtering estimates the state of the system at each time slice \(t\) (for \(1 \le t \le T\)) given all the data (the evidence) collected up to \(t\). The function also performs fixed-lag smoothing inference: given a time lag \(l\), at each time slice \(t\) (for \(l + 1 \le t \le T\)), the state at \(t - l\) is estimated given the evidence collected up to \(t\) (Murphy, 2002). Both filtering and fixed-lag smoothing inference are performed by sequential importance resampling, a particle-based approximate method (Koller and Friedman, 2009).
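To make the idea of sequential importance resampling concrete, here is a minimal base R sketch on a toy one-dimensional linear Gaussian model. The model, the function name sir_filter and all its constants are assumptions chosen for illustration; this is not the package's implementation, which operates on Gaussian mixture dynamic Bayesian networks.

```r
# Toy model (an assumption, not the package's internals):
#   x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2)
set.seed(1)
n_steps <- 50
x <- numeric(n_steps)
x[1] <- rnorm(1)
for (t in 2:n_steps) x[t] <- 0.9 * x[t - 1] + rnorm(1)
y <- x + rnorm(n_steps, sd = 0.5)

# Hypothetical helper: filtered means E[x_t | y_1, ..., y_t] by particle filtering
sir_filter <- function(y, n_part = 1000) {
  est <- numeric(length(y))
  part <- rnorm(n_part)                              # particles drawn from the prior of x_1
  for (t in seq_along(y)) {
    if (t > 1) part <- 0.9 * part + rnorm(n_part)    # propagate through the transition model
    w <- dnorm(y[t], mean = part, sd = 0.5)          # weight by the likelihood of the evidence
    w <- w / sum(w)                                  # normalize the weights
    est[t] <- sum(w * part)                          # weighted mean = filtered estimate
    part <- sample(part, n_part, replace = TRUE, prob = w)  # resample (renewal step)
  }
  est
}

est <- sir_filter(y)
```

In this sketch the particles are resampled at every time slice, which mirrors the behavior described for min_ess = 1.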
filtering(
  gmdbn,
  evid,
  nodes = names(gmdbn$b_1),
  col_seq = NULL,
  lag = 0,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  verbose = FALSE
)
If lag has one element, the function returns a data frame (tibble) with a structure similar to evid, containing the estimated values of the inferred nodes and their observation sequences (if col_seq is not NULL). If lag has two or more elements, it returns a list of data frames (tibbles) containing these values for each time lag.
gmdbn: An object of class gmdbn.
evid: A data frame containing the evidence. Its columns must explicitly be named after nodes of gmdbn and can contain missing values (columns with no value can be removed).
nodes: A character vector containing the inferred nodes (by default all the nodes of gmdbn).
col_seq: A character vector containing the column names of evid that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. The observations of the same sequence must be ordered such that the \(t\)th one is related to time slice \(t\) (note that the sequences can have different lengths).
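As an illustration of this layout, the base R sketch below builds a small evidence data frame with two sequences of different lengths. The column name SEQ and the node names A and B are hypothetical; they stand in for a sequence identifier and for nodes of some gmdbn object.

```r
# Hypothetical evidence: two sequences ("s1", "s2") identified by column SEQ;
# A and B are placeholder node names, NA marks a missing value
evid_toy <- data.frame(
  SEQ = c("s1", "s1", "s1", "s2", "s2"),
  A   = c(1.2, NA, 0.7, 3.1, 2.9),
  B   = c(NA, 0.4, 0.5, NA, 1.0)
)

# Time slice implied by the row order within each sequence
evid_toy$t <- ave(seq_len(nrow(evid_toy)), evid_toy$SEQ, FUN = seq_along)

# Length of each sequence (they need not be equal)
seq_lengths <- table(evid_toy$SEQ)
```

With such a data frame, the sequence column would be passed as col_seq = "SEQ". The helper column t is only there to make the implied time slices visible and is not something the function is documented to require.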
lag: A non-negative integer vector containing the time lags for which fixed-lag smoothing inference is performed. If 0 (the default), filtering inference is performed.
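The relation between a time lag and the estimated time slice can be tabulated with a few lines of base R. This sketch only enumerates the pairs \((t, t - l)\) from the definition of fixed-lag smoothing; the sequence length n_slices and the variable names are assumptions for illustration, and nothing here reflects how the function lays out its output.

```r
n_slices <- 6      # hypothetical sequence length T
lags <- c(0, 2)    # values as they would be passed to the lag argument

# For each lag l and each t in (l + 1):T, the state at t - l is estimated
# from the evidence collected up to t
schedule <- do.call(rbind, lapply(lags, function(l) {
  t <- seq(l + 1, n_slices)
  data.frame(lag = l, evidence_up_to = t, estimated_slice = t - l)
}))
```

With lag 0 the estimated slice equals the current slice (filtering); with lag 2 the estimate for slice 1 becomes available only once the evidence of slice 3 has been seen.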
n_part: A positive integer corresponding to the number of particles generated for each observation sequence.
max_part_sim: An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously. This argument is used to prevent memory overflow by dividing evid into smaller subsets that are handled sequentially.
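One plausible way to picture this splitting is sketched below. It assumes that whole observation sequences are grouped into subsets so that the number of sequences times n_part never exceeds max_part_sim; the exact rule used by the function is not stated here, so treat the grouping logic and the variable names as hypothetical.

```r
n_part <- 1000
max_part_sim <- 1e6
n_seq <- 2500    # hypothetical number of observation sequences in evid

# Assumed rule: as many sequences per subset as the particle budget allows
seq_per_subset <- max(1, floor(max_part_sim / n_part))

# Subset index of each sequence; subsets would be handled one after another
subset_id <- ceiling(seq_len(n_seq) / seq_per_subset)
n_subsets <- max(subset_id)
subset_sizes <- as.integer(table(subset_id))
```

Under this assumption, lowering max_part_sim trades memory for a larger number of sequential passes.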
min_ess: A numeric value in [0, 1] corresponding to the minimum ESS (effective sample size, expressed as a proportion of n_part) under which the renewal step of sequential importance resampling is performed. If 1 (the default), this step is performed at each time slice.
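The sketch below shows how such a threshold is commonly evaluated, using the standard effective sample size estimate \(1 / \sum_i w_i^2\) for normalized weights \(w_i\). The helper names ess and needs_renewal, and the use of a non-strict comparison, are assumptions for illustration rather than the package's code.

```r
# Effective sample size of a set of (possibly unnormalized) particle weights
ess <- function(w) {
  w <- w / sum(w)
  1 / sum(w^2)
}

# Assumed rule: renew the particles when ESS / n_part is at or below min_ess
needs_renewal <- function(w, min_ess) ess(w) / length(w) <= min_ess

n_part <- 1000
w_uniform <- rep(1, n_part)                          # all particles equally useful
w_skewed <- c(rep(1, 10), rep(1e-6, n_part - 10))    # about 10 particles carry the weight

prop_uniform <- ess(w_uniform) / n_part    # close to 1
prop_skewed <- ess(w_skewed) / n_part      # close to 0.01
```

With min_ess = 0.5, only the skewed weights would trigger the renewal step in this sketch; a lower threshold resamples less often, which reduces computation but lets the weights degenerate further between renewals.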
verbose: A logical value indicating whether subsets of evid and time slices in progress are displayed.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California.
See also: inference, prediction, smoothing
library(gmgm)

set.seed(0)
data(gmdbn_air, data_air)

# Randomly mask 1536 of the 7680 values (20%) of each variable
evid <- data_air
evid$NO2[sample.int(7680, 1536)] <- NA
evid$O3[sample.int(7680, 1536)] <- NA
evid$TEMP[sample.int(7680, 1536)] <- NA
evid$WIND[sample.int(7680, 1536)] <- NA

# Filtering (lag 0) and fixed-lag smoothing (lag 1), with DATE identifying the sequences
filt <- filtering(gmdbn_air, evid, col_seq = "DATE", lag = c(0, 1),
                  verbose = TRUE)