This function performs predictive inference in a Gaussian mixture dynamic Bayesian network. For a sequence of \(T\) time slices, the task is to choose a time horizon \(h\) and, at each time slice \(t\) (for \(0 \le t \le T - h\)), to estimate the state of the system at \(t + h\) given all the data (the evidence) collected up to \(t\). Although the states at \(t + 1, \dots , t + h\) are only observed in the future, some information about them may be known a priori (such as contextual information or features controlled by the user). This "predicted" evidence can be taken into account when propagating the particles from \(t\) to \(t + h\) in order to improve the predictions. Predictive inference is performed by sequential importance resampling, a particle-based approximate method (Koller and Friedman, 2009).
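The idea of predicting by sequential importance resampling can be sketched in base R on a toy model. This is only an illustration under stated assumptions: the one-dimensional linear Gaussian model, the variable names and the 1-based time indexing are all hypothetical, no "predicted" evidence is used, and nothing here reproduces the package's internal implementation.

```r
set.seed(1)

# Toy model (an assumption for illustration, unrelated to any gmdbn object):
#   x_t = 0.9 * x_{t-1} + N(0, 0.5^2)   (hidden state)
#   y_t = x_t + N(0, 0.3^2)             (observed evidence)
n_steps <- 20
horizon <- 2
n_part <- 1000

# Simulate a hidden trajectory and its noisy observations
x <- numeric(n_steps)
x[1] <- rnorm(1)
for (t in 2:n_steps) x[t] <- 0.9 * x[t - 1] + rnorm(1, sd = 0.5)
y <- x + rnorm(n_steps, sd = 0.3)

part <- rnorm(n_part)            # particles for the first time slice
pred <- rep(NA_real_, n_steps)   # pred[t + horizon] is computed at time t

for (t in seq_len(n_steps - horizon)) {
  # Move the particles to time slice t (they already start there when t = 1)
  if (t > 1) part <- 0.9 * part + rnorm(n_part, sd = 0.5)

  # Weight the particles by the likelihood of the evidence observed at t
  w <- dnorm(y[t], mean = part, sd = 0.3)
  w <- w / sum(w)

  # Resample the particles according to their weights
  part <- sample(part, n_part, replace = TRUE, prob = w)

  # Propagate a copy of the particles from t to t + horizon without evidence
  fut <- part
  for (k in seq_len(horizon)) fut <- 0.9 * fut + rnorm(n_part, sd = 0.5)
  pred[t + horizon] <- mean(fut)
}
```

In this sketch the first `horizon` entries of `pred` stay `NA` because no earlier evidence exists from which to predict them.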
prediction(
  gmdbn,
  evid,
  evid_pred = NULL,
  nodes = names(gmdbn$b_1),
  col_seq = NULL,
  horizon = 1,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  verbose = FALSE
)
If horizon has one element, a data frame with a structure similar to evid containing the predicted values of the inferred nodes and their observation sequences (if col_seq is not NULL).
If horizon has two or more elements, a list of data frames (tibbles) containing these values for each time horizon.
gmdbn
An object of class gmdbn.
evid
A data frame containing the evidence. Its columns must explicitly be named after nodes of gmdbn and can contain missing values (columns with no value can be removed).
evid_pred
A data frame containing the "predicted" evidence. Its columns must explicitly be named after nodes of gmdbn and can contain missing values (columns with no value can be removed).
nodes
A character vector containing the inferred nodes (by default all the nodes of gmdbn).
col_seq
A character vector containing the column names of evid and evid_pred that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. The observations of the same sequence must be ordered such that the \(t\)th one relates to time slice \(t\) (note that the sequences can have different lengths).
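The ordering requirement on the observation sequences can be illustrated with a small base R sketch. The toy data frame and its HOUR column are hypothetical and serve only to sort the rows. HOUR is dropped afterwards on the assumption that it is not a node of the model.

```r
# Toy evidence with a hypothetical HOUR column used only for sorting
evid_raw <- data.frame(
  DATE = c("d2", "d1", "d1", "d2", "d1"),
  HOUR = c(1, 2, 0, 0, 1),
  NO2 = c(31, 18, 25, NA, 22)
)

# Order the rows so that, within each DATE, the t-th row is time slice t
evid_toy <- evid_raw[order(evid_raw$DATE, evid_raw$HOUR), ]
rownames(evid_toy) <- NULL

# Drop the helper column, which is assumed not to be a node of the model
evid_toy$HOUR <- NULL

# Sequences may have different lengths (3 observations for d1, 2 for d2)
seq_len_by_date <- table(evid_toy$DATE)
```

A data frame prepared this way would then be passed with col_seq = "DATE".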
horizon
A positive integer vector containing the time horizons for which predictive inference is performed.
n_part
A positive integer corresponding to the number of particles generated for each observation sequence.
max_part_sim
An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously. This argument is used to prevent memory overflow, dividing evid into smaller subsets that are handled sequentially.
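One plausible reading of this limit is that, with n_part particles per observation sequence, at most floor(max_part_sim / n_part) sequences can be handled at once. The base R sketch below splits hypothetical sequence identifiers on that assumption. How the package actually forms its subsets is not stated here, so this should not be taken as its implementation.

```r
n_part <- 1000
max_part_sim <- 2500

# Hypothetical sequence identifiers (for instance, values of a DATE column)
seq_ids <- c("2020-01-01", "2020-01-02", "2020-01-03",
             "2020-01-04", "2020-01-05")

# Number of sequences whose particles fit within the limit at the same time
n_seq_sim <- floor(max_part_sim / n_part)

# Assign each sequence to a subset; subsets would be processed one by one
subset_id <- ceiling(seq_along(seq_ids) / n_seq_sim)
subsets <- split(seq_ids, subset_id)
```

Requiring max_part_sim to be at least n_part guarantees that each subset holds at least one whole sequence.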
min_ess
A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS, expressed as a proportion of n_part) under which the renewal step of sequential importance resampling is performed. If 1 (the default), this step is performed at each time slice.
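To make the threshold concrete, the sketch below computes the usual ESS estimate, 1 / sum(w^2) for normalized weights w, and compares it with min_ess as a proportion of n_part. The helper name ess and the use of "<=" in the comparison are assumptions made for illustration. The package may compute or compare these quantities differently.

```r
# Effective sample size of a set of importance weights (hypothetical helper)
ess <- function(weights) {
  w <- weights / sum(weights)   # normalize so that the weights sum to 1
  1 / sum(w^2)
}

n_part <- 1000
min_ess <- 0.5

w_uniform <- rep(1, n_part)                 # all particles equally weighted
w_skewed <- c(rep(1, 10), rep(1e-6, 990))   # a few particles dominate

ess_prop_uniform <- ess(w_uniform) / n_part
ess_prop_skewed <- ess(w_skewed) / n_part

# Assumed decision rule: renew when the proportion is at or below min_ess
renew_uniform <- ess_prop_uniform <= min_ess
renew_skewed <- ess_prop_skewed <= min_ess
```

Equal weights give a proportion of 1 and no renewal at this threshold, whereas the skewed weights give a proportion of about 0.01 and trigger it. Because the proportion cannot exceed 1, setting min_ess to 1 leads to renewal at every time slice.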
verbose
A logical value indicating whether subsets of evid and time slices in progress are displayed.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
filtering, inference, smoothing
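library(gmgm)

set.seed(0)
data(gmdbn_air, data_air)

# Hide 20% of the NO2 and O3 values (1536 of 7680 each)
evid <- data_air
evid$NO2[sample.int(7680, 1536)] <- NA
evid$O3[sample.int(7680, 1536)] <- NA

# Predict NO2 and O3 one and two time slices ahead, passing DATE, TEMP
# and WIND as "predicted" evidence
pred <- prediction(gmdbn_air, evid, evid[, c("DATE", "TEMP", "WIND")],
                   nodes = c("NO2", "O3"), col_seq = "DATE",
                   horizon = c(1, 2), verbose = TRUE)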