For details and additional references please
consult Goerg and Shalizi (2012, 2013).
Let \(\mathcal{D} = \lbrace X(\mathbf{r}, t) \mid
\mathbf{r} \in \mathbf{S}, t = 1, \ldots, T \rbrace =
(X_1, \ldots, X_{\tilde{N}})\) be a sample from a
spatio-temporal process, observed over an
\(N\)-dimensional spatial grid \(\mathbf{S}\) and for
\(T\) time steps. We want to find a model that is
optimal for forecasting a new \(X(\mathbf{s}, u)\)
given the data \(\mathcal{D}\). To do this we need to
know $$ P(X(\mathbf{s}, u) \mid \mathcal{D}). $$
In general, estimating this distribution directly is too
complicated and time-intensive, since
\(\mathcal{D}\) is very high-dimensional. But we know
that in any physical system, information can only
propagate at a finite speed, and thus we can restrict the
search for optimal predictors to a subset
\(\ell^{-}(\mathbf{r}, t) \subset \mathcal{D}\): the
past light cone (PLC) at \((\mathbf{r}, t)\), which contains
only those earlier observations close enough in space to
\(\mathbf{r}\) to have influenced \(X(\mathbf{r}, t)\).
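As a rough illustration of what a PLC looks like on a grid, the following Python sketch collects the PLC of a point in a field with one spatial dimension. It is a minimal sketch under stated assumptions, not the LICORS implementation: the function name `past_light_cone`, the propagation speed `speed`, and the truncation of the cone after `horizon` time steps are all hypothetical choices made for this example.

```python
def past_light_cone(field, r, t, speed=1, horizon=2):
    """Collect the values in the past light cone of (r, t).

    `field` is a list of rows, field[time][space]. The cone contains,
    for each lag 1..horizon, the cells within `speed * lag` of r at
    time t - lag. Returns None if the cone leaves the observed grid.
    (`speed` and `horizon` are illustrative assumptions.)
    """
    n_space = len(field[0])
    values = []
    for lag in range(1, horizon + 1):
        radius = speed * lag
        lo, hi = r - radius, r + radius
        # Skip points whose cone is not fully observed.
        if t - lag < 0 or lo < 0 or hi >= n_space:
            return None
        values.extend(field[t - lag][lo:hi + 1])
    return tuple(values)


field = [
    [0, 1, 2, 3, 4],       # time 0
    [5, 6, 7, 8, 9],       # time 1
    [10, 11, 12, 13, 14],  # time 2
]
plc = past_light_cone(field, r=2, t=2)
```

Here the cone of the centre cell at time 2 has three cells at lag 1 and five at lag 2, so it is far smaller than the full data set once the grid and the number of time steps grow.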
There exists a mapping \(\epsilon: \ell^{-}
\rightarrow \mathcal{S}\), where \(\mathcal{S} =
\lbrace s_1, \ldots, s_K \rbrace\) is the predictive state
space. Writing \(\ell^{-}_i\) for the PLC of observation
\(X_i\), this mapping satisfies $$ P(X_i \mid
\ell^{-}_i) = P(X_i \mid s_j), $$ where \(s_j =
\epsilon(\ell^{-}_i)\) is the predictive state of PLC
\(i\). Furthermore, the future is independent of the
past given the predictive state: $$ P(X_i \mid
\ell^{-}_i, s_j) = P(X_i \mid s_j). $$
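The idea behind \(\epsilon\) is that PLCs with the same predictive distribution share a state. The Python sketch below is a toy version for binary observations: it estimates \(P(X = 1 \mid \ell^{-})\) for each distinct PLC configuration and merges configurations whose estimates are close. This is an assumption-laden illustration, not the estimator used in LICORS; the name `estimate_states`, the tolerance `tol`, and the greedy merging rule are hypothetical.

```python
from collections import defaultdict


def estimate_states(plcs, futures, tol=0.1):
    """Toy epsilon: map each distinct PLC configuration to a state label.

    A configuration joins the first state whose reference probability
    P(X = 1) is within `tol` of its own empirical estimate; otherwise
    it opens a new state. Greedy and order-dependent (illustration only).
    """
    counts = defaultdict(lambda: [0, 0])  # plc -> [n_obs, n_ones]
    for plc, x in zip(plcs, futures):
        counts[plc][0] += 1
        counts[plc][1] += x

    epsilon = {}          # plc -> state index
    reference_probs = []  # reference_probs[j] = P(X = 1) for state j
    for plc in sorted(counts):
        n_obs, n_ones = counts[plc]
        p = n_ones / n_obs
        for j, p_ref in enumerate(reference_probs):
            if abs(p - p_ref) <= tol:
                epsilon[plc] = j
                break
        else:
            # No existing state is close enough: open a new one.
            reference_probs.append(p)
            epsilon[plc] = len(reference_probs) - 1
    return epsilon


plcs = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
futures = [0, 0, 0, 0, 1, 1]
epsilon = estimate_states(plcs, futures)
```

In this example the configurations `(0, 0)` and `(0, 1)` predict the same future and end up in one state, while `(1, 1)` gets its own, so three distinct PLCs reduce to two predictive states.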
The likelihood of the joint process factorizes as a
product of predictive conditional distributions: $$
P(X_1, \ldots, X_{\tilde{N}}) \propto \prod_{i=1}^{\tilde{N}} P(X_i \mid
\ell^{-}_i) = \prod_{i=1}^{\tilde{N}} P(X_i \mid
\epsilon(\ell^{-}_i)). $$
Since \(s_j\) is unknown, this can be seen as the
complete-data likelihood of a nonparametric finite
mixture model over predictive states: $$ P(X_1,
\ldots, X_{\tilde{N}}) \propto \prod_{i=1}^{\tilde{N}} \sum_{j=1}^{K}
\mathbf{1}(\epsilon(\ell^{-}_i) = s_j) \times P(X_i \mid
s_j). $$
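To see why the mixture form agrees with the product form, note that \(\epsilon\) assigns each PLC to exactly one state, so for every \(i\) a single indicator in the inner sum equals one and all others vanish: $$ \sum_{j=1}^{K} \mathbf{1}(\epsilon(\ell^{-}_i) = s_j) \times P(X_i \mid s_j) = P(X_i \mid \epsilon(\ell^{-}_i)). $$ Taking logarithms, the log-likelihood is therefore, up to an additive constant, $$ \sum_{i} \log P(X_i \mid \epsilon(\ell^{-}_i)), $$ where the sum runs over all observations.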
This predictive state model is a provably optimal finite
mixture model, where the ``parameter'' \(\epsilon\) is
chosen to provide optimal forecasts.
The LICORS R package implements methods to estimate
this optimal mapping \(\epsilon\).