
KFKSDS (version 1.6)

KF: Kalman Filter for State Space Models

Description

These functions run the iterative equations of the Kalman filter for a state space model.

Usage

KF(y, ss, convergence = c(0.001, length(y)), t0 = 1)
KF.C(y, ss, convergence = c(0.001, length(y)), t0 = 1)
KF.deriv(y, ss, xreg = NULL, convergence = c(0.001, length(y)), t0 = 1)
KF.deriv.C(y, ss, xreg = NULL, convergence = c(0.001, length(y)), 
  t0 = 1, return.all = FALSE)

Arguments

y
a numeric time series or vector.
ss
a list containing the matrices of the state space model.
xreg
optional matrix or list of external regressors. See details.
convergence
a numeric vector of length two to control and determine the convergence of the filter. See details below.
t0
a numeric indicating the index of the first observation at which the contributions to the likelihood are added up.
return.all
logical. If TRUE, extended output containing elements to be used by KS.deriv is returned.

Value

A list containing the following elements:

  • v: prediction error.
  • f: variance of the prediction error.
  • K: Kalman gain.
  • L: auxiliary matrices to be used by the smoother.
  • a.pred: one step ahead prediction of the state vector.
  • P.pred: covariance matrix of a.pred.
  • a.upd: update of a.pred after the observation at time $t$ that is predicted in a.pred enters the recursive filter.
  • P.upd: update of P.pred.
  • convit: the iteration at which the filter converged. If convergence was not observed, it is set to NULL.
  • mll: value of the negative of the log-likelihood function.
The function KF.C is a simplified and faster version that records and returns only the value of the negative of the log-likelihood function. It is suited to be passed as an argument to optim in the stats package; a sketch of this usage is given at the end of this section.

The functions that evaluate the derivatives include in the output the derivative terms: da.pred, dP.pred, da.upd, dP.upd, dv, df, dvof (derivative of the quotient v over f), dK and dL.

KF.deriv.C does not return a.upd and P.upd or their derivative terms da.upd and dP.upd. If return.all = TRUE, this function returns dvof, dL, da.pred and dP.pred, which are the derivative terms necessary to evaluate the gradient of the log-likelihood function. By default (return.all = FALSE) they are not returned. They are computed in any case; what is omitted in the latter case is the arrangement of the output from the call to compiled C code into matrices of the proper dimension containing the data in the right order.

Derivatives of the likelihood function are implemented in package stsm. Although the Kalman filter provides the information needed to evaluate the likelihood function, that is not its primary purpose. Hence, the derivatives of the likelihood are currently part of the package stsm, which is specific to likelihood methods in structural time series models.
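As mentioned above, KF.C can be passed as the objective function to optim. The following is a minimal sketch of that usage, not part of the package documentation: the wrapper mloglik, the method "L-BFGS-B" and the lower bound of zero on the variance parameters are assumptions made for this illustration, and the model m is created as in the Examples section below.

# minimal sketch: 'KF.C' as the objective function for 'optim';
# 'mloglik' is a hypothetical wrapper and the bounds are assumptions
mloglik <- function(x, model)
{
  m <- stsm::set.pars(model, x)
  ss <- stsm::char2numeric(m)
  KF.C(m@y, ss)
}
res <- optim(par = m@pars, fn = mloglik, model = m,
  method = "L-BFGS-B", lower = 0)
res$par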

State space representation

The general univariate linear Gaussian state space model is defined as follows: $$y[t] = Za[t] + e[t], e[t] \sim N(0, H)$$ $$a[t+1] = Ta[t] + Rw[t], w[t] \sim N(0, V)$$

for $t=1,\dots,n$ and $a[1] \sim N(a0, P0)$. $Z$ is a matrix of dimension $1\times m$; $H$ is $1\times 1$; $T$ is $m\times m$; $R$ is $m\times r$; $V$ is $r\times r$; $a0$ is $m\times 1$ and $P0$ is $m\times m$, where $m$ is the dimension of the state vector $a$ and $r$ is the number of variance parameters in the state vector.
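As an illustration, for the local level model, where $Z = T = R = 1$ and $m = r = 1$, the argument ss could be built by hand as in the sketch below; the variance values are arbitrary and the element names follow the notation above (in practice, char2numeric in package stsm builds this list from a model object).

# sketch of a hand-made 'ss' list for a local level model
# (element names follow the notation above; values are arbitrary)
ss.llm <- list(
  Z = matrix(1, 1, 1),    # observation matrix, 1 x m
  H = matrix(2, 1, 1),    # variance of the observation noise e[t]
  T = matrix(1, 1, 1),    # transition matrix, m x m
  R = matrix(1, 1, 1),    # selection matrix, m x r
  V = matrix(15, 1, 1),   # variance of the state noise w[t]
  a0 = 0,                 # mean of the initial state, m x 1
  P0 = matrix(1e7, 1, 1)) # variance of the initial state, m x m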

The Kalman filtering recursions for the model above are:

Prediction $$a[t] = T a[t-1]$$ $$P[t] = T P[t-1] T' + R V R'$$ $$v[t] = y[t] - Z a[t]$$ $$F[t] = Z P[t] Z' + H$$

Updating $$K[t] = P[t] Z' F[t]^{-1}$$ $$a[t] = a[t] + K[t] v[t]$$ $$P[t] = P[t] - K[t] Z P[t]$$

for $t=2,\dots,n$, starting with $a[1]$ and $P[1]$ equal to a0 and P0. $v[t]$ is the prediction error for the observation at time $t$ and $F[t]$ is the variance of $v[t]$.
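The recursions above translate directly into plain R. The following didactic sketch (an illustration, not the package implementation) returns the prediction errors and their variances given a series y and a list ss with the elements described above.

# didactic sketch of the recursions above (not the package code)
kf.sketch <- function(y, ss)
{
  n <- length(y)
  a <- matrix(ss$a0, ncol = 1)  # a[1] = a0
  P <- ss$P0                    # P[1] = P0
  v <- f <- numeric(n)
  for (t in seq_len(n))
  {
    # prediction error and its variance
    v[t] <- y[t] - ss$Z %*% a
    f[t] <- ss$Z %*% P %*% t(ss$Z) + ss$H
    # updating
    K <- P %*% t(ss$Z) / f[t]
    a.upd <- a + K * v[t]
    P.upd <- P - K %*% ss$Z %*% P
    # one step ahead prediction of the state for time t+1
    a <- ss$T %*% a.upd
    P <- ss$T %*% P.upd %*% t(ss$T) + ss$R %*% ss$V %*% t(ss$R)
  }
  list(v = v, f = f)
}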

Details

The implementation is a direct transcription of the iterative equations of the filter that are summarized below. Details can be found in the references given below and in many other textbooks. The source code follows the notation used in Durbin and Koopman (2001).

The elements in the argument ss must be named in accordance with the notation given below for the state space representation. For those models defined in the package stsm, a convenient way to create the argument ss is by means of the function char2numeric.

The contributions to the likelihood function of the first observations may be omitted by choosing a value of t0 greater than one.
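For example, with the objects m and ss created as in the Examples section below, the contributions of the first four observations could be discarded as follows:

# sketch: add up likelihood contributions from observation 5 onwards
kf.t0 <- KF(m@y, ss, t0 = 5)
kf.t0$mll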

The functions with .deriv in the name compute the derivatives of some of the elements involved in the filter with respect to the parameters of the model.

The functions KF and KF.deriv are fully implemented in R, while KF.deriv.C calls compiled C code.

Using KF.deriv.C with return.all = FALSE returns the minimal output necessary to compute the derivative of the log-likelihood function. With return.all = TRUE, further elements to be used by KS.deriv are also returned.

Missing observations are allowed. If a missing value is observed after the filter has converged, then all the operations of the filter are run, instead of using steady state values, until convergence is detected again.

Parameters to control the convergence of the filter. In some models, the Kalman filter may converge to a steady state. Finding the explicit expression of the steady state values can be cumbersome in some models. Alternatively, at each iteration of the filter it can be checked whether a steady state has been reached. For this, some control parameters can be defined in the argument convergence. Convergence is considered to have been reached when the change in the variance of the prediction error over the last convergence[2] consecutive iterations of the filter is below the tolerance value convergence[1]. When the steady state is reached, the values from the last iteration are used in the remaining iterations of the filter. Thus, the number of matrix operations can be reduced substantially, as pointed out in Harvey (1989) Sec. 3.3.4.

External regressors. A matrix of external regressors can be passed in the argument xreg. If xreg is a matrix, then it is assumed that the time series passed in argument y has already been adjusted for the effect of these regressors, that is, $y_t^{adj} = y_t - X \gamma$. If y is the observed series, then xreg should be a list containing the following elements: xreg, the matrix of regressors, and coefs, the vector of coefficients, $\gamma$, related to the regressors. The coefficients must be placed in coefs in the same order as the corresponding vectors are arranged by columns in xreg.

The number of rows of the matrix of regressors must be equal to the length of the series y.

Column names are necessary for KF.deriv and are optional for KF.deriv.C.
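As a hypothetical illustration of the two interfaces described above (the regressor X, its column name and the coefficient value are assumptions made for this sketch; m and ss are created as in the Examples section below):

# sketch: the two ways of passing external regressors described above
# ('X', its column name and 'gamma' are assumptions for this example)
n <- length(m@y)
X <- matrix(seq_len(n) / n, ncol = 1, dimnames = list(NULL, "trend"))
gamma <- 0.5
# (1) 'y' already adjusted for the regression effects, 'xreg' a matrix
kfd1 <- KF.deriv(m@y - drop(X %*% gamma), ss, xreg = X)
# (2) 'y' the observed series, 'xreg' a list with regressors and coefficients
kfd2 <- KF.deriv(m@y, ss, xreg = list(xreg = X, coefs = gamma))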

References

Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

See Also

char2numeric in package stsm.

Examples

# local level plus seasonal model with arbitrary parameter values
# for the 'JohnsonJohnson' time series
m <- stsm::stsm.model(model = "llm+seas", y = JohnsonJohnson, 
  pars = c("var1" = 2, "var2" = 15, "var3" = 30))
ss <- stsm::char2numeric(m)

# run the Kalman filter
kf <- KF(m@y, ss)
plot(kf$a.upd[,1:2], main = "filtered state vector")

# 'KF.C' is a faster version that returns only the 
# value of the negative of the likelihood function
kfc <- KF.C(m@y, ss)
all.equal(kf$mll, kfc)

# compute also derivative terms used below
kfd <- KF.deriv(m@y, ss)
all.equal(kfc, kfd$mll)
kfdc <- KF.deriv.C(m@y, ss, return.all = TRUE)
all.equal(kf$mll, kfdc$mll)

# as expected the versions that use compiled C code
# are faster than the versions written fully in R
# e.g. not including derivatives
system.time(for(i in seq_len(10)) kf <- KF(m@y, ss))
system.time(for(i in seq_len(10)) kfc <- KF.C(m@y, ss))
# e.g. including derivatives
system.time(for(i in seq_len(10)) kfd <- KF.deriv(m@y, ss))
system.time(for(i in seq_len(10)) kfdc <- KF.deriv.C(m@y, ss, return.all = TRUE))

# compare analytical and numerical derivatives
# they give the same results up to a tolerance error
fcn <- function(x, model, type = c("v", "f"))
{
  type <- match.arg(type)
  m <- stsm::set.pars(model, x)
  ss <- stsm::char2numeric(m)
  kf <- KF(m@y, ss)
  switch(type, "v" = sum(kf$v), "f" = sum(kf$f))
}

dv <- numDeriv::grad(func = fcn, x = m@pars, model = m, type = "v")
all.equal(dv, colSums(kfd$dv), check.attributes = FALSE)
all.equal(dv, colSums(kfdc$dv), check.attributes = FALSE)
df <- numDeriv::grad(func = fcn, x = m@pars, model = m, type = "f")
all.equal(df, colSums(kfd$df), check.attributes = FALSE)
all.equal(df, colSums(kfdc$df), check.attributes = FALSE)

# compare timings in version written in R with numDeriv::grad
# no calls to compiled C code in either case
# looking at these timings, using analytical derivatives is
# expected to be useful in optimization algorithms
system.time(for (i in seq_len(10)) 
  numdv <- numDeriv::grad(func = fcn, x = m@pars, model = m, type = "v"))
system.time(for(i in seq_len(10)) kfdv <- colSums(KF.deriv(m@y, ss)$dv))

# compare timings when convergence is not checked with the case 
# when steady state values are used after convergence is observed
# computation time is reduced substantially
n <- length(m@y)
system.time(for(i in seq_len(20)) a <- KF.deriv(m@y, ss, convergence = c(0.001, n)))
system.time(for(i in seq_len(20)) b <- KF.deriv(m@y, ss, convergence = c(0.001, 10)))
# the results are the same up to a tolerance error
all.equal(colSums(a$dv), colSums(b$dv))
