MARSS (version 3.10.12)

MARSSresiduals.tT: MARSS Smoothed Residuals

Description

Calculates the standardized (or auxiliary) smoothed residuals sensu Harvey, Koopman and Penzer (1998). The expected values and variance for missing (or left-out) data are also returned (Holmes 2014). Not exported. Access this function with residuals(object, conditioning="T").

Usage

MARSSresiduals.tT(object, Harvey=FALSE, normalize=FALSE, silent=FALSE)

Arguments

object

An object of class marssMLE.

Harvey

TRUE/FALSE. If TRUE, use the Harvey et al. (1998) algorithm; if FALSE, use the Holmes (2014) algorithm. The values are the same except for missing values.

normalize

TRUE/FALSE. If TRUE, the residuals are normalized so that their unconditional variance is the identity matrix (see 'Normalized residuals' in Details).

silent

If TRUE, don't print inversion warnings.

Value

A list with the following components

model.residuals

The observed smoothed model residuals: data minus the model predictions conditioned on all observed data. This is different from the Kalman filter innovations, which use only the data up to time \(t-1\) for the predictions. See details.

state.residuals

The smoothed state residuals \(\tilde{\mathbf{x}}_{t+1}^T - \mathbf{B}\tilde{\mathbf{x}}_t^T - \mathbf{u}\).

residuals

The residuals conditioned on the observed data. Returned as a (n+m) x T matrix with model.residuals in rows 1 to n and state.residuals in rows n+1 to n+m. NAs will appear in rows 1 to n in the places where data are missing.

var.residuals

The joint variance of the model and state residuals conditioned on observed data. Returned as a (n+m) x (n+m) x T matrix. For Harvey=FALSE, this is Holmes (2014) equation 57. For Harvey=TRUE, this is the residual variance in eqn. 24, page 113, in Harvey et al. (1998). They are identical except for missing values, for those Harvey=TRUE returns 0s.

std.residuals

The Cholesky standardized residuals as a (n+m) x T matrix. This is residuals multiplied by the inverse of the Cholesky decomposition of var.residuals. The model standardized residuals associated with the missing data are replaced with NA. This is for convenience when using the residuals for diagnostics.

mar.residuals

The marginal standardized residuals as a (n+m) x T matrix. This is residuals multiplied by the inverse of the diagonal matrix formed by the square root of the diagonal of var.residuals. The model marginal residuals associated with the missing data are replaced with NA. This is for convenience when using the residuals for diagnostics.

E.obs.residuals

The expected value of the model residuals conditioned on the observed data. Returned as a n x T matrix. For observed data, this will be the observed residuals. For unobserved data, this will be 0 if \(\mathbf{R}\) is diagonal but non-zero if \(\mathbf{R}\) is non-diagonal. See details.

var.obs.residuals

The variance of the model residuals conditioned on the observed data. Returned as a n x n x T matrix. For observed data, this will be 0. See details.

msg

Any warning messages. These will be printed unless object$control$trace = -1 (suppress all error messages).

Details

This function returns the raw, Cholesky standardized and marginal standardized smoothed model and state residuals. 'smoothed' means conditioned on all the observed data and a set of parameters. These are the residuals presented in Harvey, Koopman and Penzer (1998) pages 112-113, with the addition of the values for unobserved data (Holmes 2014). If Harvey=TRUE, the function uses the algorithm on page 112 of Harvey, Koopman and Penzer (1998) to compute the conditional residuals and variance of the residuals. If Harvey=FALSE, the function uses the equations in the technical report (Holmes 2014).

The residuals matrix has a value for each time step. The residuals in column \(t\), rows 1 to n, are the model residuals associated with the data at time \(t\). The residuals in rows n+1 to n+m are the state residuals associated with the transition from \(\mathbf{x}_t\) to \(\mathbf{x}_{t+1}\), not the transition from \(\mathbf{x}_{t-1}\) to \(\mathbf{x}_t\). Because \(\mathbf{x}_{t+1}\) does not exist at time \(T\), the state residuals and associated variances at time \(T\) are NA.

Below, the conditional residuals and their variance are discussed. Random variables are capitalized and realizations from the random variables are lower case. The random variables are \(\mathbf{X}\), \(\mathbf{Y}\), \(\mathbf{V}\) and \(\mathbf{W}\). There are two types of \(\mathbf{Y}\): the observed \(\mathbf{Y}\), termed \(\mathbf{Y}^{(1)}\), which are used to estimate the states \(\mathbf{x}\), and the unobserved \(\mathbf{Y}\), termed \(\mathbf{Y}^{(2)}\), which are not used to estimate the states. We may or may not know the values of \(\mathbf{y}^{(2)}\); typically we treat \(\mathbf{y}^{(2)}\) as unknown, but it may be known yet left out of the model fitting. Note that the model parameters \(\Theta\) are treated as fixed or known. The 'fitting' does not involve estimating \(\Theta\); it involves estimating \(\mathbf{x}\). All MARSS parameters can be time varying, but the \(t\) subscripts are left off parameters to reduce clutter.

Model residuals

\(\mathbf{v}_t\) is the difference between the data and the predicted data at time \(t\) given \(\mathbf{x}_t\): $$ \mathbf{v}_t = \mathbf{y}_t - \mathbf{Z} \mathbf{x}_t - \mathbf{a} $$ The observed model residuals \(\hat{\mathbf{v}}_t\) are the difference between the observed data and the predicted data at time \(t\) using the fitted model. MARSSresiduals.tT fits the model using all the data. So $$ \hat{\mathbf{v}}_t = \mathbf{y}_t - \mathbf{Z}\tilde{\mathbf{x}}_t^T - \mathbf{a} $$ where \(\tilde{\mathbf{x}}_t^T\) is the expected value of \(\mathbf{X}_t\) conditioned on the data from 1 to \(T\) (all the data), i.e. the Kalman smoother estimate of the states at time \(t\). \(\mathbf{y}_t\) are your data and missing values will appear as NA in the observed model residuals. These are returned as model.residuals and rows 1 to \(n\) of residuals.

res1 and res2 in the code below will be the same.

dat = t(harborSeal)[2:3,]
MLEobj = MARSS(dat)
Z = coef(MLEobj, type="matrix")$Z
A = coef(MLEobj, type="matrix")$A
res1 = dat - Z %*% MLEobj$states - A %*% matrix(1,1,ncol(dat))
res2 = residuals(MLEobj)$model.residuals

state.residuals

\(\mathbf{w}_t\) is the difference between the state at time \(t\) and the expected value of the state at time \(t\) given the state at time \(t-1\): $$ \mathbf{w}_t = \mathbf{x}_t - \mathbf{B} \mathbf{x}_{t-1} - \mathbf{u} $$ The estimated state residuals \(\hat{\mathbf{w}}_t\) are the difference between the estimate of \(\mathbf{x}_t\) and the estimate using \(\mathbf{x}_{t-1}\): $$ \hat{\mathbf{w}}_t = \tilde{\mathbf{x}}_t^T - \mathbf{B}\tilde{\mathbf{x}}_{t-1}^T - \mathbf{u} $$ where \(\tilde{\mathbf{x}}_t^T\) is the Kalman smoother estimate of the states at time \(t\) and \(\tilde{\mathbf{x}}_{t-1}^T\) is the Kalman smoother estimate of the states at time \(t-1\). The estimated state residuals are returned in state.residuals and rows \(n+1\) to \(n+m\) of residuals. There are no NAs in the estimated state residuals as an estimate of the state exists whether or not there are associated data.

res1 and res2 in the code below will be the same.

dat = t(harborSeal)[2:3,]
TT = ncol(dat)
MLEobj = MARSS(dat)
B = coef(MLEobj, type="matrix")$B
U = coef(MLEobj, type="matrix")$U
statest = MLEobj$states[,2:TT]
statestm1 = MLEobj$states[,1:(TT-1)]
res1 = statest - B %*% statestm1 - U %*% matrix(1,1,TT-1)
res2 = residuals(MLEobj)$state.residuals

Note that the state residual at the last time step (\(T\)) will be NA because it is the residual associated with \(\mathbf{x}_T\) to \(\mathbf{x}_{T+1}\) and \(T+1\) is beyond the data. Similarly, the variance matrix at the last time step will have NAs for the same reason.

Variance of the residuals

In a state-space model, \(\mathbf{X}\) and \(\mathbf{Y}\) are stochastic, and the model and state residuals are random variables \(\hat{\mathbf{V}}_t\) and \(\hat{\mathbf{W}}_{t+1}\). To evaluate the residuals we observed (with \(\mathbf{y}^{(1)}\)), we use the joint distribution of \(\hat{\mathbf{V}}_t, \hat{\mathbf{W}}_{t+1}\) across all the different possible data sets that our MARSS equations with parameters \(\Theta\) might generate. Denote the matrix of \(\hat{\mathbf{V}}_t, \hat{\mathbf{W}}_{t+1}\) as \(\widehat{\mathcal{E}}_t\). That distribution has an expected value (mean) and variance: $$ \textrm{E}[\widehat{\mathcal{E}}_t] = 0; \textrm{var}[\widehat{\mathcal{E}}_t] = \hat{\Sigma}_t $$ Our observed residuals are one sample from this distribution. To standardize the observed residuals, we use \( \hat{\Sigma}_t \), which is returned in var.residuals. Rows/columns 1 to \(n\) are the conditional variances of the model residuals and rows/columns \(n+1\) to \(n+m\) are the conditional variances of the state residuals. The off-diagonal blocks are the covariances between the two types of residuals.
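The block layout of one time slice of \( \hat{\Sigma}_t \) can be sketched with a toy matrix (base R only; the numbers are made up, not from any fitted model):

```r
# One time slice of var.residuals is a (n+m) x (n+m) matrix; here n = 1, m = 1.
n <- 1
m <- 1
Sigma_t <- matrix(c(0.5, 0.1,
                    0.1, 0.3), n + m, n + m, byrow = TRUE)
model_var <- Sigma_t[1:n, 1:n, drop = FALSE]                          # model residual variance block
state_var <- Sigma_t[(n + 1):(n + m), (n + 1):(n + m), drop = FALSE]  # state residual variance block
cross_cov <- Sigma_t[1:n, (n + 1):(n + m), drop = FALSE]              # off-diagonal covariance block
```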

Standardized residuals

residuals.marssMLE will return the Cholesky standardized residuals sensu Harvey et al. (1998) in std.residuals for outlier and shock detection. These are the model and state residuals multiplied by the inverse of the Cholesky decomposition of var.residuals. The standardized model residuals are set to NA when there are missing data. The standardized state residuals however always exist since the expected value of the states exist without data. The calculation of the standardized residuals for both the observations and states requires the full residuals variance matrix. Since the state residuals variance is NA at the last time step, the standardized residuals in the last time step will be all NA.

The interpretation of the Cholesky standardized residuals is not straightforward when the \(\mathbf{Q}\) and \(\mathbf{R}\) variance-covariance matrices are non-diagonal. The residuals which were generated by a non-diagonal variance-covariance matrix are transformed into orthogonal residuals in MVN(0,I) space. For example, if \(\mathbf{v}\) is a \(2 \times 1\) vector of correlated errors with variance-covariance matrix \(\mathbf{R}\), the first transformed residual reflects the row 1 effect alone, while the second reflects the part of row 2 not already explained by row 1. So in this case, row 2 of the transformed residuals would not be regarded as solely the row 2 residual but rather as how different row 2 is from row 1, relative to expected. If the errors are highly correlated, then the transformed residuals can look rather non-intuitive.

The marginal standardized residuals are returned in mar.residuals. These are the model and state residuals multiplied by the inverse of the diagonal matrix formed by the square root of the diagonal of var.residuals. These residuals will be correlated (across the residuals at time \(t\)) but are easier to interpret when \(\mathbf{Q}\) and \(\mathbf{R}\) are non-diagonal.
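The two standardizations can be compared with a toy covariance matrix in base R (Sigma and resid below are hypothetical values, not MARSS output):

```r
# Compare Cholesky vs marginal standardization for one time step.
Sigma <- matrix(c(2, 1.5,
                  1.5, 2), 2, 2)   # non-diagonal residual variance
resid <- c(1, 1)                   # a hypothetical residual vector

# Cholesky standardization: premultiply by the inverse lower Cholesky factor,
# which maps Sigma to the identity (uncorrelated, unit-variance residuals).
L <- t(chol(Sigma))                # chol() returns the upper factor; transpose to lower
std_chol <- solve(L) %*% resid

# Marginal standardization: divide each residual by its own standard deviation;
# the results stay correlated but each is on a unit-variance scale.
std_mar <- resid / sqrt(diag(Sigma))

# Sanity check: the Cholesky transform turns Sigma into the identity matrix.
identity_check <- solve(L) %*% Sigma %*% t(solve(L))
```

Note that std_chol mixes the rows (row 2 is row 2 adjusted for row 1), while std_mar keeps each row separate but correlated.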

Normalized residuals

If normalize=FALSE, the unconditional variance of \(W_t\) and \(V_t\) are \(\mathbf{Q}\) and \(\mathbf{R}\) and the model is assumed to be written as $$ y_t = Z x_t + a + v_t$$ $$ x_t = B x_{t-1} + u + w_t$$ Harvey et al (1998) writes the model as $$ y_t = Z x_t + a + Hv_t$$ $$ x_t = B x_{t-1} + u + Gw_t$$ with the variance of \(V_t\) and \(W_t\) equal to I (identity).

MARSSresiduals.tT returns the residuals defined as in the first equations. To get the residuals defined as Harvey et al. (1998) define them (second equations), then use normalize=TRUE. In that case the unconditional variance of residuals will be I instead of \(\mathbf{Q}\) and \(\mathbf{R}\).
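A minimal base R sketch of this normalization, assuming \(\mathbf{H}\) is taken to be the lower Cholesky factor of \(\mathbf{R}\) (any \(\mathbf{H}\) with \(\mathbf{H}\mathbf{H}^\top = \mathbf{R}\) would do; R_mat and v below are toy values):

```r
# Write the observation error as H v_t with var(v_t) = I, where R = H %*% t(H).
R_mat <- matrix(c(1, 0.5,
                  0.5, 1), 2, 2)
H <- t(chol(R_mat))             # lower Cholesky factor of R
v <- c(0.3, -0.2)               # a raw (un-normalized) residual
v_norm <- solve(H) %*% v        # normalized residual; unconditional variance is I
```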

Missing or left-out data

\( \textrm{E}[\widehat{\mathcal{E}}_t] \) and \( \textrm{var}[\widehat{\mathcal{E}}_t] \) are for the distribution across all possible \(\mathbf{X}\) and \(\mathbf{Y}\). We can also compute the expected value and variance conditioned on a specific value of \(\mathbf{Y}\), the one we observed \(\mathbf{y}^{(1)}\) (Holmes 2014). If there are no missing values, this is not very interesting as \(\textrm{E}[\hat{\mathbf{V}}_t|\mathbf{y}^{(1)}]=\hat{\mathbf{v}}_t\) and \(\textrm{var}[\hat{\mathbf{V}}_t|\mathbf{y}^{(1)}] = 0\). If we have data that are missing because we left them out, however, \(\textrm{E}[\hat{\mathbf{V}}_t|\mathbf{y}^{(1)}]\) and \(\textrm{var}[\hat{\mathbf{V}}_t|\mathbf{y}^{(1)}]\) are the values we need to evaluate whether the left-out data are unusual relative to what we would expect given the data we did collect.

E.obs.residuals is the conditional expected value \(\textrm{E}[\hat{\mathbf{V}}|\mathbf{y}^{(1)}]\) (notice small \(\mathbf{y}\)). It is $$\textrm{E}[\mathbf{Y}_t|\mathbf{y}^{(1)}] - \mathbf{Z}\tilde{\mathbf{x}}_t^T - \mathbf{a} $$ It is similar to \(\hat{\mathbf{v}}_t\). The difference is the \(\mathbf{y}\) term. \(\textrm{E}[\mathbf{Y}^{(1)}_t|\mathbf{y}^{(1)}] \) is \(\mathbf{y}^{(1)}_t\) for the non-missing values. For the missing values, the value depends on \(\mathbf{R}\). If \(\mathbf{R}\) is diagonal, \(\textrm{E}[\mathbf{Y}^{(2)}_t|\mathbf{y}^{(1)}] \) is \(\mathbf{Z}\tilde{\mathbf{x}}_t^T + \mathbf{a}\) and the expected residual value is 0. If \(\mathbf{R}\) is non-diagonal however, it will be non-zero.

var.obs.residuals is the conditional variance \(\textrm{var}[\hat{\mathbf{V}}|\mathbf{y}^{(1)}]\) (eqn 24 in Holmes (2014)). For the non-missing values, this variance is 0 since \(\hat{\mathbf{V}}|\mathbf{y}^{(1)}\) is a fixed value. For the missing values, \(\hat{\mathbf{V}}|\mathbf{y}^{(1)}\) is not fixed because \(\mathbf{Y}^{(2)}\) is a random variable. For these values, the variance of \(\hat{\mathbf{V}}|\mathbf{y}^{(1)}\) is determined by the variance of \(\mathbf{Y}^{(2)}\) conditioned on \(\mathbf{Y}^{(1)}=\mathbf{y}^{(1)}\). This variance matrix is returned in var.obs.residuals. The variance of \(\hat{\mathbf{W}}|\mathbf{y}^{(1)}\) is 0 and thus is not included.
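These conditional quantities follow the standard multivariate-normal conditioning formulas. A toy base R sketch (hypothetical numbers; \(n = 2\) with row 2 missing and \(\mathbf{Z}\tilde{\mathbf{x}}_t^T + \mathbf{a}\) set to 0 for simplicity):

```r
# Conditional expectation and variance of a missing observation given an
# observed one when R is non-diagonal.
R_mat <- matrix(c(1, 0.8,
                  0.8, 1), 2, 2)  # strong correlation between rows 1 and 2
mu <- c(0, 0)                     # E[Y_t] = Z x_t^T + a, set to 0 here
y1 <- 0.5                         # row 1 is observed; row 2 is missing

# E[Y2 | Y1 = y1] = mu2 + R21 R11^{-1} (y1 - mu1)
E_y2 <- mu[2] + R_mat[2, 1] / R_mat[1, 1] * (y1 - mu[1])
# var[Y2 | Y1 = y1] = R22 - R21 R11^{-1} R12
V_y2 <- R_mat[2, 2] - R_mat[2, 1]^2 / R_mat[1, 1]

# With Z x + a = 0, the conditional residual for the missing row equals E_y2,
# which is non-zero precisely because R is non-diagonal.
```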

The variance \(\textrm{var}[\hat{\mathbf{V}}_t|\mathbf{Y}^{(1)}] \) (uppercase \( \mathbf{Y} \)) returned in the 1 to \(n\) rows/columns of var.residuals may also be of interest depending on what you are investigating with regards to missing values. For example, it may be of interest in a simulation study or cases where you have multiple replicated \(\mathbf{Y}\) data sets. var.residuals would allow you to determine if the left-out residuals are unusual with regards to what you would expect for left-out data in that location of the \(\mathbf{Y}\) matrix, but not specifically relative to the data you did collect. If \(\mathbf{R}\) is non-diagonal and the \(\mathbf{y}^{(1)}\) and \(\mathbf{y}^{(2)}\) are highly correlated, \(\textrm{var}[\hat{\mathbf{V}}_t|\mathbf{Y}^{(1)}] \) and \(\textrm{var}[\hat{\mathbf{V}}_t|\mathbf{y}^{(1)}] \) for the left-out data would be quite different. In the latter, the variance is low because \(\mathbf{y}^{(1)} \) has strong information about \(\mathbf{y}^{(2)} \). In the former, we integrate over \(\mathbf{Y}^{(1)} \) and the variance could be high (depending on the parameters).

References

Harvey, A., S. J. Koopman, and J. Penzer. 1998. Messy time series: a unified approach. Advances in Econometrics 13: 103-144 (see pages 112-113). Eqn 21 gives the Kalman equations; eqns 23 and 24 give the backward recursion to compute the smoothations. This function uses the MARSSkf output for eqn 21 and then implements the backward recursion in eqns 23 and 24. Pages 120-134 discuss the use of standardized residuals for outlier and structural break detection.

de Jong, P. and J. Penzer. 1998. Diagnosing shocks in time series. Journal of the American Statistical Association 93: 796-806. This one shows the same equations; see eqn 6. This paper mentions the scaling based on the inverse of the sqrt (chol) of the variance-covariance matrix for the residuals (model and state together). This is in the right column, half-way down on page 800.

Koopman, S. J., N. Shephard, and J. A. Doornik. 1999. Statistical algorithms for models in state space using SsfPack 2.2. Econometrics Journal 2: 113-166. (see pages 147-148).

Harvey, A. and S. J. Koopman. 1992. Diagnostic checking of unobserved-components time series models. Journal of Business & Economic Statistics 4: 377-389.

Holmes, E. E. 2014. Computation of standardized residuals for (MARSS) models. Technical Report. arXiv:1411.0045.

See Also

residuals.marssMLE, MARSSresiduals.tt1, fitted.marssMLE, plot.marssMLE

Examples

# NOT RUN {
  dat <- t(harborSeal)
  dat <- dat[c(2,11),]
  MLEobj <- MARSS(dat)
  
  #state residuals
  state.resids1 <- residuals(MLEobj, conditioning="T")$state.residuals
  #this is the same as
  states <- MLEobj$states
  Q <- coef(MLEobj,type="matrix")$Q
  state.resids2 <- states[,2:30]-states[,1:29]-matrix(coef(MLEobj,type="matrix")$U,2,29)
  #compare the two
  cbind(t(state.resids1[,-30]), t(state.resids2))

  #normalize to variance of 1
  state.resids1 <- residuals(MLEobj, normalize=TRUE, conditioning="T")$state.residuals
  state.resids2 <- (solve(t(chol(Q))) %*% state.resids2)
  cbind(t(state.resids1[,-30]), t(state.resids2))

  #Cholesky standardized (by joint variance) model & state residuals
  residuals(MLEobj)$std.residuals
# }