Below, X and Y refer to the random variables and x and y refer to specific realizations of those random variables.
type="parameters"
If type="parameters", this returns a data.frame with the estimated parameters of a MARSS model with, optionally, standard errors and confidence intervals. This assembles information available via the print.marssMLE and coef.marssMLE functions into a data.frame that summarizes the estimates. If conf.int=TRUE, MARSSparamCIs will be run to add confidence intervals to the model object if these are not already added. The default CIs are calculated using an analytically computed Hessian matrix. This can be changed by passing in optional arguments for MARSSparamCIs.
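As a rough illustration of how Hessian-based intervals work in general, the sketch below fits the mean and log standard deviation of a normal sample by maximum likelihood with base R's optim and forms Wald intervals from the inverse Hessian. It is not MARSS code: the simulated data, the function nll, and the numerically approximated Hessian are all assumptions made for illustration (the text above says the MARSS default is an analytically computed Hessian).

```r
set.seed(42)
y <- rnorm(200, mean = 3, sd = 2)   # simulated data (illustration only)

# Negative log-likelihood; par = c(mean, log(sd)) keeps sd positive
nll <- function(par) -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))

# Rough starting values; hessian = TRUE returns a numerical Hessian at the optimum
fit <- optim(c(median(y), log(mad(y))), nll, method = "BFGS", hessian = TRUE)

# The inverse Hessian of the negative log-likelihood approximates the
# variance-covariance matrix of the estimates
se <- sqrt(diag(solve(fit$hessian)))

alpha <- 0.05
ci <- data.frame(
  term = c("mean", "log.sd"),
  estimate = fit$par,
  std.error = se,
  conf.low = fit$par + qnorm(alpha / 2) * se,
  conf.up = fit$par + qnorm(1 - alpha / 2) * se
)
print(ci)
```

The resulting data.frame has one row per parameter with an estimate, a standard error, and interval bounds, which is the general shape of summary that type="parameters" is described as returning.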
type="xtT"
tidy.marssMLE returns the confidence and prediction intervals of the state at time t conditioned on all the data and using the estimated model parameters as true values. The prediction intervals (and .sd.x) are the standard intervals that are shown for the estimated states in state-space models. For example, see Shumway and Stoffer (2000), edition 4, Figure 6.4. As such, this is probably what you are looking for if you want to put intervals on the estimated states (the x). However, these intervals do not include parameter uncertainty. If you want state residuals (for residuals analysis), use residuals.marssMLE or augment.marssMLE.
Quantiles The state \(\mathbf{X}_t\) in a MARSS model has a conditional multivariate normal distribution that can be computed from the model parameters and data. In Holmes (2012, Eqn. 11) notation, its expected value conditioned on all the observed data (1:T) and the model parameters \(\Theta\) is \(\tilde{\mathbf{x}}_t\). In MARSSkf, this is xtT[,t]. The variance of \(\mathbf{X}_t\) conditioned on the observed data and \(\Theta\) is \(\tilde{\mathbf{V}}_t\) (VtT[,,t]). Note that VtT[,,t] != B VtT[,,t-1] t(B) + Q, which you might expect from looking at the MARSS equation for x. That is because the variance of W(t) conditioned on the data (past, current and FUTURE) is not equal to Q (Q is the unconditional variance).
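The inequality noted above can be checked numerically. The following is a minimal sketch using a hand-written scalar Kalman filter and Rauch-Tung-Striebel smoother in base R rather than MARSSkf; the model values (b, q, r), the prior on the first state, and the variable names are assumptions chosen for illustration.

```r
set.seed(1)
TT <- 50; b <- 0.8; q <- 0.5; r <- 0.3

# Simulate x(t) = b x(t-1) + w(t), y(t) = x(t) + v(t)
x <- numeric(TT)
x[1] <- rnorm(1, 0, sqrt(q))
for (t in 2:TT) x[t] <- b * x[t - 1] + rnorm(1, 0, sqrt(q))
y <- x + rnorm(TT, 0, sqrt(r))

# Kalman filter (forward pass); the prior on x(1) is N(0, 10), an assumption
xtt1 <- Vtt1 <- xtt <- Vtt <- numeric(TT)
xtt1[1] <- 0; Vtt1[1] <- 10
for (t in 1:TT) {
  if (t > 1) {
    xtt1[t] <- b * xtt[t - 1]          # predicted state
    Vtt1[t] <- b^2 * Vtt[t - 1] + q    # predicted variance
  }
  K <- Vtt1[t] / (Vtt1[t] + r)         # Kalman gain
  xtt[t] <- xtt1[t] + K * (y[t] - xtt1[t])
  Vtt[t] <- (1 - K) * Vtt1[t]
}

# Smoother (backward pass): condition on all the data 1:TT
xtT <- xtt; VtT <- Vtt
for (t in (TT - 1):1) {
  J <- Vtt[t] * b / Vtt1[t + 1]
  xtT[t] <- xtt[t] + J * (xtT[t + 1] - xtt1[t + 1])
  VtT[t] <- Vtt[t] + J^2 * (VtT[t + 1] - Vtt1[t + 1])
}

# Compare the smoothed variance with the naive recursion b VtT b' + Q
t0 <- 25
naive <- b^2 * VtT[t0 - 1] + q
c(smoothed = VtT[t0], naive = naive)
```

In this setup the smoothed variance is smaller than the naive value because the filtered variance can never exceed r, while the naive recursion is at least q.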
\(\tilde{\mathbf{x}}_t\) (xtT) is an estimate of \(\mathbf{x}_t\) (the true value), and the variance of that estimate is given by \(\tilde{\mathbf{V}}_t\) (VtT[,,t]). Let se.xt denote the sqrt of the diagonal of VtT; these are the standard errors. For significance level \(\alpha\), the lower bound of the \(1-\alpha\) confidence interval is qnorm(alpha/2)*se.xt + xtT and the upper bound is qnorm(1-alpha/2)*se.xt + xtT. \(\mathbf{x}_t\) is multivariate and this interval is for one of the \(x\)'s in isolation. You could also compute the m-dimensional confidence region for the multivariate \(\mathbf{x}_t\), but tidy.marssMLE returns the univariate confidence intervals.
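The univariate interval calculation described above can be sketched in a few lines of base R. The xtT matrix and VtT array below are made-up numbers standing in for smoother output (m = 2 states, 3 time steps); they are not results from a fitted model.

```r
# Made-up smoother output: xtT is m x TT, VtT is m x m x TT
xtT <- matrix(c( 1.0,  1.2,  0.9,
                -0.5, -0.4, -0.6), nrow = 2, byrow = TRUE)
VtT <- array(0, dim = c(2, 2, 3))
for (t in 1:3) VtT[, , t] <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)

# se.xt: square root of the diagonal of each VtT[,,t], giving an m x TT matrix
se.xt <- sqrt(apply(VtT, 3, diag))

# Lower and upper bounds of the 95% interval for each state at each time
alpha <- 0.05
conf.low <- qnorm(alpha / 2) * se.xt + xtT
conf.up <- qnorm(1 - alpha / 2) * se.xt + xtT
```

Only the diagonal of each VtT[,,t] is used, so the off-diagonal covariance between states plays no role in these one-at-a-time intervals.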
The variance VtT gives information on the uncertainty of the true location of \(\mathbf{x}_t\) conditioned on the observed data. As more data are collected (or added to the analysis), this variance will shrink since the data, especially data at time t, increase the information about the location of \(\mathbf{x}_t\). This does not affect the estimation of the model parameters, which are fixed (we are assuming); rather, it affects our information about the states at time t.
If you have a DFA model (form='dfa'), you can pass in rotate=TRUE to return the rotated trends. If you want the rotated loadings, you will need to compute those yourself:
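library(MARSS)
# Fit a DFA model with m = 2 trends (first column of harborSealWA is dropped)
dfa <- MARSS(t(harborSealWA[,-1]), model=list(m=2), form="dfa")
# Estimated loadings matrix and its varimax rotation matrix
Z.est <- coef(dfa, type="matrix")$Z
H.inv <- varimax(Z.est)$rotmat
# Rotated loadings
Z.rot <- Z.est %*% H.inv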
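To see why rotating the loadings goes hand in hand with rotating the trends, the sketch below applies the same varimax rotation to a made-up loadings matrix and a made-up matrix of trends. None of the numbers come from a fitted model, and trends and trends.rot are hypothetical names used only here. The point is that the product of loadings and trends is unchanged by the rotation.

```r
# Hypothetical 5 x 2 loadings and 2 x 4 trends (not fitted values)
Z.est <- matrix(c(0.9, 0.1,
                  0.8, 0.3,
                  0.2, 0.7,
                  0.1, 0.9,
                  0.5, 0.5), ncol = 2, byrow = TRUE)
trends <- matrix(c( 0.1, 0.4, 0.3, -0.2,
                   -0.3, 0.0, 0.2,  0.5), nrow = 2, byrow = TRUE)

# varimax() is in the stats package; rotmat is the rotation matrix
H.inv <- varimax(Z.est)$rotmat
Z.rot <- Z.est %*% H.inv

# Trends are rotated by the inverse of the matrix applied to the loadings
trends.rot <- solve(H.inv) %*% trends

# The rotation leaves the product of loadings and trends unchanged
all.equal(Z.rot %*% trends.rot, Z.est %*% trends)
```

Because H.inv is an orthogonal matrix, solve(H.inv) is the same as t(H.inv) up to numerical error.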
Intervals for the observation process
For the observation process, the expected values and intervals are shown for either new data (type="fitted.ytT") or the observed data set (type="ytT"). Details on these are given below, after this discussion of intervals for the observation process.
The type of interval you want for data (Y part of the MARSS equation) depends on what you are trying to do.
Get the model predictions of the expected value of new Y or some underlying mean Y: Use type="fitted.ytT". This returns the fitted values (model predictions = Z x(t)+A) for \(Y_t\) conditioned on all the data. Confidence intervals and prediction intervals are returned. The former is the interval for the mean of new data and the latter is the interval for new data (not the mean but the data themselves).
Get the distribution of new data at time t that would be generated by the process: Same as above.
Compare your data to model predictions: In this case, you want the distribution of the model residuals for the data. Use augment.marssMLE with type="observations". You want the standard errors for the observed data minus the fitted values, which is what augment.marssMLE gives.
Get estimates and variance of missing data in your data set: Use type="ytT". The observed data will have an expected value equal to the observed data and variance of 0, while the missing data will have an expected value and variance conditioned on all the observed data. Note, if R is diagonal then the missing data values (and intervals) will be the same as for type="fitted.ytT", but if R is non-diagonal and some y at time t are missing and some are not, then the expected values will be very different.
Do a leave-one-out cross-validation: In this case, you want the distribution of the model residuals for those left-out values. Use augment.marssMLE with type="observations". You want the standard errors for the left-out data minus the fitted values, which is what augment.marssMLE gives.
One-step-ahead predictions: Use fitted.marssMLE with type="ytt1" or type="xtt1". Confidence (mean prediction) and prediction intervals (new data) are returned.
Y prediction conditioned on data up to t-1: Same as one-step-ahead.
type="fitted.ytT"
For type="fitted.ytT", tidy.marssMLE returns the analogous information for the Y part of the MARSS equation for an I.I.D. NEW DATA SET \(y'\). The expected value and variance of \(y'\) are conditioned on the data you did observe, \(y\). It is important to note that \(y'\) is independent of and identically distributed to \(y\) (i.i.d. in a statistical sense) except that it has no missing values. Do not plot your observed data on these intervals. You need residual intervals in that case. See augment.marssMLE for those.
The expected value of a new data set \(\mathbf{Y'}_t\) conditioned on the observed data \(\mathbf{Y}=\mathbf{y}(1:T)\) is \(Enewy = Z_t \hat{x}_t + D_t d_t + a_t\), where \(\hat{x}_t\) is the expected value of \(\mathbf{X}_t\) conditioned on the data up to T. The variance of a new data set \(\mathbf{Y'}_t\) conditioned on the observed data 1:T is \(var.newy = R_t + Z_t \hat{V}_t Z_t^\top\), where \(\hat{V}_t\) is the variance of \(\mathbf{X}_t\) conditioned on the data up to T. The variance of the expected value of the new data set \(\mathbf{Y'}_t\) is \(var.Enewy = Z_t \hat{V}_t Z_t^\top\).
We compute the prediction interval for y', an interval that will cover the new data for 100(1-alpha) percent of new data sets. The lower bound of the \(1-\alpha\) prediction interval is qnorm(alpha/2)*sd.newy + Enewy and the upper bound is qnorm(1-alpha/2)*sd.newy + Enewy, where sd.newy is the square root of the diagonal of \(var.newy\). The confidence interval for the expected value of y' has bounds qnorm(alpha/2)*se.Enewy + Enewy and qnorm(1-alpha/2)*se.Enewy + Enewy, where se.Enewy is the square root of the diagonal of \(var.Enewy\).
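The formulas for Enewy, var.newy and var.Enewy can be worked through for a single time step in base R. The matrices below (Z, a, R, xhat, Vhat) are hypothetical values for n = 3 observation series and m = 2 states, and the \(D_t d_t\) term is taken to be zero for simplicity; this is a sketch of the arithmetic, not output from tidy.marssMLE.

```r
# Hypothetical model pieces at one time t (the D_t d_t term is assumed zero)
Z <- matrix(c(1, 0,
              1, 0,
              0, 1), nrow = 3, byrow = TRUE)
a <- c(0, 0.2, 0)                     # a_t
R <- diag(c(0.1, 0.1, 0.2))           # R_t, observation error variance
xhat <- c(1.5, -0.7)                  # E[X_t] conditioned on data up to T
Vhat <- matrix(c(0.05, 0.01,
                 0.01, 0.08), 2, 2)   # var[X_t] conditioned on data up to T

Enewy <- as.vector(Z %*% xhat + a)    # expected value of new data
var.Enewy <- Z %*% Vhat %*% t(Z)      # variance of that expected value
var.newy <- R + var.Enewy             # variance of the new data themselves

se.Enewy <- sqrt(diag(var.Enewy))
sd.newy <- sqrt(diag(var.newy))

alpha <- 0.05
conf.int <- cbind(low = qnorm(alpha / 2) * se.Enewy + Enewy,
                  up = qnorm(1 - alpha / 2) * se.Enewy + Enewy)
pred.int <- cbind(low = qnorm(alpha / 2) * sd.newy + Enewy,
                  up = qnorm(1 - alpha / 2) * sd.newy + Enewy)
```

The prediction interval is always wider than the confidence interval here because var.newy adds the observation error variance R on top of var.Enewy.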
type="ytT" for missing data estimation
This returns the expected value and variance of \(\mathbf{Y}_t\) (the data set you DID observe) conditioned on the observed data \(\mathbf{Y}=\mathbf{y}(1:T)\). If you have no missing data, this just returns your data set. But if you have missing data, this is what you want in order to estimate the values of missing data in your data set. The expected value of \(\mathbf{Y}_t|\mathbf{Y}=\mathbf{y}(1:T)\) is in ytT in MARSShatyt output and the variance is OtT-tcrossprod(ytT) from the MARSShatyt output.
The intervals reported by tidy.marssMLE for the missing values take into account all the information in the data, specifically the correlation with other data at time t if R is not diagonal. Do not use type="fitted.ytT" for interpolating missing data, as those intervals are for entirely new data sets and thus will ignore relevant information if \(\mathbf{y}_t\) is multivariate, not all \(\mathbf{y}_t\) are missing, and the R matrix is not diagonal.
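The role of a non-diagonal R can be illustrated with the standard conditional multivariate normal result. The sketch below is a deliberate simplification: it treats the state-driven mean as known and ignores state uncertainty, so it is not the MARSShatyt calculation, and all values and names (mu, y1, E.y2, var.y2) are hypothetical. It only shows how an observed series at time t carries information about a missing one when their errors are correlated.

```r
# Hypothetical 2-series example at one time t: y1 observed, y2 missing
mu <- c(1.0, 2.0)                  # Z x + a for the two series (treated as known)
R <- matrix(c(0.5, 0.4,
              0.4, 0.5), 2, 2)     # non-diagonal observation error variance
y1 <- 1.8                          # observed value of the first series

# Conditional mean and variance of y2 given y1 (multivariate normal result)
E.y2 <- mu[2] + R[2, 1] / R[1, 1] * (y1 - mu[1])
var.y2 <- R[2, 2] - R[2, 1]^2 / R[1, 1]

# Expected value and variance with and without the information in y1
c(ignoring.y1 = mu[2], using.y1 = E.y2)
c(ignoring.y1 = R[2, 2], using.y1 = var.y2)
```

With a diagonal R the off-diagonal term R[2, 1] is zero, the adjustment vanishes, and the two answers coincide.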
The standard error and confidence interval for the expected value of the missing data along with the standard deviation and prediction interval for the missing data are reported. The former use the variance of \(E[Y(t)]\) conditioned on the data while the latter use the variance of \(Y(t)\) conditioned on the data. MARSShatyt returns these variances and expected values. See Holmes (2012) for a discussion of the derivation of the expectation and variance of Y(t) conditioned on the observed data (in the section 'Computing the expectations in the update equations').
Parameter uncertainty Currently the interval calculations for the states and observations use the point estimates of the model parameters and thus solve for the intervals in the 'known' parameters case.