oneStepPredict: Calculate one-step-ahead (OSA) residuals for a latent variable model.

Description

Calculate one-step-ahead (OSA) residuals for a latent variable model. (Beta version; may change without notice)

Usage

oneStepPredict(obj, observation.name = NULL, data.term.indicator = NULL,
  method = c("oneStepGaussianOffMode", "fullGaussian", "oneStepGeneric",
  "oneStepGaussian", "cdf"), subset = NULL, conditional = NULL,
  discrete = NULL, discreteSupport = NULL, range = c(-Inf, Inf),
  seed = 123, parallel = FALSE, trace = TRUE, ...)

Arguments

obj

Output from MakeADFun.

observation.name

Character naming the observation in the template.

data.term.indicator

Character naming an indicator data variable in the template (not required by all methods - see details).

method

Method to calculate OSA (see details).

subset

Index vector of observations that will be added one by one during OSA. By default 1:length(observations) (with conditional subtracted).

conditional

Index vector of observations that are fixed during OSA. By default the empty set.

discrete

Are observations discrete? (assumed FALSE by default)

discreteSupport

Possible outcomes of discrete distribution (method="oneStepGeneric" only).

range

Possible range of the observations.

seed

Randomization seed (discrete case only). If NULL the RNG seed is untouched by this routine.

parallel

Run in parallel using the parallel package?

trace

Trace progress?

...

Control parameters for OSA method

Value

data.frame with OSA standardized residuals in column residual. Depending on the method the output may also include OSA expected observation in column mean.

Details

Given a TMB latent variable model this function calculates OSA standardized residuals that can be used for goodness-of-fit assessment. The approach is based on a factorization of the joint distribution of the observations $X_1,...,X_n$ into successive conditional distributions. Denote by $$F_n(x_n) = P(X_n \leq x_n | X_1 = x_1,...,X_{n-1}=x_{n-1} )$$ the one-step-ahead CDF, and by $$p_n(x_n) = P(X_n = x_n | X_1 = x_1,...,X_{n-1}=x_{n-1} )$$ the corresponding point probabilities (zero for continuous distributions). In case of continuous observations the sequence $$\Phi^{-1}(F_1(X_1))\:,...,\:\Phi^{-1}(F_n(X_n))$$ will be iid standard normal. These are referred to as the OSA residuals. In case of discrete observations draw (unit) uniform variables $U_1,...,U_n$ and construct the randomized OSA residuals $$\Phi^{-1}(F_1(X_1)-U_1 p_1(X_1))\:,...,\:\Phi^{-1}(F_n(X_n)-U_n p_n(X_n))$$ These are also iid standard normal.

The user must specify one of the following methods to calculate the residuals:

method="fullGaussian"

This method assumes that the joint distribution of data and random effects is Gaussian (or well approximated by a Gaussian). It does not require any changes to the user template. However, if used in conjunction with subset and/or conditional a data.term.indicator is required - see the next method.

method="oneStepGeneric"

This method calculates the one-step conditional probability density as a ratio of Laplace approximations. The approximation is integrated (and re-normalized for improved accuracy) using 1D numerical quadrature to obtain the one-step CDF evaluated at each data point. The method works in the continuous case as well as the discrete case (discrete=TRUE).

It requires a specification of a data.term.indicator explained in the following. Suppose the template for the observations given the random effects ($u$) looks like

    DATA_VECTOR(x);
    ...
    nll -= dnorm(x(i), u(i), sd(i), true);
    ...

Then this template can be augmented with a data.term.indicator = "keep" by changing the template to

    DATA_VECTOR(x);
    DATA_VECTOR_INDICATOR(keep, x);
    ...
    nll -= keep(i) * dnorm(x(i), u(i), sd(i), true);
    ...

The new data vector (keep) need not be passed from R. It automatically becomes a copy of x filled with ones.

method="oneStepGaussian"

This is a special case of the generic method where the one step conditional distribution is approximated by a Gaussian (and can therefore be handled more efficiently).

method="oneStepGaussianOffMode"

This is an approximation of the "oneStepGaussian" method that avoids locating the mode of the one-step conditional density.

method="cdf"

The generic method can be slow due to the many function evaluations used during the 1D integration (or summation in the discrete case). The present method can speed up this process but requires more changes to the user template. The above template must be expanded with information about how to calculate the negative log of the lower and upper CDF:

    DATA_VECTOR(x);
    DATA_VECTOR_INDICATOR(keep, x);
    ...
    nll -= keep(i) * dnorm(x(i), u(i), 0.0, true);
    nll -= keep.cdf_lower(i) * log( pnorm(x(i), u(i), 0.0) );
    nll -= keep.cdf_upper(i) * log( 1.0 - pnorm(x(i), u(i), 0.0) );
    ...

The specialized members keep.cdf_lower and keep.cdf_upper automatically become copies of x filled with zeros.

Examples

Run this code

# NOT RUN {
######################## Gaussian case
runExample("simple")
osa.simple <- oneStepPredict(obj, observation.name = "x", method="fullGaussian")
qqnorm(osa.simple$residual); abline(0,1)

# }
# NOT RUN {
######################## Poisson case (First 100 observations)
runExample("ar1xar1")
osa.ar1xar1 <- oneStepPredict(obj, "N", "keep", method="cdf", discrete=TRUE, subset=1:100)
qqnorm(osa.ar1xar1$residual); abline(0,1)
# }

Run the code above in your browser using DataLab