tsoutliers (version 0.3)

remove.outliers: Stage II of the Procedure: Remove Outliers

Description

This function tests the significance of a given set of outliers in a time series model that is fitted with the outliers included as regressor variables.

Usage

remove.outliers(x, y, cval = NULL, 
  method = c("en-masse", "bottom-up", "linear-regression"), 
  delta = 0.7, n.start = 50, tsmethod.call = NULL, 
  fdiff = NULL, logfile = NULL)

Arguments

x
a list. The output returned by locate.outliers.oloop.
y
a time series.
cval
a numeric. The critical value to determine the significance of each type of outlier.
method
a character. The method to remove outliers. See details.
delta
a numeric. Parameter of the temporary change type of outlier.
n.start
a numeric. The number of warming observations added to the input passed to the Kalman filter. Only for tsmethod = "stsm".
tsmethod.call
an optional call object. The call to the function used to fit the time series model.
fdiff
an optional function. Differencing filter to be applied on the original data. Used if method="linear-regression".
logfile
a character or NULL. It is the path to the file where tracking information is printed. Ignored if NULL.

Value

  • A list containing the following elements: xreg, the variables used as regressors; xregcoefs, the coefficients of the outlier regressors; xregtstats, the t-statistics of the outlier regressors; iter, the number of iterations used by method "en-masse"; fit, the fitted model; outliers, the set of outliers after removing those that were not significant.

Details

In the regressions involved in this function, the variables included as regressors stand for the effects of the outliers on the data. These variables are the output returned by outliers.effects, not by outliers.regressors, which returns the regressors used in the auxiliary regression where outliers are located (see the second equation defined in locate.outliers).
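To make the idea of an "effect on the data" concrete, the following minimal sketch builds the effect of an additive outlier (AO), a level shift (LS) and a temporary change (TC) using the standard definitions in Chen and Liu (1993). The helper effect_regressor is hypothetical and is not part of tsoutliers; the output of outliers.effects may differ in detail (for example in scaling or in the handling of other outlier types).

```r
# Hypothetical helper (not part of tsoutliers): effect on the data of one
# outlier of the given type at time index t0, for a series of length n.
effect_regressor <- function(type = c("AO", "LS", "TC"), n, t0, delta = 0.7) {
  type <- match.arg(type)
  x <- numeric(n)
  idx <- seq(t0, n)
  x[idx] <- switch(type,
    AO = c(1, rep(0, n - t0)),   # pulse at t0 only
    LS = rep(1, n - t0 + 1),     # permanent step from t0 onwards
    TC = delta^(idx - t0))       # effect decays geometrically at rate delta
  x
}

xreg <- cbind(
  AO10 = effect_regressor("AO", n = 20, t0 = 10),
  LS10 = effect_regressor("LS", n = 20, t0 = 10),
  TC10 = effect_regressor("TC", n = 20, t0 = 10, delta = 0.7))

# inspect the regressors around the outlier date
round(xreg[9:13, ], 3)
```

With delta = 0.7 the temporary change falls from 1 at the outlier date to 0.7 and 0.49 in the next two periods, which is the role of the argument delta.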

The outliers are defined in input x. If there are regressor variables in tsmethod.call$xreg they are considered as other regressor variables that are included in the regression to test for the significance of outliers.

Given a set of potential outliers detected by locate.outliers and locate.outliers.oloop, three methods are considered to remove those outliers that are not significant after fitting again the time series model:

  • "en-masse": The complete set of outliers is included as regressor variables and the model is fitted again. Those outliers that turn out to be not significant for the critical value cval are removed. The procedure is iterated until all the outliers in the final set are significant.
  • "bottom-up": First, the outlier with the largest t-statistic is included in the model. If it is significant, the presence of the outlier is confirmed; otherwise it is removed. Then the outlier with the next largest t-statistic is included along with the first outlier and tested for significance. If, after including a new outlier (say the i-th outlier), any of the outliers confirmed so far becomes not significant, then the i-th outlier is removed regardless of the value of its t-statistic.
  • "linear-regression": The original series and the outlier regressors are differenced by means of the function fdiff. Then a linear regression of the differenced data on the differenced regressors is performed. Those outliers that are not significant at the 5% significance level are removed. The p-value from the linear regression is used, not the critical value cval, which is ignored with this method.
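The "en-masse" iteration described above can be sketched with base R alone. This is an illustration on simulated data under stated assumptions, not the code used by remove.outliers: the AR(1) model, the candidate dates, the outlier size and the value cval = 3.5 are all chosen here for the example.

```r
set.seed(123)
n <- 200
y <- arima.sim(list(ar = 0.5), n = n)
y[50] <- y[50] + 10   # a genuine additive outlier at t = 50 (assumed size)

# candidate outlier effects: one genuine (t = 50), one spurious (t = 120)
xreg <- cbind(AO50  = as.numeric(seq_len(n) == 50),
              AO120 = as.numeric(seq_len(n) == 120))
cval <- 3.5           # assumed critical value for this sketch
iter <- 0

repeat {
  iter <- iter + 1
  # refit the model with all remaining candidates as regressors
  fit <- arima(y, order = c(1, 0, 0), xreg = xreg)
  id <- colnames(xreg)
  tstats <- coef(fit)[id] / sqrt(diag(fit$var.coef)[id])
  keep <- abs(tstats) > cval
  if (all(keep)) break            # every remaining outlier is significant
  xreg <- xreg[, keep, drop = FALSE]
  if (ncol(xreg) == 0) break      # nothing left to test
}

colnames(xreg)  # candidates that survived
iter            # number of fits that were needed
```

Each pass refits the model with every remaining candidate, so a candidate that looked significant next to a spurious regressor is tested again once that regressor has been dropped.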

The method "bottom-up" may be preferred to "en-masse" when there are several outliers, since it may be hard to fit an ARIMA model with many regressor variables.
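A minimal sketch of the "bottom-up" idea, again on simulated data and with base R only. It is not the package implementation: in remove.outliers the initial t-statistics come from the input x, whereas here they are approximated by fitting each candidate on its own, and the helper tstats_of, the model order and cval = 3.5 are assumptions made for the example.

```r
set.seed(321)
n <- 200
y <- arima.sim(list(ar = 0.5), n = n)
y[50]  <- y[50] + 10    # genuine additive outliers (assumed sizes)
y[150] <- y[150] - 8

cand <- cbind(AO50  = as.numeric(seq_len(n) == 50),
              AO150 = as.numeric(seq_len(n) == 150),
              AO120 = as.numeric(seq_len(n) == 120))  # spurious candidate
cval <- 3.5             # assumed critical value for this sketch

# hypothetical helper: t-statistics of the chosen candidates in a joint fit
tstats_of <- function(ids) {
  fit <- arima(y, order = c(1, 0, 0), xreg = cand[, ids, drop = FALSE])
  coef(fit)[ids] / sqrt(diag(fit$var.coef)[ids])
}

# stand-in for the t-statistics from the detection stage:
# each candidate is fitted on its own
tstat0 <- vapply(colnames(cand), function(id) unname(tstats_of(id)), numeric(1))

confirmed <- character(0)
for (id in names(tstat0)[order(abs(tstat0), decreasing = TRUE)]) {
  tst <- tstats_of(c(confirmed, id))
  # keep the new candidate only if it and every outlier confirmed so far
  # are significant in the joint fit
  if (all(abs(tst) > cval)) confirmed <- c(confirmed, id)
}

confirmed
```

The model never contains more than the confirmed outliers plus one candidate, which keeps each fit small compared with including the whole set at once.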

The method "linear-regression" is an experimental version intended to be used with tsmethod = "stsm", where fitting the model with external regressors seems to be harder. In this method the critical value cval is not used; instead, the p-values of the t-statistics are compared against the 5% significance level.
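The steps of the "linear-regression" method can be sketched as follows. The example is only an illustration: the simulated random walk, the first-difference filter used as fdiff, the level-shift regressors and the absence of an intercept are assumptions made here, and the actual function may set up the regression differently.

```r
set.seed(42)
n <- 200
y <- cumsum(rnorm(n))        # random walk: first differences are white noise
y[80:n] <- y[80:n] + 10      # genuine level shift at t = 80 (assumed size)

# candidate outlier effects: a genuine and a spurious level shift
xreg <- cbind(LS80  = as.numeric(seq_len(n) >= 80),
              LS140 = as.numeric(seq_len(n) >= 140))

# assumed differencing filter for this sketch
fdiff <- function(x) diff(x, lag = 1)

dy <- fdiff(y)
dx <- fdiff(xreg)            # diff() works column by column on a matrix

# regression of the differenced data on the differenced regressors
fit <- lm(dy ~ dx - 1)
pvals <- summary(fit)$coefficients[, "Pr(>|t|)"]
names(pvals) <- colnames(dx)

# outliers kept at the 5% significance level
kept <- colnames(xreg)[pvals < 0.05]
kept
```

After differencing, a level shift becomes a single pulse, so its coefficient in the regression is essentially the jump in the differenced series at that date.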

References

Chen, C. and Liu, L.-M. (1993). Joint Estimation of Model Parameters and Outlier Effects in Time Series. Journal of the American Statistical Association, 88(421), pp. 284-297.

Gómez, V. and Maravall, A. (1996). Programs TRAMO and SEATS. Instructions for the user. Banco de España, Servicio de Estudios. Working paper number 9628. http://www.bde.es/f/webbde/SES/Secciones/Publicaciones/PublicacionesSeriadas/DocumentosTrabajo/96/Fich/dt9628e.pdf

See Also

locate.outliers, tsoutliers.

Examples

library("tsoutliers")
data("hicp")
y <- log(hicp[["011600"]])
fit <- arima(y, order = c(1, 1, 0), seasonal = list(order = c(2, 0, 2)))
# initial set of outliers
res <- locate.outliers.oloop(y, fit, types = c("AO", "LS", "TC"))
res$outliers
# given the model fitted above, the effect on the data of some of 
# the outliers is not significant (method = "en-masse")
remove.outliers(res, y, method = "en-masse", 
  tsmethod.call = fit$call)$outliers
# in this case, using method = "bottom-up" the first four 
# outliers with the highest t-statistics are kept
remove.outliers(res, y, method = "bottom-up", 
  tsmethod.call = fit$call)$outliers