breakpoints: Dating Breaks

Description

Computation of breakpoints in regression relationships. Given a number of breaks the function computes the optimal breakpoints.

Usage

## S3 method for class 'formula':
breakpoints(formula, h = 0.15, breaks = NULL, tol = 1e-15,
    data = list(), ...)
## S3 method for class 'breakpointsfull':
breakpoints(obj, breaks = NULL, ...)
## S3 method for class 'breakpointsfull':
summary(object, breaks = NULL, sort = TRUE,
    format.times = NULL, ...)
## S3 method for class 'breakpoints':
lines(x, breaks = NULL, lty = 2, ...)

Arguments

formula

a symbolic description for the model in which breakpoints will be estimated.

h

minimal segment size, given either as a fraction relative to the sample size or as an integer giving the minimal number of observations in each segment.

breaks

integer specifying the maximal number of breaks to be calculated. By default, the maximal number allowed by h is used when breakpoints is applied to a formula, and the number of breaks stored in the object is used when it is applied to a "breakpointsfull" object.

tol

tolerance when solve is used.

data

an optional data frame containing the variables in the model. By default, the variables are taken from the environment from which breakpoints is called.

...

currently not used.

obj, object

an object of class "breakpointsfull".

sort

logical. If set to TRUE summary tries to match the breakpoints from partitions with different numbers of breaks.

format.times

logical. If set to TRUE, a vector of strings with the formatted breakdates is printed. See breakdates for more information.

x

an object of class "breakpoints".

lty

line type.

Value

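An object of class "breakpoints" is a list with the following elements:

breakpoints: the breakpoints of the optimal partition with the number of breaks specified,

RSS: the associated RSS,

nobs: the number of observations,

nreg: the number of regressors,

call: the function call,

datatsp: the time series properties tsp of the data, if any.

If applied to a formula, breakpoints returns an object of class "breakpointsfull" (which inherits from "breakpoints") that contains some additional (or slightly different) elements such as:

breakpoints: the breakpoints of the minimum BIC partition,

RSS: a function which takes two arguments i, j and computes the residual sum of squares for a segment starting at observation i and ending at j by looking up the corresponding element in the triangular RSS matrix RSS.triang,

RSS.triang: a list encoding a triangular RSS matrix.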

Details

All procedures in this package are concerned with testing or assessing deviations from stability in the classical linear regression model:

$$y_i = x_i^\top \beta + u_i$$

In many applications it is reasonable to assume that there are $m$ breakpoints, where the coefficients shift from one stable regression relationship to a different one. Thus, there are $m+1$ segments in which the regression coefficients are constant, and the model can be rewritten as

$$y_i = x_i^\top \beta_j + u_i \qquad (i = i_{j-1} + 1, \dots, i_j, \quad j = 1, \dots, m+1)$$

where $j$ denotes the segment index. In practice the breakpoints $i_j$ are rarely given exogenously but have to be estimated. breakpoints estimates them by minimizing the residual sum of squares (RSS) of the equation above.
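For a single break ($m = 1$) the RSS criterion can be illustrated by brute force: try every admissible breakpoint and keep the one with the smallest total RSS of the two segment fits. The sketch below is not the package's implementation; it uses base R only, the Nile series from the datasets package and an intercept-only (mean-shift) model, and it assumes that a fractional h is converted to a segment size by rounding down. The helper name rss_mean is hypothetical.

```r
## Brute-force sketch of single-break estimation by RSS minimization
## (illustrative only, not strucchange's code).
y <- as.vector(Nile)        # annual Nile flows, 1871-1970
n <- length(y)
h <- floor(0.15 * n)        # assumed conversion of h = 0.15 to a segment size

## RSS of an intercept-only fit, i.e. squared deviations from the segment mean
rss_mean <- function(z) sum((z - mean(z))^2)

## candidate breakpoints i = last observation of the first segment,
## so that both segments contain at least h observations
cand <- h:(n - h)
rss <- sapply(cand, function(i) rss_mean(y[1:i]) + rss_mean(y[(i + 1):n]))

best_bp <- cand[which.min(rss)]
best_bp                     # estimated breakpoint (observation index)
time(Nile)[best_bp]         # corresponding position on the time scale
```

With more than one break, exhaustive search over all partitions quickly becomes infeasible, which is why the dynamic programming approach is used instead.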

The foundation for estimating breaks in time series regression models was given by Bai (1994) and was extended to multiple breaks by Bai (1997a, b) and Bai & Perron (1998). breakpoints implements the algorithm described in Bai & Perron (2002) for simultaneous estimation of multiple breakpoints. The distribution function used for the confidence intervals for the breakpoints is given in Bai (1997b). The ideas behind this implementation are described in Zeileis et al. (2002).

The algorithm for computing the optimal breakpoints given the number of breaks is based on dynamic programming; the underlying idea is the Bellman principle. The main computational effort is to compute a triangular RSS matrix, which gives the residual sum of squares for a segment starting at observation $i$ and ending at $i'$ with $i < i'$.
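The dynamic programming idea can be sketched in base R as follows: first fill a triangular matrix with the RSS of every admissible segment, then apply the Bellman recursion "best split of 1..j into k segments = best split of 1..i into k-1 segments + RSS of segment (i+1)..j", and finally backtrack. This is a minimal, unoptimized sketch under stated assumptions (each segment RSS is computed by a fresh lm.fit, and h is rounded down); the helper names seg_rss, rss_triang and optimal_breaks are hypothetical and do not reproduce strucchange's internals.

```r
## Dynamic programming sketch for optimal breakpoints (illustrative only).

## RSS of the OLS fit of y on X restricted to observations i..j
seg_rss <- function(y, X, i, j) {
  fit <- lm.fit(X[i:j, , drop = FALSE], y[i:j])
  sum(fit$residuals^2)
}

## Upper-triangular RSS matrix: entry [i, j] holds the RSS of segment i..j,
## computed only for segments with at least h observations
rss_triang <- function(y, X, h) {
  n <- length(y)
  tri <- matrix(NA_real_, n, n)
  for (i in 1:(n - h + 1)) {
    for (j in (i + h - 1):n) tri[i, j] <- seg_rss(y, X, i, j)
  }
  tri
}

## Bellman recursion: best[k, j] = minimal RSS of splitting 1..j into k
## segments; last[k, j] = last observation of segment k - 1 in that split
optimal_breaks <- function(tri, m, h) {
  n <- nrow(tri)
  stopifnot(m >= 1, (m + 1) * h <= n)
  best <- matrix(Inf, m + 1, n)
  last <- matrix(NA_real_, m + 1, n)
  best[1, h:n] <- tri[1, h:n]
  for (k in 2:(m + 1)) {
    for (j in (k * h):n) {
      cand <- ((k - 1) * h):(j - h)
      vals <- best[k - 1, cand] + tri[cbind(cand + 1, j)]
      best[k, j] <- min(vals)
      last[k, j] <- cand[which.min(vals)]
    }
  }
  ## backtrack from the full sample to recover the m breakpoints
  bp <- numeric(0)
  j <- n
  for (k in (m + 1):2) {
    j <- last[k, j]
    bp <- c(j, bp)
  }
  list(breakpoints = bp, RSS = best[m + 1, n])
}

## Usage on the Nile series with an intercept-only model
y <- as.vector(Nile)
X <- matrix(1, nrow = length(y), ncol = 1)
h <- floor(0.15 * length(y))
tri <- rss_triang(y, X, h)
res1 <- optimal_breaks(tri, m = 1, h = h)
res2 <- optimal_breaks(tri, m = 2, h = h)
res1$breakpoints
res2$breakpoints
```

Once the triangular matrix is available, optimal partitions for any admissible number of breaks are obtained cheaply from it, which mirrors why a "breakpointsfull" object stores this matrix.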

Given a formula, breakpoints computes an object of class "breakpointsfull", which inherits from "breakpoints". In particular, this object contains the triangular RSS matrix and functions to extract an optimal segmentation. Its summary gives the breakpoints (and associated breakdates) for all segmentations up to the maximal number of breaks, together with the associated RSS and BIC. Applying plot to the object plots these and thus visualizes the minimum BIC estimator of the number of breakpoints. From a "breakpointsfull" object, any number of breaks admissible under the minimal segment size h can be extracted by applying breakpoints again, which returns an object of class "breakpoints". This object contains only the breakpoints for the specified number of breaks and some model properties (number of observations, number of regressors, time series properties and the associated RSS), but not the triangular RSS matrix and related extractor functions. By default, the set of breakpoints associated with a "breakpointsfull" object is the minimum BIC partition.

Breakpoints are the indices of the observations that are the last in their segment. The corresponding breakdates, i.e., the breakpoints on the underlying time scale, can also be computed. Breakdates can be formatted, which enhances readability in particular for quarterly or monthly time series. For example, the breakdate 2002.75 of a monthly time series is formatted as "2002(10)". See breakdates for more details.
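The mapping from breakpoint to breakdate and the "year(period)" formatting can be sketched in base R using the tsp attribute c(start, end, frequency) of a time series. The helpers breakdate_of and format_breakdate are hypothetical illustrations, not the breakdates function itself, and the small tolerance in the rounding is an assumption added to guard against floating-point error.

```r
## Hypothetical helpers illustrating breakdates (not strucchange's code).

## observation index -> time scale, using tsp = c(start, end, frequency)
breakdate_of <- function(bp, tsp) tsp[1] + (bp - 1) / tsp[3]

## breakdate -> "year(period)", e.g. for monthly or quarterly series
format_breakdate <- function(date, freq) {
  year <- floor(date + 1e-8)             # tolerance for floating-point error
  period <- round((date - year) * freq) + 1
  sprintf("%d(%d)", as.integer(year), as.integer(period))
}

breakdate_of(28, tsp(Nile))              # Nile: start 1871, frequency 1; returns 1898
format_breakdate(2002.75, 12)            # monthly example; returns "2002(10)"
format_breakdate(2002.25, 4)             # quarterly; returns "2002(2)"
```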

Confidence intervals for the breakpoints can be computed from a "breakpointsfull" object using its confint method. The breakdates corresponding to the breakpoints can again be computed by breakdates. The breakpoints and their confidence intervals can be visualized with lines.

The log likelihood as well as some information criteria can be computed using the logLik and AIC methods. As for linear models, the log likelihood is computed under a normal model, and the degrees of freedom are the number of regression coefficients multiplied by the number of segments, plus the number of estimated breakpoints, plus 1 for the error variance. More details can be found on the help page of the method logLik.breakpoints.
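The degrees-of-freedom count and the Gaussian log likelihood described above can be written out directly. The following base-R sketch, with hypothetical helper names, assumes the error variance is estimated by RSS/n and computes the BIC as $-2\log L + \mathit{df}\cdot\log n$ for $k$ regressors and $m$ breaks. It then compares no break with one break for the Nile mean-shift model, where the one-break RSS is found by brute force.

```r
## Sketch of logLik/BIC for a segmented regression (hypothetical helpers).

## k coefficients per segment, m + 1 segments, m breakpoints, 1 error variance
bp_df <- function(k, m) k * (m + 1) + m + 1

## maximized normal log likelihood with error variance estimated as rss / n
bp_loglik <- function(rss, n, k, m) {
  ll <- -0.5 * n * (log(2 * pi) + log(rss / n) + 1)
  structure(ll, df = bp_df(k, m))
}

bp_bic <- function(rss, n, k, m) {
  ll <- bp_loglik(rss, n, k, m)
  -2 * as.numeric(ll) + attr(ll, "df") * log(n)
}

## Nile, intercept-only model (k = 1): no break vs. one break
y <- as.vector(Nile)
n <- length(y)
h <- floor(0.15 * n)
rss0 <- sum((y - mean(y))^2)
rss1 <- min(sapply(h:(n - h), function(i)
  sum((y[1:i] - mean(y[1:i]))^2) + sum((y[-(1:i)] - mean(y[-(1:i)]))^2)))
bic <- c(m0 = bp_bic(rss0, n, 1, 0), m1 = bp_bic(rss1, n, 1, 1))
bic
names(which.min(bic))
```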

Since the maximum of a sequence of F statistics is equivalent to the minimum OLS estimator of the breakpoint in a 2-segment partition, it can be extracted by breakpoints from an object of class "Fstats" as computed by Fstats. However, this cannot be used to extract a larger number of breakpoints.
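The equivalence follows because, for a fixed sample, the no-break RSS is a constant, so a Chow-type statistic $F_i = (\mathit{RSS}_0 - \mathit{RSS}_i) / (\mathit{RSS}_i / (n - 2k))$ is strictly decreasing in the two-segment RSS of candidate $i$. The base-R sketch below checks this on the Nile mean-shift model; the statistic is written in this assumed form for illustration and is not taken from the Fstats source.

```r
## Sketch: argmax of the F statistics equals argmin of the 2-segment RSS.
y <- as.vector(Nile)
n <- length(y)
k <- 1                                   # intercept-only model
h <- floor(0.15 * n)
cand <- h:(n - h)

rss_mean <- function(z) sum((z - mean(z))^2)
rss0 <- rss_mean(y)                      # RSS without a break
rss_i <- sapply(cand, function(i) rss_mean(y[1:i]) + rss_mean(y[(i + 1):n]))

## Chow-type F statistic for each candidate breakpoint
f_i <- (rss0 - rss_i) / (rss_i / (n - 2 * k))

c(argmax_F = cand[which.max(f_i)], argmin_RSS = cand[which.min(rss_i)])
```

The same argument does not carry over to several breaks, since a single sequence of F statistics only ranks 2-segment partitions.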

For illustration see the commented examples below and Zeileis et al. (2002).

References

Bai J. (1994), Least Squares Estimation of a Shift in Linear Processes, Journal of Time Series Analysis, 15, 453-472.

Bai J. (1997a), Estimating Multiple Breaks One at a Time, Econometric Theory, 13, 315-352.

Bai J. (1997b), Estimation of a Change Point in Multiple Regression Models, Review of Economics and Statistics, 79, 551-563.

Bai J., Perron P. (1998), Estimating and Testing Linear Models With Multiple Structural Changes, Econometrica, 66, 47-78.

Bai J., Perron P. (2002), Computation and Analysis of Multiple Structural Change Models, Journal of Applied Econometrics, forthcoming.

Zeileis A., Kleiber C., Krämer W., Hornik K. (2002), Testing and Dating of Structural Changes in Practice, Technical Report 39/02, SFB "Reduction of Complexity for Multivariate Data Structures", Universität Dortmund, http://www.statistik.uni-dortmund.de/sfb475/berichte/tr39-02.ps.

Examples


## the examples use functions from the strucchange package
library(strucchange)

## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Aswan dam was built
data(Nile)
plot(Nile)

## F statistics indicate one breakpoint
fs.nile <- Fstats(Nile ~ 1)
plot(fs.nile)
breakpoints(fs.nile)
lines(breakpoints(fs.nile))

## or
bp.nile <- breakpoints(Nile ~ 1)
summary(bp.nile)

## the BIC also chooses one breakpoint
plot(bp.nile)
breakpoints(bp.nile)

## fit null hypothesis model and model with 1 breakpoint
fm0 <- lm(Nile ~ 1)
fm1 <- lm(Nile ~ breakfactor(bp.nile, breaks = 1))
plot(Nile)
lines(fitted(fm0), col = 3)
lines(fitted(fm1), col = 4)
lines(bp.nile)

## confidence interval
ci.nile <- confint(bp.nile)
ci.nile
lines(ci.nile)


## UK Seatbelt data: a SARIMA(1,0,0)(1,0,0)_12 model
## (fitted by OLS) is used and reveals (at least) two
## breakpoints - one in 1973 associated with the oil crisis and
## one in 1983 due to the introduction of compulsory
## wearing of seatbelts in the UK.
data(UKDriverDeaths)
seatbelt <- log10(UKDriverDeaths)
seatbelt <- cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12))
colnames(seatbelt) <- c("y", "ylag1", "ylag12")
seatbelt <- window(seatbelt, start = c(1970, 1), end = c(1984,12))
plot(seatbelt[,"y"], ylab = expression(log[10](casualties)))

## testing
re.seat <- efp(y ~ ylag1 + ylag12, data = seatbelt, type = "RE")
plot(re.seat)

## dating
bp.seat <- breakpoints(y ~ ylag1 + ylag12, data = seatbelt, h = 0.1)
summary(bp.seat)
lines(bp.seat, breaks = 2)

## minimum BIC partition
plot(bp.seat)
breakpoints(bp.seat)
## the BIC would choose 0 breakpoints although the RE and supF test
## clearly reject the hypothesis of structural stability. Bai &
## Perron (2002) report that the BIC has problems in dynamic regressions.
## Due to the shape of the RE process or the F statistics, choose two
## breakpoints and fit corresponding models.
bp.seat2 <- breakpoints(bp.seat, breaks = 2)
fm0 <- lm(y ~ ylag1 + ylag12, data = seatbelt)
fm1 <- lm(y ~ breakfactor(bp.seat2)/(ylag1 + ylag12) - 1, data = seatbelt)

## plot
plot(seatbelt[,"y"], ylab = expression(log[10](casualties)))
time.seat <- as.vector(time(seatbelt))
lines(time.seat, fitted(fm0), col = 3)
lines(time.seat, fitted(fm1), col = 4)
lines(bp.seat2)

## confidence intervals
ci.seat2 <- confint(bp.seat, breaks = 2)
ci.seat2
lines(ci.seat2)
