egcm: Simplified Engle-Granger Cointegration Model

Description

Performs the two-step Engle-Granger cointegration procedure on a pair of time series, and creates an object representing the results of the analysis.

Usage

egcm(X, Y, na.action, log = FALSE, normalize = FALSE, 
  debias = TRUE, robust=FALSE, include.const=TRUE,
  i1test = egcm.default.i1test(), 
  urtest = egcm.default.urtest(), 
  p.value = egcm.default.pvalue())
is.cointegrated(E)
is.ar1(E)

Value

Returns an S3 object of class "egcm". This can then be printed or plotted. There is also a summary method.

The following is a copy of the printed output that was obtained from running the yegcm("SPY", "VOO", ...) example below:


VOO[i] =   0.9201 SPY[i] -   0.6845 + R[i], 
          (0.0005)          (0.0845)     
R[i] =  -0.0004 R[i-1] + eps[i], eps ~ N(0,  0.0779^2)
        (0.0633)
R[2013-12-31] = -0.0987 (t = -1.265)

The first and third lines of the output show the fitted model, with the standard error of each estimate in parentheses beneath it. The parameters were determined to be $\beta = 0.9201$, $\alpha = -0.6845$ and $\rho = -0.0004$. The standard deviation of the sequence $\epsilon$ of innovations was found to be $0.0779$. The standard errors of $\alpha$, $\beta$ and $\rho$ were found to be $0.0845$, $0.0005$ and $0.0633$ respectively.

The last line of the output shows the value of the residual as of the last observation in the series. The sign of the value $-0.0987$ indicates that VOO was relatively undervalued on this date, and the t value shows that the residual was $1.265$ standard deviations below its historical mean.

The fields of the "egcm" object are as follows:

S1: the first data series (X[i])
S2: the second data series (Y[i])
residuals: the residual series (R[i])
innovations: the sequence of innovations ($\epsilon$[i])
index: the index vector for the series
i1test: the name of the test used for verifying that X and Y are integrated
urtest: the name of the test used for verifying that the residual series does not contain a unit root
pvalue: the p-value that is used for the various tests used by this model
log: Boolean, which if true indicates that S1 and S2 are logged
alpha: the computed value of $\alpha$
alpha.se: standard error of the estimate of $\alpha$
beta: the computed value of $\beta$
beta.se: standard error of the estimate of $\beta$
rho: the computed and debiased value of $\rho$
rho.raw: the value of $\rho$ determined prior to debiasing
rho.se: standard error of the estimate of $\rho$
s1.i1.stat: test statistic found when checking that S1 is integrated
s1.i1.p: p-value associated to s1.i1.stat
s2.i1.stat: test statistic found when checking that S2 is integrated
s2.i1.p: p-value associated to s2.i1.stat
r.stat: test statistic found when checking whether the residual series contains a unit root
r.p: p-value associated to r.stat
eps.ljungbox.stat: test statistic found when checking whether an AR(1) model adequately fits the residual series
eps.ljungbox.p: p-value associated to eps.ljungbox.stat
s1.dsd: standard deviation of diff(S1)
s2.dsd: standard deviation of diff(S2)
r.sd: standard deviation of residuals
eps.sd: standard deviation of the innovations $\epsilon[i]$
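
As a hedged illustration of how these fields might be read, the sketch below fits a model to simulated data and extracts a few of them with `$`. The data and the variable names `x`, `y` and `e` are made up for the example; only the field names come from the list above. The code runs only if the egcm package is installed.

```r
# Sketch: reading fields of an "egcm" object fitted to simulated data.
if (requireNamespace("egcm", quietly = TRUE)) {
  library(egcm)
  set.seed(5)
  n <- 500
  x <- 100 + cumsum(rnorm(n))                        # simulated integrated series
  y <- 1 + 2 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))

  e <- egcm(x, y)

  # Fitted parameters and their standard errors
  print(c(alpha = e$alpha, beta = e$beta, rho = e$rho))
  print(c(alpha.se = e$alpha.se, beta.se = e$beta.se, rho.se = e$rho.se))

  # Unit root test on the residual series
  print(c(stat = e$r.stat, p = e$r.p))

  # Most recent residual
  print(tail(e$residuals, 1))
}
```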

Arguments

X

the first time series to be considered in the cointegration test. A plain or zoo vector. Alternatively, a two-column matrix or data.frame, in which case Y should be omitted.

Y

the second time series to be considered in the cointegration test. A plain or zoo vector.

E

an object of class "egcm" returned from a previous call to egcm

na.action

a function that indicates what should happen when the data contain NAs. See lm.

log

a boolean value which if TRUE, indicates that the model should be fit to the logs of the input vectors X and Y. Default: FALSE.

normalize

a boolean value which if TRUE, indicates that each series should be normalized to start at 1. This is performed by dividing the series by its first element. Default: FALSE.

debias

a boolean value which if TRUE, indicates that the value of $\rho$ that is reported should be debiased. Default: TRUE.

robust

a boolean value which if TRUE, indicates that the two-step Engle-Granger procedure should be performed using a robust linear model rather than a standard linear model. See rlm. Default: FALSE.

include.const

a boolean which if TRUE, indicates that the constant term $\alpha$ should be included in the model. Otherwise, sets $\alpha=0$. Default: TRUE.

i1test

a mnemonic indicating the name of the test that should be used for checking if the input series X and Y are integrated. If none is specified, then defaults to the value reported by egcm.default.i1test(). The installation default is "pp". The following tests are supported:

"adf" Augmented Dickey-Fuller test (see adf.test)
"pp" Phillips-Perron test (see pp.test)
"pgff" Pantula, Gonzalez-Farias and Fuller weighted symmetric estimate (see pgff.test)
"bvr" Breitung's variance ratio (see bvr.test)

urtest

a mnemonic indicating the name of the test that should be used for checking if the residual series contains a unit root. If none is specified, then defaults to the value reported by egcm.default.urtest(). The installation default is "pp". The following tests are supported:

"adf" Augmented Dickey-Fuller test (see adf.test)
"pp" Phillips-Perron test (see pp.test)
"pgff" Pantula, Gonzalez-Farias and Fuller weighted symmetric estimate (see pgff.test)
"bvr" Breitung's variance ratio (see bvr.test)
"jo-e" Johansen's eigenvalue test (see ca.jo)
"jo-t" Johansen's trace test (see ca.jo)
"ers-p" Elliott, Rothenberg and Stock point optimal test (see ur.ers)
"ers-d" Elliott, Rothenberg and Stock DF-GLS test (see ur.ers)
"sp-r" Schmidt and Phillips rho statistic (see ur.sp)
"hurst" Hurst exponent calculated using the corrected empirical method (see hurstexp)

p.value

the p-value to be used in the above tests. If none is specified, then defaults to the value reported by egcm.default.pvalue(). The installation default is 0.05.
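
The arguments above might be combined as in the following sketch, which overrides the default tests and p-value for a single call. The simulated series and the variable names are illustrative only, and the code runs only if the egcm package is installed.

```r
# Sketch: calling egcm with non-default test choices on simulated data.
if (requireNamespace("egcm", quietly = TRUE)) {
  library(egcm)
  set.seed(7)
  n <- 500
  x <- 100 + cumsum(rnorm(n))
  y <- 1 + 2 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))

  # Phillips-Perron for the integration checks, augmented Dickey-Fuller
  # for the residual unit root check, and a stricter p-value.
  e <- egcm(x, y, i1test = "pp", urtest = "adf", p.value = 0.01)
  print(e)

  # The choices are recorded in the returned object.
  print(c(i1test = e$i1test, urtest = e$urtest))
  print(e$pvalue)
}
```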

Author

Matthew Clegg matthewcleggphd@gmail.com

Disclaimer

The software in this package is for general information purposes only. It is hoped that it will be useful, but it is provided WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. It is not intended to form the basis of any investment decision. USE AT YOUR OWN RISK!

Details

The two-step Engle-Granger procedure searches for parameters $\alpha$, $\beta$, and $\rho$ that yield the best fit to the following model:

$$Y[i] = \alpha + \beta * X[i] + R[i]$$
$$R[i] = \rho * R[i-1] + \epsilon[i]$$
$$\epsilon[i] \sim N(0, \sigma^2)$$

In the first step, $\alpha$ and $\beta$ are found using a linear fit of Y[i] with respect to X[i]. The residual sequence R[i] is then determined. Then, in the second step, $\rho$ is determined, again using a linear fit, this time of R[i] with respect to R[i-1].
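
The two steps can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's implementation: it uses plain lm fits and omits the unit root tests, the debiasing of $\rho$ and the robust option. All variable names are made up for the example.

```r
# Sketch of the two-step fit on simulated data (illustrative names only).
set.seed(1)
n <- 500
x <- cumsum(rnorm(n))                               # X: a random walk
r <- as.numeric(arima.sim(list(ar = 0.5), n = n))   # R: AR(1) with rho = 0.5
y <- 1 + 2 * x + r                                  # Y = alpha + beta * X + R

# Step 1: regress Y on X to estimate alpha and beta, then take residuals.
step1 <- lm(y ~ x)
alpha_hat <- unname(coef(step1)[1])
beta_hat  <- unname(coef(step1)[2])
res <- residuals(step1)

# Step 2: regress R[i] on R[i-1], without intercept, to estimate rho.
step2 <- lm(res[-1] ~ 0 + res[-n])
rho_hat <- unname(coef(step2)[1])
eps <- residuals(step2)                             # estimated innovations

print(c(alpha = alpha_hat, beta = beta_hat, rho = rho_hat, sigma = sd(eps)))
```

With a series this long, the estimates of $\beta$ and $\rho$ should land close to the simulated values of 2 and 0.5.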

Engle and Granger showed that if $X$ and $Y$ are cointegrated, then this procedure will yield consistent estimates of the parameters. However, there are several ways in which this estimation procedure can fail:

Either X or Y (or both) may already be mean-reverting. In this case, there is no point in forming the difference $Y - \beta X$. If one series is mean-reverting and the other is not, then any non-trivial linear combination will not be mean-reverting.
The residual series R[i] may not be mean-reverting. In the language of cointegration theory, it is then said to contain a unit root. In this case, there is no benefit to forming the linear combination $Y - \beta X$.
The residual series R[i] may be mean-reverting, but the relation $R[i] = \rho R[i-1] + \epsilon[i]$ may not be the right model. In other words, the residual series may not be adequately described by an autoregressive series of order one. In this case, the parameters $\alpha$ and $\beta$ will be correct; however, the specification for the residuals R[i] will not be. The user may wish to try fitting the residuals using another function, such as arima.

The egcm function checks for each of the above contingencies, using an appropriate statistical test. If one of the above conditions is found, then a warning message is displayed when the model is printed.
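
When the AR(1) specification is in doubt, one possible way to explore alternatives with arima is sketched below. The residual series here is simulated as an AR(2) process purely for illustration; in practice it would be the residuals of the fitted model. Comparing by AIC is an assumption of this sketch, not something the package prescribes.

```r
# Sketch: compare AR(1) and AR(2) fits to a residual series using arima().
set.seed(2)
res <- as.numeric(arima.sim(list(ar = c(0.6, -0.3)), n = 500))  # stand-in residuals

fit_ar1 <- arima(res, order = c(1, 0, 0), include.mean = FALSE)
fit_ar2 <- arima(res, order = c(2, 0, 0), include.mean = FALSE)

# A lower AIC indicates a better trade-off between fit and complexity.
print(c(AR1 = AIC(fit_ar1), AR2 = AIC(fit_ar2)))
```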

The p-value used in the above tests is given by the parameter p.value. This can be changed by setting the value of the parameter, or by changing the default value with egcm.set.default.pvalue. For all of the unit root tests, the p-values of the corresponding test statistics have been recomputed through simulation and a table lookup is used. The Ljung-Box test (see Box.test) is used to assess whether or not the residual series can be adequately fit with an autoregressive series of order one.
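
The AR(1) adequacy check can be approximated in base R as shown below. This is a sketch under assumptions: the lag of 1 and the direct comparison with a 0.05 threshold are choices made for the example and may differ from what egcm does internally.

```r
# Sketch: Ljung-Box test on the innovations of an AR(1) fit.
set.seed(3)
n <- 500
res <- as.numeric(arima.sim(list(ar = 0.5), n = n))   # stand-in residual series

# Fit R[i] = rho * R[i-1] + eps[i] and extract the innovations.
ar1 <- lm(res[-1] ~ 0 + res[-n])
eps <- residuals(ar1)

# A small p-value suggests leftover autocorrelation in the innovations,
# i.e. that AR(1) does not describe the residual series adequately.
lb <- Box.test(eps, lag = 1, type = "Ljung-Box")
p_threshold <- 0.05                                   # assumed threshold
ar1_adequate <- lb$p.value >= p_threshold
print(lb)
```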

The estimates of $\alpha$ and $\beta$ are not only consistent but also unbiased. Unfortunately, the estimate obtained for $\rho$ may be biased. Therefore, a bias correction has been implemented for $\rho$. A pre-computed table of biases has been determined through simulation, and a table lookup is performed to determine the appropriate bias correction. To turn off this feature, set debias = FALSE.
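
The bias in $\rho$ can be seen in a small simulation such as the one below. It is only an illustration of why a correction is useful, not the package's bias table: the sample size, true $\rho$ and replication count are arbitrary choices. Each simulated series is demeaned because residuals from a regression with a constant term have mean zero.

```r
# Sketch: least-squares estimates of rho tend to fall below the true value
# in short samples.
set.seed(4)
rho_true <- 0.9
n <- 50
n_rep <- 2000

estimate_rho <- function(n, rho) {
  r <- as.numeric(arima.sim(list(ar = rho), n = n))
  r <- r - mean(r)                       # mimic mean-zero regression residuals
  sum(r[-1] * r[-n]) / sum(r[-n]^2)      # least-squares slope without intercept
}

rho_hats <- replicate(n_rep, estimate_rho(n, rho_true))
bias <- mean(rho_hats) - rho_true
print(c(mean_estimate = mean(rho_hats), bias = bias))
```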

The helper function is.cointegrated() takes as input an "egcm" object E. It returns TRUE if E appears to represent a valid pair of cointegrated series. In other words, it checks that both X and Y are integrated and that the residual series R is free of unit roots. The helper function is.ar1() also takes as input an "egcm" object E. It returns TRUE if the residual series R can be adequately fit by an autoregressive model of order one.
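
A hedged usage sketch of the two helpers follows. It compares a simulated cointegrated pair with a pair of unrelated random walks. The series and variable names are invented for the example, and the code runs only if the egcm package is installed.

```r
# Sketch: applying is.cointegrated() and is.ar1() to simulated pairs.
if (requireNamespace("egcm", quietly = TRUE)) {
  library(egcm)
  set.seed(6)
  n <- 500
  x <- 100 + cumsum(rnorm(n))
  y_coint <- 1 + 2 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))
  y_indep <- 100 + cumsum(rnorm(n))      # random walk unrelated to x

  e_coint <- egcm(x, y_coint)
  e_indep <- egcm(x, y_indep)

  print(is.cointegrated(e_coint))
  print(is.ar1(e_coint))
  print(is.cointegrated(e_indep))
}
```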

From the standpoint of securities trading, cointegration is thought to provide a useful model for pairs trading. If the price series of two securities are cointegrated, then the corresponding residual series R[i] will be mean-reverting. When the magnitude of the residual R[N] is large, a trader might establish a long position in the undervalued security and a short position in the overvalued security. With high probability, the positions will converge in value, and a profit can be collected. Numerous scholarly articles and several books have been written on pairs trading.

Data mining for cointegrated pairs is not recommended, though. As with any statistical test, the cointegration test will generate false positives. Experience shows that at least in the case of the components of the S&P 500, the number of false positives overwhelms the number of truly cointegrated series.
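
The scale of the false-positive problem can be seen from simple arithmetic. The sketch below assumes an index of 500 securities, a test with a false positive rate exactly equal to the 0.05 p-value, and no truly cointegrated pairs. These are simplifying assumptions made for illustration.

```r
# Sketch: expected number of false positives when testing every pair.
n_securities <- 500
n_pairs <- choose(n_securities, 2)               # distinct unordered pairs
p_value <- 0.05
expected_false_positives <- n_pairs * p_value

print(c(pairs = n_pairs, expected_false_positives = expected_false_positives))
```

Under these assumptions there are 124750 pairs, and about 6237 of them would be flagged as cointegrated by chance alone.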

References

Chan, E. (2013). Algorithmic trading: winning strategies and their rationale. (Vol. 625). John Wiley & Sons.

Clegg, M. (2014). On the Persistence of Cointegration in Pairs Trading (January 28, 2014). Available at SSRN: http://ssrn.com/abstract=2491201

Ehrman, D.S. (2006). The handbook of pairs trading: strategies using equities, options, and futures. (Vol. 240). John Wiley & Sons.

Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica, 55(2), 251-276.

Pfaff, B. (2008) Analysis of Integrated and Cointegrated Time Series with R. Second Edition. Springer, New York. ISBN 0-387-27960-1

Vidyamurthy, G. (2004). Pairs trading: quantitative methods and analysis. (Vol. 217). John Wiley & Sons.

Examples


if (FALSE) {
library(quantmod)

# SPY and IVV are both ETFs that track the S&P 500.
# One would expect them to be cointegrated, and in 2013 they were.
spy2013 <- getSymbols("SPY", from = "2013-01-01",
          to = "2013-12-31",auto.assign = FALSE)$SPY.Adjusted
ivv2013 <- getSymbols("IVV", from = "2013-01-01", 
         to = "2013-12-31",auto.assign = FALSE)$IVV.Adjusted
egcm(spy2013, ivv2013)

# egcm has a plot method, which can be useful
# In this plot, it appears that there is only one price series,
# but that is because the two price series are so close to each
# other that they are indistinguishable.
plot(egcm(spy2013, ivv2013))

# The yegcm function provides a convenient interface to the quantmod
# package, which can fetch closing prices from Yahoo.  Thus, 
# the above can be simplified as follows:

e <- yegcm("SPY", "VOO", start="2013-01-01", end="2014-01-01")
print(e)
plot(e)
summary(e)

# GLD and IAU both track the price of gold.  
# They too tend to be very tightly cointegrated.
gld.iau.2013 <- yegcm("GLD", "IAU", start="2013-01-01", end="2013-12-31")
gld.iau.2013
plot(gld.iau.2013)

# Coca-Cola and Pepsi are often mentioned as an
# example of a pair of securities for which pairs trading
# may be fruitful.  However, at least in 2013, they were not
# cointegrated.
ko.pep.2013 <- yegcm("KO", "PEP", start="2013-01-01", end="2013-12-31")
ko.pep.2013
plot(ko.pep.2013)

# Ford and GM seemed to be even more tightly linked.
# Yet, the degree of linkage was not high enough to pass the
# cointegration test.
f.gm.2013 <- yegcm("F","GM", start="2013-01-01", end="2013-12-31")
f.gm.2013
plot(f.gm.2013)
}
