Wednesday, April 12, 2017

The Johansen cointegration approach

The Johansen’s solution to cointegration is based on vector autoregression (VAR) and allows simultaneous analysis of n time series. With (n x 1) vector Xt the VAR model is defined (for simplicity neither intercept vector c, nor deterministic regressors CDt are considered):
This can be rewritten to a multivariate generalization of the Augmented Dickey-Fuller test:
This is the "transitory" form (because it includes Xt-1, as opposed to "long-run" form which includes Xt-p) therefore it holds that:
The rank r of the (n x n) matrix Π determines number of cointegration vectors (0≤r<n). If r=0 then there is no cointegration.  Matrix Π (impact matrix) can be rewritten to product of (n x r) matrix of adjustment parameters α and (r x n) matrix of cointegrating vectors β’.
For example, having two time series the VAR(1) and VECM(1) model become:
To find out r (number of cointegrating vectors), we can perform the “trace” test to test for:
H0: There are r (linearly-independent) cointegrating vectors (and n-r common stochastic trends)
H1: There are more than r cointegrating vectors
We usually start with r=0, perform the test and in case of rejection we increase r by 1 and test again, until the hypothesis is not rejected.


An example in R code (taken from Bernhard Pfaff: Tutorial - Analysis of Integrated and Cointegrated Time Series) :


set.seed(12345)
e1 <- rnorm(250, 0, 0.5)
e2 <- rnorm(250, 0, 0.5)
e3 <- rnorm(250, 0, 0.5)
u1.ar1 <- arima.sim(model = list(ar=0.75), innov = e1, n = 250)
u2.ar1 <- arima.sim(model = list(ar=0.3), innov = e2, n = 250)
y3 <- cumsum(e3)
y1 <- 0.8 * y3 + u1.ar1
y2 <- -0.3 * y3 + u2.ar1
ymax <- max(c(y1, y2, y3))
ymin <- min(c(y1, y2, y3))
plot(y1, ylab = "", xlab = "", ylim = c(ymin, ymax))
lines(y2, col = "red")
lines(y3, col = "blue")
y.mat <- data.frame(y1, y2, y3)
vecm1 <- ca.jo(y.mat, type = "eigen", spec = "transitory")
vecm2 <- ca.jo(y.mat, type = "trace", spec = "transitory")
vecm.r2 <- cajorls(vecm1, r = 2)
vecm.level <- vec2var(vecm1, r = 2)
vecm.pred <- predict(vecm.level,n.ahead = 10)
fanchart(vecm.pred)
vecm.irf <- irf(vecm.level, impulse = 'y3',response = 'y1', boot = FALSE)
vecm.fevd <- fevd(vecm.level)
vecm.norm <- normality.test(vecm.level)
vecm.arch <- arch.test(vecm.level)
vecm.serial <- serial.test(vecm.level)

The Engle-Granger cointegration approach

The two steps of Engle-Granger’s approach to cointegration, in case of two time series yt, xt, are:

1st step:
Test for integration order of both variables (as it is condition for cointegration that all variables are integrated of same order I(d)). This is achieved  (for instance) via the Augmented Dickey-Fuller test (any of the three variants can be used, according to judgment). If both variables are integrated of same order, least square regression cointegrating model can be estimated:
The residuals t are then tested for stationarity (Augmented Dickey-Fuller test without drift and without trend,  however since residuals are themselves estimates, the MacKinnon critical values have to be used). If time series of residuals is found stationary, the two time series are said to be cointegrated .


2nd step:
Estimate error correction model with residuals t (from step 1). As the estimate of β̂ is superconsistent, the residuals can be considered to be “known” and then ordinary least square applies:


An example in R code (taken from Bernhard Pfaff: Tutorial - Analysis of Integrated and Cointegrated Time Series) :




set.seed(12345)
e1 <- rnorm(250, mean = 0, sd = 0.5)
e2 <- rnorm(250, mean = 0, sd = 0.5)
u.ar3 <- arima.sim(model = list(ar = c(0.6, -0.2, 0.1)), n = 250, innov = e1)
y2 <- cumsum(e2)
y1 <- u.ar3 + 0.5*y2
ymax <- max(c(y1, y2))
ymin <- min(c(y1, y2))
layout(matrix(1:2, nrow = 2, ncol = 1))
plot(y1, xlab = "", ylab = "", ylim =c(ymin, ymax), main ="Cointegrated System")
lines(y2, col = "green")
plot(u.ar3, ylab = "", xlab = "", main = "Cointegrating Residuals")
abline(h = 0, col = "red")
lr <- lm(y1 ~ y2)
ect <- resid(lr)[1:249]
dy1 <- diff(y1)
dy2 <- diff(y2)
ecmdat <- cbind(dy1, dy2, ect)
ecm <- dynlm(dy1 ~ L(ect, 1) + L(dy1, 1)+ L(dy2, 1) , data = ecmdat)

Cointegration and error correction model

If two (or more) time series are integrated of same order (e.g. of order one I(1)), but their linear combination is integrated of lesser order (e.g. integrated of order zero I(0)), then these series are cointegrated.


In the vector notation: if yt is (n x 1) vector of time series (y1,t, y2,t, ... yn,t)' all integrated of order I(d) and β is (n x 1) cointegrating vector 1, β2, ... βn)' , then if there exists linear combination zt integrated of order I(d-b), the vector of time series is cointegrated I(d,b).


In fact, there can be more than one cointegrating vectors, then we are dealing with (n x r) matrix β. There can be 0<r<n cointegrating vectors. It follows that there are n-r common stochastic trends.



One advantage of cointegration is that the problem of “spurious regression” can be avoided in this case, as the least square regression gives unbiased and superconsistent estimates.


Another advantage is that it is a long-term model (e.g. captures long-term relationship between levels of two time series y and x) in contrary to a model fitted on differenced data which would be a short-term model (e.g. captures effect of change in x on change of y). Combination of both approaches (short- and long-term models) is known as error correction model (ECM), which is self-regulating model that allows for short-term autoregressive behavior but after some time always aligns time series into equilibrium given by cointegrating vectors.


To test for cointegration and to create an error correction model, several approaches are possible, most common are the Engle-Granger's two step estimation and Johansen‘s vector autoregressive (VAR) technique. While the Engle-Granger's approach is easier to perform, it allows only one cointegrating vector at most (one has to decide which of the series is dependent and which are idependent).