Tuesday, October 31, 2017

Principal Component Analysis (of the Interest Rate Curve)

Principal Component Analysis (PCA) is a method for dimensionality reduction. It enables replacing a large number of correlated variables with a smaller number of new variables – the principal components – without losing too much information.

This technique is often used to describe time series of an interest rate curve. Instead of using rates for all maturities of a curve, which are highly correlated, we replace them with a few principal components. Typically three components are considered sufficient, as they should in theory capture the three main attributes of a curve – level, slope and curvature.

In the context of time series, PCA also requires stationarity. For interest rate curves this usually means that differencing, or computing log returns, is a prerequisite.
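As a quick illustration of the two transforms, here is a minimal sketch on a made-up toy series (the values below are hypothetical, not Euribor fixings):

```r
# toy rate series (hypothetical positive values, so log returns are defined)
s <- c(1.50, 1.48, 1.45, 1.47, 1.44)

# first differences: d_j = s_j - s_{j-1}
d <- diff(s)

# log returns: x_j = ln(s_j / s_{j-1})
x <- diff(log(s))
```

Note that log returns require positive levels; with negative Euribor rates, plain first differencing is the workable choice.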

Having i=1,2,…n time series (for each maturity point e.g. ON,1W,2W,1M,…,30Y) each consisting of measures at historical times tj for j=1,2,…T, we calculate log return Xi(tj) for interest rate Si between time tj and tj-1 as:
The PCA method transforms the variables Xi into new variables Yi. These Yi are called principal components and have two advantages: (1) they are mutually uncorrelated, and (2) some of them have such low variance that they can be omitted without losing too much information (thus reducing the dimensionality). This works because the components are constructed to maximize variance in order: first Y1, then Y2, and so on.
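These two properties are easy to verify on synthetic data; the following sketch (the variables x1, x2 and the use of prcomp are illustrative assumptions, not part of the curve example below) builds two correlated series and checks that their principal components come out uncorrelated and ordered by variance:

```r
set.seed(1)
# two correlated toy variables
x1 <- rnorm(200)
x2 <- 0.8 * x1 + 0.2 * rnorm(200)
X  <- cbind(x1, x2)

# principal components via prcomp (centers the data by default)
p <- prcomp(X)
Y <- p$x

# the component scores are (numerically) uncorrelated...
cor(Y[, 1], Y[, 2])
# ...and ordered by decreasing variance
var(Y[, 1]) >= var(Y[, 2])
```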
With X and Y expressed as vectors of random variables,

X = (X1, X2, …, Xn)′,   Y = (Y1, Y2, …, Yn)′

and together with the coefficient matrix α = (αki), the principal components are calculated as:

Y = α X

which translates, for the k-th principal component Yk (k = 1,2,…,n), to:

Yk = αk1·X1 + αk2·X2 + … + αkn·Xn
where two conditions are required:

- for each k (k = 1,2,…,n) the coefficient vector is normalized:

αk1² + αk2² + … + αkn² = 1

- for each combination {i,k} with i < k (i = 1,2,…,n−1; k = 2,3,…,n) the principal components must be uncorrelated:

Cov(Yi, Yk) = 0
The calculation of the α matrix is based on the eigenvectors and eigenvalues of the covariance matrix Σ of X. Each entry of the covariance matrix is defined as:

Σik = Cov(Xi, Xk) = E[(Xi − E[Xi]) (Xk − E[Xk])]
The α matrix is easy to compute: each row of α is in fact one eigenvector of the covariance matrix Σ of X. At the same time, the variance of each principal component is given by the corresponding eigenvalue λ. To reduce dimensionality, only the eigenvectors corresponding to the largest eigenvalues are kept.

Reverse calculation of the original values from the principal components is based on the inverse matrix α⁻¹. Since the rows of α are orthonormal eigenvectors, the inverse is simply the transpose: α⁻¹ = α′.
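This round trip can be sketched on synthetic data (the matrix X and its dimensions here are made up for illustration); keeping all components reproduces the centered data exactly:

```r
set.seed(2)
# toy data: 20 observations of 3 variables, centered
X  <- matrix(rnorm(60), ncol = 3)
Xc <- scale(X, center = TRUE, scale = FALSE)

# alpha: rows are eigenvectors of the covariance matrix
alpha <- t(eigen(cov(Xc))$vectors)

# forward: Y = alpha X ; reverse: X = alpha^{-1} Y = t(alpha) Y
Y    <- alpha %*% t(Xc)
Xrec <- t(t(alpha) %*% Y)

max(abs(Xrec - Xc))  # ~ 0 when all components are kept
```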


For example, take the EURIBOR curve for the first nine months of 2017 (one observation per month, maturities from 1 week to 1 year):

library(data.table)
# Euribor January-September 2017
i=data.table(
  w1=c(-0.378,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379),
  w2=c(-0.373,-0.372,-0.372,-0.372,-0.373,-0.373,-0.376,-0.376,-0.377),
  m1=c(-0.371,-0.372,-0.372,-0.372,-0.373,-0.373,-0.373,-0.372,-0.372),
  m2=c(-0.339,-0.341,-0.340,-0.340,-0.341,-0.342,-0.341,-0.340,-0.340),
  m3=c(-0.326,-0.329,-0.329,-0.330,-0.329,-0.330,-0.330,-0.329,-0.329),
  m6=c(-0.236,-0.241,-0.241,-0.246,-0.251,-0.267,-0.273,-0.272,-0.273),
  m9=c(-0.152,-0.165,-0.171,-0.179,-0.179,-0.195,-0.206,-0.211,-0.218),
  y1=c(-0.095,-0.106,-0.110,-0.119,-0.127,-0.149,-0.154,-0.156,-0.168)
)

matplot(t(as.matrix(i)),t="l")

# first differencing
r <- diff(as.matrix(i))
matplot(r,t="l")

# centering (avoid calling the result "c", which masks base::c)
rc <- scale(r, center=TRUE, scale=FALSE)

# covariance matrix
x <- (t(rc) %*% rc) / (dim(rc)[1]-1)

# factor loadings from principal component analysis
# (note: princomp() divides by n rather than n-1, so its variances
# differ from the sample-covariance eigenvalues by a factor (n-1)/n)
pc <- princomp(r)
loadings_pc=pc$loadings[]
variance_pc=pc$sdev^2
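To justify keeping only three components, the proportion of variance each one explains can be read off the squared standard deviations. A self-contained sketch on toy data (the matrix r_toy is made up, with deliberately unequal column scales; the same two lines apply to the differenced curve r above):

```r
set.seed(3)
# toy data: 10 observations of 4 variables with very different scales
r_toy  <- matrix(rnorm(40), ncol = 4) %*% diag(c(3, 1, 0.3, 0.1))
pc_toy <- princomp(r_toy)

# proportion of total variance explained by each component
expl <- pc_toy$sdev^2 / sum(pc_toy$sdev^2)
cumsum(expl)  # cumulative share; ends at 1 when all components are kept
```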

# or alternative: factor loadings from eigenvectors
loadings_ev=eigen(x)$vectors
variance_ev=eigen(x)$values

# compare results from the 2 methods:
plot(variance_pc)
lines(variance_ev)

# plot loadings (especially first 3)
matplot(loadings_pc,t='l')
matplot(loadings_ev,t='l')
matplot(loadings_pc[,1:3],t='l')

# first three principal components (inverse differenced)
alpha=t(loadings_pc[,1:3])
y=alpha %*% t(r)
matplot(diffinv(t(y)),t='l')
