**Principal Component Analysis (PCA) is a method for dimensionality reduction**. It enables replacing a large number of correlated variables with a smaller number of new variables – principal components – without losing too much information.

This technique is **often used to describe the time series of an interest rate curve**. Instead of using the rates for all maturities of a curve, which are strongly correlated, we replace them with a few principal components. **Typically 3 components are considered sufficient, as these should theoretically describe the three main attributes of a curve – level, slope and curvature.**

In the context of time series, PCA also requires stationarity. For interest rate curves this usually means that either differencing or the calculation of log returns is a prerequisite.

Having *i = 1, 2, …, n* time series (one for each maturity point, e.g. ON, 1W, 2W, 1M, …, 30Y), each consisting of observations at historical times *t*_{j} for *j = 1, 2, …, T*, we calculate the **log return** *X*_{i}(*t*_{j}) of interest rate *S*_{i} between times *t*_{j−1} and *t*_{j} as:

*X*_{i}(*t*_{j}) = ln( *S*_{i}(*t*_{j}) / *S*_{i}(*t*_{j−1}) )
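As a minimal sketch of this transformation (with `S` a hypothetical matrix of positive rates, one row per date and one column per maturity), the log returns are a single `diff(log(...))` call in R. The worked EURIBOR example below uses simple first differences instead, because the 2017 fixings are negative and their logarithm is undefined.

```r
# minimal sketch: log returns per maturity
# S is a hypothetical matrix of positive rates (rows = dates, columns = maturities)
S <- matrix(c(0.012, 0.013, 0.0125,
              0.021, 0.022, 0.0215), ncol = 2)
X <- diff(log(S))   # X[j, i] = ln( S[j+1, i] / S[j, i] )
X
```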
The PCA method is applied to transform the variables *X*_{i} into *Y*_{i}. These **new** *Y*_{i} are called principal components and have two advantages: (1) they are mutually uncorrelated and (2) some of them have such low variance that they can be “easily” omitted without losing too much information (thus reducing the dimensionality), because the principal components are estimated in a particular way which maximizes the variance of first *Y*_{1}, then *Y*_{2}, etc.
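Both advantages are easy to see on toy data; a minimal sketch with a hypothetical matrix of correlated variables (not yet the EURIBOR data):

```r
# minimal sketch: the two advantages of principal components on toy correlated data
set.seed(1)
X  <- matrix(rnorm(200), ncol = 4) %*% matrix(runif(16), 4, 4)  # 50 obs of 4 correlated variables
pc <- princomp(X)
round(cor(pc$scores), 8)   # (1) off-diagonal ~ 0: components are mutually uncorrelated
pc$sdev^2                  # (2) variances are sorted: Y1 captures the most, then Y2, ...
```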

With *X* and *Y* **expressed as vectors of random variables**

*X* = (*X*_{1}, *X*_{2}, …, *X*_{n})ᵀ and *Y* = (*Y*_{1}, *Y*_{2}, …, *Y*_{n})ᵀ

and together with the coefficient matrix *α* = (*α*_{ki}), the principal components are calculated as

*Y* = *α* *X*

which for the *k*-th principal component *Y*_{k} (*k = 1, 2, …, n*) translates to

*Y*_{k} = *α*_{k1} *X*_{1} + *α*_{k2} *X*_{2} + … + *α*_{kn} *X*_{n}
and where two particular conditions are required:

- for each *k* (*k = 1, 2, …, n*) the coefficients are normalized:

  *α*_{k1}^{2} + *α*_{k2}^{2} + … + *α*_{kn}^{2} = 1

- and for each combination *{i, k}* where *i < k* (*i = 1, 2, …, n−1*; *k = 2, 3, …, n*) the principal components must be uncorrelated:

  cov(*Y*_{i}, *Y*_{k}) = 0

**The calculation of the *α* matrix is based on the eigenvectors and eigenvalues of the covariance matrix ∑ of *X***. Each entry of the covariance matrix is defined as

∑_{ik} = cov(*X*_{i}, *X*_{k}) = E[(*X*_{i} − E[*X*_{i}])(*X*_{k} − E[*X*_{k}])]
The *α* matrix is easy to compute, as **each row of *α* is in fact one eigenvector of the covariance matrix ∑ of *X***. At the same time, the variance of each principal component is given by the corresponding eigenvalue λ. To reduce dimensionality, only the eigenvectors corresponding to the largest eigenvalues are kept.
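In R this is a single call to `eigen`; a minimal sketch on a hypothetical covariance matrix `Sigma` (the EURIBOR example below repeats this on real data):

```r
# minimal sketch: alpha and the component variances from an eigen decomposition
set.seed(1)
Sigma  <- cov(matrix(rnorm(200), ncol = 4))   # hypothetical 4x4 covariance matrix
e      <- eigen(Sigma)
alpha  <- t(e$vectors)               # each row of alpha is one eigenvector
lambda <- e$values                   # variance of each principal component
cumsum(lambda) / sum(lambda)         # share of total variance kept by the first k components
```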
Reverse calculation of the original values from the principal components is based on the inverse matrix *α*^{−1}; since the rows of *α* are orthonormal eigenvectors, *α*^{−1} is simply the transpose *α*ᵀ, and *X* = *α*^{−1} *Y*.
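A minimal sketch of the reverse calculation on hypothetical data follows; all *n* components are kept here, so the centered values are recovered exactly, while keeping only the first few components would give an approximation:

```r
# minimal sketch: reverse calculation X = alpha^{-1} Y on hypothetical data
set.seed(1)
X     <- matrix(rnorm(200), ncol = 4)           # 50 observations of 4 variables
Xc    <- scale(X, center = TRUE, scale = FALSE) # centered data
alpha <- t(eigen(cov(X))$vectors)               # rows of alpha are eigenvectors
Y     <- alpha %*% t(Xc)                        # principal components
all.equal(solve(alpha), t(alpha))               # alpha is orthogonal: inverse = transpose
X_back <- t(solve(alpha) %*% Y)                 # reverse calculation
all.equal(X_back, Xc, check.attributes = FALSE) # original (centered) values recovered
```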

For example, take the EURIBOR curve for the first 9 months of 2017:

```r
library(data.table)

# Euribor January-September 2017 (9 observations, maturities 1W to 1Y)
i = data.table(
  w1 = c(-0.378,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379),
  w2 = c(-0.373,-0.372,-0.372,-0.372,-0.373,-0.373,-0.376,-0.376,-0.377),
  m1 = c(-0.371,-0.372,-0.372,-0.372,-0.373,-0.373,-0.373,-0.372,-0.372),
  m2 = c(-0.339,-0.341,-0.340,-0.340,-0.341,-0.342,-0.341,-0.340,-0.340),
  m3 = c(-0.326,-0.329,-0.329,-0.330,-0.329,-0.330,-0.330,-0.329,-0.329),
  m6 = c(-0.236,-0.241,-0.241,-0.246,-0.251,-0.267,-0.273,-0.272,-0.273),
  m9 = c(-0.152,-0.165,-0.171,-0.179,-0.179,-0.195,-0.206,-0.211,-0.218),
  y1 = c(-0.095,-0.106,-0.110,-0.119,-0.127,-0.149,-0.154,-0.156,-0.168)
)
matplot(t(as.matrix(i)), t = "l")   # one curve per month, maturities on the x-axis

# first differencing (simple differences rather than log returns, as the rates are negative)
r <- diff(as.matrix(i))
matplot(r, t = "l")

# centering
c = scale(r, center = TRUE, scale = FALSE)

# covariance matrix
x = (t(c) %*% c) / (dim(c)[1] - 1)

# factor loadings from principal component analysis
pc <- princomp(r)
loadings_pc = pc$loadings[]
variance_pc = pc$sdev^2   # note: princomp uses divisor N, not N-1

# or alternatively: factor loadings from eigenvectors of the covariance matrix
loadings_ev = eigen(x)$vectors
variance_ev = eigen(x)$values

# compare results from the 2 methods
# (variances differ by the factor (N-1)/N, loading signs may be flipped)
plot(variance_pc)
lines(variance_ev)

# plot loadings (especially the first 3: level, slope, curvature)
matplot(loadings_pc, t = 'l')
matplot(loadings_ev, t = 'l')
matplot(loadings_pc[, 1:3], t = 'l')

# first three principal components (inverse differenced)
alpha = t(loadings_pc[, 1:3])
y = alpha %*% t(r)
matplot(diffinv(t(y)), t = 'l')
matplot(t(diffinv(t(y))), t = 'l')
```
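To check how much information the three retained components preserve, the differenced rates can be reconstructed from the first three scores and integrated back to curve levels. This sketch builds on the objects defined above (`i`, `r`, `pc`, `loadings_pc`); the helper names `r_hat` and `curve_hat` are just illustrative:

```r
# sketch: reconstruct the curves from the first 3 principal components only
r_hat <- pc$scores[, 1:3] %*% t(loadings_pc[, 1:3]) +
         matrix(pc$center, nrow(r), ncol(r), byrow = TRUE)

# compare reconstructed vs. original first differences
matplot(r, t = "l")
matlines(r_hat, lty = 2)

# rebuild the curve levels by inverse differencing from the first observation
curve_hat <- diffinv(r_hat, xi = as.matrix(i)[1, , drop = FALSE])
matplot(t(curve_hat), t = "l")   # compare with matplot(t(as.matrix(i)), t = "l")
```

The first three loading vectors plotted above then correspond to the level, slope and curvature factors of the curve mentioned at the beginning.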
