Tuesday, October 31, 2017

Principal Component Analysis (of the Interest Rate Curve)

Principal Component Analysis (PCA) is a method for dimensionality reduction. It enables the replacement of a large number of correlated variables with a smaller number of new variables – principal components – without losing too much information.

This technique is often used to describe the time series of an interest rate curve. Instead of using the rates for all maturities of the curve, which are highly correlated, we replace them with a few principal components. Typically 3 components are considered sufficient, as these should theoretically capture the three main attributes of a curve – level, slope and curvature.

In the context of time series, PCA also requires stationarity. For interest rate curves this usually means that either differencing or calculation of log returns is a prerequisite.

Having i=1,2,…,n time series (one for each maturity point, e.g. ON, 1W, 2W, 1M, …, 30Y), each consisting of observations at historical times t_j for j=1,2,…,T, we calculate the log return X_i(t_j) of interest rate S_i between times t_{j-1} and t_j as:
$$X_i(t_j) = \ln\frac{S_i(t_j)}{S_i(t_{j-1})}$$
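In R this is a one-liner (the series S below is an arbitrary positive example introduced here for illustration; log returns require positive values):

# log returns of a rate series S: X[j] = ln(S[j]/S[j-1])
S <- c(1.25, 1.27, 1.24, 1.30)
X <- diff(log(S))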
The PCA method is applied to transform the variables X_i into Y_i. These new Y_i are called principal components and have two advantages: (1) they are mutually uncorrelated and (2) some of them have such low variance that they can be “easily” omitted without losing too much information (thus reducing the dimensionality). This works because the principal components are estimated in a particular way that maximizes the variance of Y_1 first, then of Y_2, and so on.
With X and Y expressed as vectors of random variables:
$$X = (X_1, X_2, \ldots, X_n)^T, \qquad Y = (Y_1, Y_2, \ldots, Y_n)^T$$
and together with the coefficient matrix α:
$$\alpha = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & \ddots & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}$$
the principal components are calculated as:
$$Y = \alpha X$$
which translates for the k-th principal component Y_k (k=1,2,…,n) into:
$$Y_k = \sum_{i=1}^{n} \alpha_{ki} X_i$$
where two particular conditions are required:
- for each k (k=1,2,…,n) the coefficients are normalized:
$$\sum_{i=1}^{n} \alpha_{ki}^2 = 1$$
and
- for each combination {i,k} where i<k (i=1,2,…,n-1; k=2,3,…,n) the principal components must be uncorrelated:
$$\mathrm{Cov}(Y_i, Y_k) = 0$$
The calculation of the α matrix is based on the eigenvectors and eigenvalues of the covariance matrix Σ of X. Each entry of the covariance matrix is defined as:
$$\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E\big[(X_i - E[X_i])(X_j - E[X_j])\big]$$
The α matrix is easy to compute, as each row of α is in fact one eigenvector of the covariance matrix of X. At the same time, the variance of each principal component is given by the corresponding eigenvalue λ. To reduce dimensionality, only the eigenvectors corresponding to the largest eigenvalues are kept.

The reverse calculation of the original values from the principal components is based on the inverse matrix α⁻¹. Since the rows of α are orthonormal eigenvectors of a symmetric matrix, α⁻¹ is simply the transpose αᵀ (see the reconstruction sketch at the end of the example below).


For example, take the EURIBOR curve for the first 9 months of 2017:

library(data.table)

# Euribor fixings, January-September 2017 (monthly observations, in %)
i <- data.table(
  w1=c(-0.378,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379),
  w2=c(-0.373,-0.372,-0.372,-0.372,-0.373,-0.373,-0.376,-0.376,-0.377),
  m1=c(-0.371,-0.372,-0.372,-0.372,-0.373,-0.373,-0.373,-0.372,-0.372),
  m2=c(-0.339,-0.341,-0.340,-0.340,-0.341,-0.342,-0.341,-0.340,-0.340),
  m3=c(-0.326,-0.329,-0.329,-0.330,-0.329,-0.330,-0.330,-0.329,-0.329),
  m6=c(-0.236,-0.241,-0.241,-0.246,-0.251,-0.267,-0.273,-0.272,-0.273),
  m9=c(-0.152,-0.165,-0.171,-0.179,-0.179,-0.195,-0.206,-0.211,-0.218),
  y1=c(-0.095,-0.106,-0.110,-0.119,-0.127,-0.149,-0.154,-0.156,-0.168)
)

# plot the curve shapes (one line per observation date)
matplot(t(as.matrix(i)), t="l")

# first differencing (log returns are not usable here, as the rates are negative)
r <- diff(as.matrix(i))
matplot(r, t="l")

# centering (subtract column means); 'rc' avoids masking the base function c()
rc <- scale(r, center=TRUE, scale=FALSE)

# covariance matrix (sample covariance with divisor T-1)
x <- (t(rc) %*% rc) / (dim(rc)[1] - 1)

# factor loadings from principal component analysis
# (princomp uses divisor T for the covariance, so its variances differ
#  from the eigenvalues below by the factor (T-1)/T)
pc <- princomp(r)
loadings_pc <- pc$loadings[]
variance_pc <- pc$sdev^2

# or alternatively: factor loadings from eigenvectors of the covariance matrix
ev <- eigen(x)
loadings_ev <- ev$vectors
variance_ev <- ev$values

# compare results from the 2 methods:
plot(variance_pc)
lines(variance_ev)

# plot loadings (especially the first 3: level, slope, curvature);
# eigenvectors are unique only up to sign, so the two methods may flip signs
matplot(loadings_pc, t='l')
matplot(loadings_ev, t='l')
matplot(loadings_pc[,1:3], t='l')

# first three principal components, integrated back by inverse differencing
alpha <- t(loadings_pc[,1:3])
y <- alpha %*% t(r)
matplot(diffinv(t(y)), t='l')     # components accumulated over time
matplot(t(diffinv(t(y))), t='l')  # same values viewed per observation date
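To illustrate the reverse calculation via the transposed loadings, the differenced data can be approximately rebuilt from the three components alone (a sketch continuing the example above; r_hat and i_hat are names introduced here for illustration):

# the components are mutually uncorrelated: near-diagonal covariance
round(cov(t(y)), 10)

# approximate reconstruction of the differences from 3 components
r_hat <- t(t(alpha) %*% y)

# undo the differencing, restoring the rates from the first observed curve
i_hat <- diffinv(r_hat, xi = as.matrix(i)[1, , drop=FALSE])
matplot(t(i_hat), t="l")   # compare with the original curve plot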

Eigenvectors and Eigenvalues

For a matrix A, if there is a non-zero vector x and a value λ such that:
$$Ax = \lambda x$$
then x is called an eigenvector of A and λ is called its eigenvalue.

It is said that the vector x points in the same direction as Ax (which is an exceptional property). The parameter λ expresses whether and by how much the vector shrinks or stretches.

It follows that multiplying the eigenvector by the power λ^p is the same as multiplying it p times by the matrix A:
$$A^p x = \lambda^p x$$
There may exist multiple eigenvectors and eigenvalues. All vectors x can be found by solving the equation:
$$(A - \lambda I)x = 0$$
This equation has a non-zero solution only if the determinant of the matrix is zero. Accordingly, the parameters λ can be found by solving the characteristic equation:
$$\det(A - \lambda I) = 0$$

For example, the two eigenvectors and eigenvalues of a 2×2 matrix can be found this way; a numerical sketch follows the properties below.
Additional properties of an n×n matrix A are the trace (the sum of its diagonal elements), which equals the sum of the eigenvalues:
$$\mathrm{tr}(A) = \sum_{k=1}^{n} \lambda_k$$
and the determinant, which equals the product of the eigenvalues:
$$\det(A) = \prod_{k=1}^{n} \lambda_k$$
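A minimal numerical sketch in R (the 2×2 matrix here is an arbitrary illustrative choice, not the one from the original example):

# illustrative 2x2 symmetric matrix
A <- matrix(c(2, 1,
              1, 2), nrow=2, byrow=TRUE)

e <- eigen(A)
e$values    # eigenvalues: 3 and 1
e$vectors   # unit-length eigenvectors (columns)

# verify A x = lambda x for the first eigenpair (result ~ zero vector)
A %*% e$vectors[,1] - e$values[1] * e$vectors[,1]

# trace = sum of eigenvalues, determinant = product of eigenvalues
sum(diag(A)) - sum(e$values)   # ~ 0
det(A) - prod(e$values)        # ~ 0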

Cornish-Fisher Expansion

It is an approximate method for deriving the quantiles of any random variable's distribution based on its cumulants. The Cornish-Fisher expansion can be used, for example, to calculate various Value at Risk (VaR) quantiles.

Cumulants κ_r are an alternative expression of the distribution moments. For the cumulants κ_r of order r it holds for all real t (where µ′_r denotes the raw moment):
$$\exp\left(\sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!}\right) = 1 + \sum_{r=1}^{\infty} \mu'_r \frac{t^r}{r!}$$
The cumulants are not trivial to derive at all, but they can generally be expressed by the recursion:
$$\kappa_r = \mu'_r - \sum_{m=1}^{r-1} \binom{r-1}{m-1} \kappa_m \, \mu'_{r-m}$$
This leads to expressions in terms of raw moments (first 4 cumulants shown):
$$\kappa_1 = \mu'_1$$
$$\kappa_2 = \mu'_2 - (\mu'_1)^2$$
$$\kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$
$$\kappa_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3(\mu'_2)^2 + 12\mu'_2(\mu'_1)^2 - 6(\mu'_1)^4$$
Alternatively, derived from central moments µ_r (for r>1):
$$\kappa_2 = \mu_2, \qquad \kappa_3 = \mu_3, \qquad \kappa_4 = \mu_4 - 3\mu_2^2$$
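These relations are easy to check numerically. A minimal sketch in R (the helper name cumulants4 is introduced here for illustration; sample central moments serve as estimates):

# first four cumulants estimated from sample central moments
cumulants4 <- function(x) {
  m  <- mean(x)
  mu <- sapply(2:4, function(r) mean((x - m)^r))  # central moments mu_2..mu_4
  c(k1 = m,
    k2 = mu[1],                 # kappa_2 = mu_2
    k3 = mu[2],                 # kappa_3 = mu_3
    k4 = mu[3] - 3 * mu[1]^2)   # kappa_4 = mu_4 - 3*mu_2^2
}

# for a normal sample, kappa_3 and kappa_4 should be close to zero
cumulants4(rnorm(1e6))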
The Cornish-Fisher expansion for the approximate determination of the quantile x_q builds on the mean µ, standard deviation σ and cumulants κ_r of the variable X, with the help of the quantiles z_q of the standard normal distribution N(0,1):
$$x_q \approx \mu + \sigma\left[z_q + \frac{z_q^2 - 1}{6}S + \frac{z_q^3 - 3z_q}{24}K - \frac{2z_q^3 - 5z_q}{36}S^2\right]$$
where S = κ_3/σ³ is the skewness and K = κ_4/σ⁴ the excess kurtosis.
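A minimal sketch of a VaR-style calculation with this formula (the function name cf_quantile and the simulated P&L series are illustrative assumptions, not from the original post):

# Cornish-Fisher approximation of the q-quantile of a sample x
cf_quantile <- function(x, q) {
  mu <- mean(x); s <- sd(x)
  z  <- qnorm(q)
  S  <- mean((x - mu)^3) / s^3        # skewness
  K  <- mean((x - mu)^4) / s^4 - 3    # excess kurtosis
  mu + s * (z + (z^2 - 1)/6 * S + (z^3 - 3*z)/24 * K - (2*z^3 - 5*z)/36 * S^2)
}

# e.g. the 1% quantile (99% VaR) of a heavy-tailed simulated P&L
pnl <- rt(1e5, df=5) * 1000
cf_quantile(pnl, 0.01)   # Cornish-Fisher estimate
quantile(pnl, 0.01)      # empirical quantile for comparison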

Monday, October 30, 2017

Moments

(Recall the definition of the probability density function f(x):
$$P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$
)

Moments are a measure of probability density (in general, a measure of the “shape of a set of points”). The n-th moment µ_n of a probability density function f(x) about a value c is defined as:
$$\mu_n = \int_{-\infty}^{\infty} (x - c)^n f(x)\,dx$$
The value c is typically set to the mean of the distribution; the moments are then called “central” moments and marked as µ_n. If c=0, the moment is called a “raw” moment and is marked here as µ′_n to keep the two kinds apart.

The zeroth (raw) moment is equal to 1 (the total area under the probability density function f(x)).
The first (raw) moment is the mean: µ′_1 = E[X] = µ.
The second (central) moment is the variance: µ_2 = E[(X - µ)²] = σ².
For the higher moments, standardized variants are typically shown (divided by σⁿ) and are marked as µ̃_n.

The third (standardized central) moment is the skewness: µ̃_3 = µ_3/σ³.
The fourth (standardized central) moment is the kurtosis: µ̃_4 = µ_4/σ⁴.
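A short numerical sketch of these definitions in R (the helper name moment is introduced here for illustration):

# n-th sample moment of x about a value c
moment <- function(x, n, c=0) mean((x - c)^n)

x <- rnorm(1e6, mean=2, sd=3)

moment(x, 1)                      # first raw moment ~ mean (2)
moment(x, 2, mean(x))             # second central moment ~ variance (9)
moment(x, 3, mean(x)) / sd(x)^3   # skewness ~ 0 for the normal
moment(x, 4, mean(x)) / sd(x)^4   # kurtosis ~ 3 for the normal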