Tuesday, October 31, 2017

Principal Component Analysis (of the Interest Rate Curve)

Principal Component Analysis (PCA) is a method for dimensionality reduction. It replaces a large number of correlated variables with a smaller number of new variables, the principal components, without losing too much information.

This technique is often used to describe the time series of an interest rate curve. Instead of using the rates for all maturities of a curve, which are highly correlated, we replace them with a few principal components. Typically three components are considered sufficient, as these should in theory describe the three main attributes of a curve: level, slope and curvature.

In the context of time series, PCA also requires the series to be stationary. For interest rate curves this usually means that either differencing or calculating log returns is a prerequisite.

Having i=1,2,…,n time series (one for each maturity point, e.g. ON, 1W, 2W, 1M, …, 30Y), each consisting of observations at historical times $t_j$ for j=1,2,…,T, we calculate the log return $X_i(t_j)$ of the interest rate $S_i$ between times $t_{j-1}$ and $t_j$ as:
$$X_i(t_j) = \ln \frac{S_i(t_j)}{S_i(t_{j-1})}$$
The PCA method transforms the variables $X_i$ into new variables $Y_i$, called principal components. These have two advantages: (1) they are mutually uncorrelated, and (2) some of them have such low variance that they can be omitted without losing too much information, which reduces the dimensionality. The second property follows from the way the principal components are estimated: the coefficients are chosen to maximize the variance of $Y_1$ first, then of $Y_2$, and so on.
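With X and Y expressed as vectors of random variables:
$$X = (X_1, X_2, \dots, X_n)^T, \qquad Y = (Y_1, Y_2, \dots, Y_n)^T$$
and together with the coefficient matrix α:
$$\alpha = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}$$
the principal components are calculated as:
$$Y = \alpha X$$
which translates for the k-th principal component $Y_k$ (k=1,2,…,n) to:
$$Y_k = \alpha_{k1} X_1 + \alpha_{k2} X_2 + \dots + \alpha_{kn} X_n = \sum_{i=1}^{n} \alpha_{ki} X_i$$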
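and where two particular conditions are required:
- for each k (k=1,2,…,n) the coefficient vector must have unit length:
$$\sum_{i=1}^{n} \alpha_{ki}^2 = 1$$
and
- for each combination {i,k} where i<k (i=1,2,…,n-1; k=2,3,…,n) the principal components must be uncorrelated:
$$\operatorname{Cov}(Y_i, Y_k) = 0$$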
The calculation of the α matrix is based on the eigenvectors and eigenvalues of the covariance matrix Σ of X. Each entry of the covariance matrix is defined as
$$\Sigma_{ij} = \operatorname{Cov}(X_i, X_j) = E\big[(X_i - E[X_i])(X_j - E[X_j])\big]$$
The α matrix is easy to compute, as each row of α is one eigenvector of the covariance matrix Σ. The variance of each principal component is given by the corresponding eigenvalue λ. To reduce dimensionality, only the eigenvectors that correspond to the largest eigenvalues are kept.

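The original values are recovered from the principal components with the inverse matrix $\alpha^{-1}$. Because the rows of α are orthonormal eigenvectors of a symmetric matrix, $\alpha^{-1} = \alpha^T$.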


For example, take the EURIBOR curve for the first 9 months of 2017:

library (data.table)
# Euribor January-September 2017
i=data.table(
  w1=c(-0.378,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379,-0.379),
  w2=c(-0.373,-0.372,-0.372,-0.372,-0.373,-0.373,-0.376,-0.376,-0.377),
  m1=c(-0.371,-0.372,-0.372,-0.372,-0.373,-0.373,-0.373,-0.372,-0.372),
  m2=c(-0.339,-0.341,-0.340,-0.340,-0.341,-0.342,-0.341,-0.340,-0.340),
  m3=c(-0.326,-0.329,-0.329,-0.330,-0.329,-0.330,-0.330,-0.329,-0.329),
  m6=c(-0.236,-0.241,-0.241,-0.246,-0.251,-0.267,-0.273,-0.272,-0.273),
  m9=c(-0.152,-0.165,-0.171,-0.179,-0.179,-0.195,-0.206,-0.211,-0.218),
  y1=c(-0.095,-0.106,-0.110,-0.119,-0.127,-0.149,-0.154,-0.156,-0.168)
)

# one line per month, plotted across the 8 maturities
matplot(t(as.matrix(i)),type="l")

# first differencing
r <- diff(as.matrix(i))
matplot(r,t="l")

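# centering (subtract the column means); named rc to avoid masking base::c
rc <- scale(r, center=TRUE, scale=FALSE)

# sample covariance matrix (same result as cov(r))
x <- (t(rc) %*% rc) / (nrow(rc)-1)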

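# factor loadings from principal component analysis
# (princomp uses divisor N instead of N-1, so its variances are
#  (N-1)/N times the eigenvalues of the sample covariance matrix)
pc <- princomp(r)
loadings_pc <- pc$loadings[]
variance_pc <- pc$sdev^2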

# or alternative: factor loadings from eigenvectors
ev <- eigen(x)
loadings_ev <- ev$vectors
variance_ev <- ev$values

# compare results from the 2 methods:
plot(variance_pc)
lines(variance_ev)

# plot loadings (especially first 3)
matplot(loadings_pc,t='l')
matplot(loadings_ev,t='l')
matplot(loadings_pc[,1:3],t='l')

# first three principal components (inverse differenced)
alpha=t(loadings_pc[,1:3])
y=alpha %*% t(r)
matplot(diffinv(t(y)),t='l')
matplot(t(diffinv(t(y))),t='l')  
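
The reverse calculation mentioned above can be sketched in the same way. The block below is a minimal illustration on synthetic data (not the EURIBOR series); all names ending in `_demo` are hypothetical. It checks that the principal components are uncorrelated, that their variances equal the eigenvalues, and that the transpose of α recovers the centered data.

```r
# Synthetic "rate changes": 50 observations for 5 maturities, driven by a
# level factor and a slope factor plus small noise (illustrative data only)
set.seed(1)
level_demo <- rnorm(50, sd = 0.02)
slope_demo <- rnorm(50, sd = 0.01)
maturity_demo <- c(0.25, 0.5, 1, 2, 5)
r_demo <- outer(level_demo, rep(1, 5)) +
  outer(slope_demo, maturity_demo / 5) +
  matrix(rnorm(250, sd = 0.001), 50, 5)

# center the data and take eigenvectors of the sample covariance matrix
rc_demo <- sweep(r_demo, 2, colMeans(r_demo))
e_demo <- eigen(cov(r_demo))
alpha_demo <- t(e_demo$vectors)          # rows of alpha are eigenvectors

# all principal components, one row per component
y_demo <- alpha_demo %*% t(rc_demo)

# alpha is orthogonal, so its inverse is its transpose
full_demo <- t(t(alpha_demo) %*% y_demo)
max_err_full <- max(abs(full_demo - rc_demo))

# approximate reconstruction from the first k components only
k <- 2
approx_demo <- t(t(alpha_demo[1:k, , drop = FALSE]) %*% y_demo[1:k, , drop = FALSE])
max_err_k <- max(abs(approx_demo - rc_demo))

# share of total variance carried by the first k components
explained_demo <- sum(e_demo$values[1:k]) / sum(e_demo$values)
```

With two driving factors in the synthetic data, two components should carry almost all of the variance, so the reconstruction from `k = 2` stays close to the centered input.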

Eigenvectors and Eigenvalues

For a square matrix A, if there is a non-zero vector x and a value λ such that:
$$A x = \lambda x$$
then x is called an eigenvector of A and λ is called the corresponding eigenvalue.

This means that the vector Ax lies along the same line as x, which is an exceptional property, since multiplying by A changes the direction of most vectors. The parameter λ expresses whether and how much the vector shrinks or stretches (a negative λ also flips it).

It follows that multiplying the eigenvector by $\lambda^p$ is the same as multiplying it p times by the matrix A:
$$A^p x = \lambda^p x$$
There may be multiple eigenvectors and eigenvalues. All eigenvectors x can be found by solving the equation:
$$(A - \lambda I)\,x = 0$$
This equation has a non-zero solution only if the determinant of the matrix $A - \lambda I$ is zero. Accordingly, the eigenvalues λ can be found by solving the equation:
$$\det(A - \lambda I) = 0$$

For example the two eigenvectors and eigenvalues for the 2x2 matrix:
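Additional properties of an n×n matrix A are the trace (the sum of the diagonal elements), which equals the sum of the eigenvalues:
$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i$$
and the determinant, which equals the product of the eigenvalues:
$$\det(A) = \prod_{i=1}^{n} \lambda_i$$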
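
These properties can be checked numerically. The sketch below uses a hypothetical symmetric 2x2 matrix chosen only for illustration (it is not the matrix from the original example) and base R's `eigen()`.

```r
# Hypothetical 2x2 matrix, chosen so that the eigenvalues are 3 and 1
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)
lambda <- e$values            # eigenvalues in decreasing order
x1 <- e$vectors[, 1]          # eigenvector belonging to lambda[1]

# A x = lambda x
lhs <- as.vector(A %*% x1)
rhs <- lambda[1] * x1

# A^p x = lambda^p x, here with p = 3
lhs_p <- as.vector(A %*% A %*% A %*% x1)
rhs_p <- lambda[1]^3 * x1

# trace equals the sum and determinant the product of the eigenvalues
trace_A <- sum(diag(A))
det_A <- det(A)
```

Comparing `lhs` with `rhs`, `trace_A` with `sum(lambda)` and `det_A` with `prod(lambda)` (for instance with `all.equal`) confirms each identity up to floating-point error.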

Cornish-Fisher Expansion

It is an approximate method for deriving quantiles of the distribution of a random variable from its cumulants. The Cornish-Fisher expansion can be used, for example, to calculate various Value at Risk (VaR) quantiles.

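Cumulants $\kappa_r$ are an alternative expression of the distribution moments $\mu_r$. For the cumulants $\kappa_r$ of order r it holds for all real t (where $\mu_r$ denotes the raw moment):
$$\ln\left(\sum_{r=0}^{\infty} \mu_r \frac{t^r}{r!}\right) = \sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!}$$
The cumulants are not trivial to derive, but they can be expressed in general by the recursion:
$$\kappa_r = \mu_r - \sum_{j=1}^{r-1} \binom{r-1}{j-1} \kappa_j\, \mu_{r-j}$$
This leads to the following expressions in terms of raw moments (first 4 cumulants shown):
$$\kappa_1 = \mu_1$$
$$\kappa_2 = \mu_2 - \mu_1^2$$
$$\kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3$$
$$\kappa_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4$$
Alternatively, they can be derived from the central moments $\bar\mu_r = E[(X-\mu_1)^r]$ (for r>1):
$$\kappa_2 = \bar\mu_2, \qquad \kappa_3 = \bar\mu_3, \qquad \kappa_4 = \bar\mu_4 - 3\bar\mu_2^2$$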
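
A minimal sketch of the recursion in R, assuming the common form $\kappa_r = \mu_r - \sum_{j=1}^{r-1} \binom{r-1}{j-1} \kappa_j \mu_{r-j}$; the function name `cumulants_from_raw` is hypothetical.

```r
# raw: numeric vector of raw moments mu_1, mu_2, ..., mu_R
# returns the cumulants kappa_1, ..., kappa_R via the recursion
#   kappa_r = mu_r - sum_{j=1}^{r-1} choose(r-1, j-1) * kappa_j * mu_{r-j}
cumulants_from_raw <- function(raw) {
  R <- length(raw)
  kappa <- numeric(R)
  for (r in seq_len(R)) {
    s <- 0
    if (r > 1) {
      for (j in 1:(r - 1)) {
        s <- s + choose(r - 1, j - 1) * kappa[j] * raw[r - j]
      }
    }
    kappa[r] <- raw[r] - s
  }
  kappa
}

# standard normal: raw moments 0, 1, 0, 3
k_normal <- cumulants_from_raw(c(0, 1, 0, 3))

# exponential distribution with rate 1: raw moments r! = 1, 2, 6, 24
k_exp <- cumulants_from_raw(c(1, 2, 6, 24))
```

For the standard normal distribution every cumulant above the second is zero, and for the exponential distribution with rate 1 the cumulants are (r-1)!, which makes both cases convenient checks.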
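The Cornish-Fisher expansion for approximate determination of the quantile $x_q$ builds on the mean µ, the standard deviation σ and the cumulants $\kappa_r$ of the variable X, with the help of the quantiles $z_q$ of the standard normal distribution N(0,1). Up to the fourth cumulant it reads:
$$x_q \approx \mu + \sigma\left[ z_q + (z_q^2 - 1)\frac{\kappa_3}{6\sigma^3} + (z_q^3 - 3z_q)\frac{\kappa_4}{24\sigma^4} - (2z_q^3 - 5z_q)\frac{\kappa_3^2}{36\sigma^6} \right]$$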
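
As a sketch, the commonly quoted form of the expansion up to the fourth cumulant can be coded as below. The function name and the example parameters are hypothetical, and the number of terms is an assumption; higher-order versions exist.

```r
# Cornish-Fisher quantile using the third and fourth cumulants.
# q: probability level, mu: mean, sigma: standard deviation,
# kappa3, kappa4: third and fourth cumulants of X
cornish_fisher_quantile <- function(q, mu, sigma, kappa3, kappa4) {
  z <- qnorm(q)              # quantile of N(0,1)
  g1 <- kappa3 / sigma^3     # skewness
  g2 <- kappa4 / sigma^4     # excess kurtosis
  w <- z + (z^2 - 1) * g1 / 6 + (z^3 - 3 * z) * g2 / 24 -
    (2 * z^3 - 5 * z) * g1^2 / 36
  mu + sigma * w
}

# 1% quantile of a hypothetical P&L distribution with negative skew
q_normal <- cornish_fisher_quantile(0.01, mu = 0, sigma = 1, kappa3 = 0, kappa4 = 0)
q_skewed <- cornish_fisher_quantile(0.01, mu = 0, sigma = 1, kappa3 = -0.5, kappa4 = 0)
```

With zero third and fourth cumulants the result collapses to the normal quantile, while negative skew pushes the 1% quantile further into the left tail, which is the effect a VaR calculation is meant to capture.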

Monday, October 30, 2017

Moments

(Recall the definition of the probability density function f(x):
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
)

Moments are a measure of the shape of a probability density (in general, a measure of the “shape of a set of points”). The n-th moment $\mu_n$ of a probability density function f(x) about a value c is defined as:
$$\mu_n = \int_{-\infty}^{\infty} (x - c)^n f(x)\,dx$$
The value c is typically set to the mean of the distribution, in which case the moments are called “central” moments. If c=0, the moment is called a “raw” moment and is denoted $\mu_n$.

The zeroth (raw) moment is equal to 1 (the total area under the probability density function f(x))
The first (raw) moment is the mean
The second (central) moment is the variance
For the higher moments, standardized variants are typically shown (divided by $\sigma^n$) and are denoted $\tilde\mu_n$.

The third (standardized central) moment is the skewness
The fourth (standardized central) moment is the kurtosis
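
For a finite sample the integral is replaced by an average. The sketch below assumes the simple plug-in estimator with divisor N (not the bias-corrected variants); the function name `sample_moment` and the data are hypothetical.

```r
# n-th sample moment of x about the value `about` (0 gives the raw moment)
sample_moment <- function(x, n, about = 0) mean((x - about)^n)

x <- c(1, 2, 3, 4, 10)                        # illustrative data

m <- sample_moment(x, 1)                      # first raw moment: the mean, 4
v <- sample_moment(x, 2, about = m)           # second central moment: variance, 10
skew <- sample_moment(x, 3, about = m) / v^(3 / 2)  # third standardized moment
kurt <- sample_moment(x, 4, about = m) / v^2        # fourth standardized moment
```

The single large observation produces a positive skewness here. Note that `var(x)` in R uses divisor N-1 and therefore differs from `v`.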

Thursday, May 11, 2017

Conditional probability, Bayes Theorem (and how it’s derived)

Given two events A and B, with their respective probabilities P(A) and P(B)>0, the conditional probability of A given B (the probability of event A with the knowledge that event B occurred) is defined as:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

The intuition is this: we already know that event B occurred, so we can no longer consider all possible outcomes; our outcome space reduces to just B. In this reduced outcome space the event B is a sure thing (i.e. P*(B)=100%), so all original probabilities have to be rescaled by dividing by P(B). Hence the probability of the intersection A∩B is also divided by P(B).

Two events A and B are independent (by definition) if:
$$P(A|B) = P(A)$$
It follows that if two events are independent, the probability of the intersection A∩B is equal to the product of the two probabilities:
$$P(A \cap B) = P(A|B)\,P(B) = P(A)\,P(B)$$
Note that independence is symmetric:
$$P(A|B) = P(A) \iff P(B|A) = P(B)$$
If two events A and B are independent (and P(A),P(B)>0), they can still occur simultaneously in one trial. If the two events cannot happen simultaneously in one trial, they are said to be mutually exclusive (disjoint), and then:
$$P(A \cap B) = 0$$
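
A small sketch with a fair die illustrates the difference between independent and mutually exclusive events. The events A, B and C below are hypothetical examples, and `prob` is a helper defined only for this illustration.

```r
outcomes <- 1:6                              # fair die, each outcome equally likely
prob <- function(event) length(event) / length(outcomes)

A <- c(2, 4, 6)                              # even number
B <- c(1, 2, 3, 4)                           # at most 4
C <- c(5, 6)                                 # at least 5, disjoint from B

# conditional probability P(A|B) = P(A and B) / P(B)
p_A_and_B <- prob(intersect(A, B))
p_A_given_B <- p_A_and_B / prob(B)

# A and B are independent: P(A and B) equals P(A) * P(B)
independent_AB <- isTRUE(all.equal(p_A_and_B, prob(A) * prob(B)))

# B and C are mutually exclusive, and therefore not independent
p_B_and_C <- prob(intersect(B, C))
independent_BC <- isTRUE(all.equal(p_B_and_C, prob(B) * prob(C)))
```

Here knowing that B occurred leaves the probability of A at 1/2, while knowing that B occurred rules C out completely, so disjoint events with positive probabilities cannot be independent.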
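Bayes' theorem relates the conditional probabilities P(A|B) and P(B|A):
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
Bayes' theorem can be derived from the definition of conditional probability:
$$P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A) \;\Rightarrow\; P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
It is also useful to rewrite the theorem with the complement rule, $P(B) = P(B|A)\,P(A) + P(B|A^c)\,P(A^c)$:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|A^c)\,P(A^c)}$$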


A typical example of the application of Bayes' theorem is cancer screening. Event A is defined as having cancer and event B as a positive result of the screening test. Suppose that the prevalence of cancer in the population is P(A)=1%. Suppose it is known that for a person with cancer the screening test is positive with probability P(B|A)=80%, and for a healthy person it is positive with probability P(B|A^c)=9.6%. The question is: what is the chance of having cancer if the test was positive?
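
The answer follows from Bayes' theorem with P(B) expanded by the complement rule. A minimal sketch in R, using only the numbers given above (the variable names are hypothetical):

```r
p_A <- 0.01               # prevalence of cancer, P(A)
p_B_given_A <- 0.80       # positive test given cancer, P(B|A)
p_B_given_notA <- 0.096   # positive test given no cancer, P(B|A^c)

# total probability of a positive test
p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: probability of cancer given a positive test
p_A_given_B <- p_B_given_A * p_A / p_B
```

The result is about 7.8%: because the disease is rare, most positive tests come from the much larger group of healthy people.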