Tuesday, October 31, 2017

Principal Component Analysis (of the Interest Rate Curve)

Principal Component Analysis (PCA) is a method for dimensionality reduction. It enables replacing a large number of correlated variables with a smaller number of new variables – principal components – without losing too much information.

This technique is often used to describe time series of an interest rate curve. Instead of using the rates for all maturities of a curve, which are highly correlated, we replace them with a few principal components. Typically 3 components are considered sufficient, as these should theoretically describe the three main attributes of a curve – level, slope and curvature.

In the context of time series, PCA also requires stationarity. For interest rate curves this usually means that either differencing or calculating log returns is a prerequisite.

Having i=1,2,…,n time series (one for each maturity point, e.g. ON, 1W, 2W, 1M, …, 30Y), each consisting of measurements at historical times tj for j=1,2,…,T, we calculate the log return Xi(tj) of interest rate Si between times tj-1 and tj as:

Xi(tj) = ln( Si(tj) / Si(tj-1) )
The PCA method is applied to transform the variables Xi into Yi. These new Yi are called principal components and have two advantages: (1) they are mutually uncorrelated and (2) some of them have such low variance that they can be “easily” omitted without losing too much information (thus reducing the dimensionality). This is because the principal components are estimated in a particular way which maximizes the variance of first Y1, then Y2, etc.
With X and Y expressed as vectors of random variables and α as the coefficient matrix, the principal components are calculated as:

Y = α X

which for the k-th principal component Yk (k=1,2,…,n) translates to:

Yk = αk1·X1 + αk2·X2 + … + αkn·Xn
and where two particular conditions are required:
- for each k (k=1,2,…,n) the coefficient vector is normalized: αk1² + αk2² + … + αkn² = 1
- for each combination {i,k} where i<k (i=1,2,…,n-1; k=2,3,…,n) the principal components must be uncorrelated: cov(Yi, Yk) = 0
The calculation of the α matrix is based on the eigenvectors and eigenvalues of the covariance matrix Σ of X. Each entry of the covariance matrix is defined as:

Σij = cov(Xi, Xj) = E[ (Xi − E[Xi]) (Xj − E[Xj]) ]
The α matrix is easy to compute, as each row of α is in fact one eigenvector of the covariance matrix of X. At the same time, the variance of each principal component is given by the related eigenvalue λ. To reduce dimensionality, only the eigenvectors corresponding to the largest eigenvalues are kept.

Reverse calculation of the original values from the principal components is based on the inverse matrix α⁻¹ (since α is orthogonal, α⁻¹ = αᵀ).

For example, having the EURIBOR curve for the first 9 months of 2017:

library(data.table)

# Euribor January-September 2017
# i <- fread("euribor_2017.csv") # hypothetical file name; the data loading was not shown

# first differencing
r <- diff(as.matrix(i))

# centering
c <- scale(r, center=TRUE, scale=FALSE)

# covariance matrix
x <- (t(c) %*% c) / (dim(c)[1]-1)

# factor loadings from principal analysis
pc <- princomp(r)

# or alternatively: factor loadings from eigenvectors
e <- eigen(x)
alpha <- t(e$vectors) # each row of alpha is one eigenvector

# compare results from the 2 methods (loadings match up to sign)
unclass(pc$loadings)[,1:3]
e$vectors[,1:3]

# plot loadings (especially first 3)
matplot(e$vectors[,1:3], type="l")

# first three principal components
y <- alpha[1:3,] %*% t(c)
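The whole procedure, including the reverse reconstruction from the first three components, can be sketched on simulated data (hypothetical numbers driven by one common factor, not the EURIBOR series above):

```r
set.seed(1)
n <- 250; nc <- 8                                  # 250 days, 8 maturity points
base <- matrix(rnorm(n), n, nc)                    # common "level" factor in all columns
r <- base + matrix(rnorm(n*nc, sd=0.2), n, nc)     # correlated daily changes
c <- scale(r, center=TRUE, scale=FALSE)            # centering
e <- eigen(cov(c))                                 # eigen decomposition of the covariance matrix
alpha <- t(e$vectors)                              # each row of alpha is one eigenvector
y <- alpha[1:3,] %*% t(c)                          # first three principal components (3 x n)
r3 <- t(t(alpha[1:3,]) %*% y)                      # reverse calculation: approximate reconstruction
sum(e$values[1:3]) / sum(e$values)                 # share of variance explained by first 3 components
mean(abs(c - r3))                                  # small reconstruction error
```

Because the simulated changes share one dominant factor, the first three components explain well over 90% of the variance and the reconstruction error stays close to zero.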

Eigenvectors and Eigenvalues

For a matrix A, if there is a vector x and a value λ such that:

A x = λ x

then x is called an eigenvector of A and λ is called an eigenvalue.

It is said that the vector x points in the same direction as Ax (which is an exceptional property). The parameter λ expresses whether and how much the vector shrinks or stretches.

It follows that multiplying the eigenvector by the power λᵖ is like multiplying it p times by the matrix A:

Aᵖ x = λᵖ x
There may exist multiple eigenvectors and eigenvalues. All vectors x can be found by solving the equation:

(A − λI) x = 0

This equation has a non-zero solution only if the determinant of the matrix is zero. Accordingly, the parameters λ can be found by solving the characteristic equation:

det(A − λI) = 0

For example, for a 2x2 matrix the characteristic equation is quadratic, λ² − tr(A)·λ + det(A) = 0, yielding (at most) two eigenvalues and their corresponding eigenvectors. Additional properties of an nxn matrix A are the trace (sum of diagonal elements), which equals the sum of the eigenvalues:

tr(A) = λ1 + λ2 + … + λn

and the determinant, which equals the product of the eigenvalues:

det(A) = λ1 · λ2 · … · λn
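These properties can be checked with R's built-in eigen() on a small example matrix (the numbers are chosen here only for illustration):

```r
A <- matrix(c(4, 1, 2, 3), nrow=2)         # 2x2 example: char. equation l^2 - 7l + 10 = 0
e <- eigen(A)
e$values                                   # eigenvalues 5 and 2 (in decreasing order)
e$vectors                                  # eigenvectors (as columns)
A %*% e$vectors[,1] - e$values[1] * e$vectors[,1]  # ~ zero vector: A x = lambda x holds
all.equal(sum(diag(A)), sum(e$values))     # trace equals sum of eigenvalues
all.equal(det(A), prod(e$values))          # determinant equals product of eigenvalues
```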

Cornish-Fisher Expansion

It is an approximate method for deriving quantiles of any random variable's distribution based on its cumulants. The Cornish-Fisher expansion can be used, for example, to calculate various Value at Risk (VaR) quantiles.

Cumulants κr are an alternative expression of the distribution moments µr. For the cumulants κr it holds for all real t (where µr denotes the raw moment):

exp( Σ(r≥1) κr·tʳ/r! ) = Σ(r≥0) µr·tʳ/r! = E[e^(tX)]
The cumulants are not trivial to derive at all, but can generally be expressed by recursion:

κn = µn − Σ(m=1…n−1) C(n−1, m−1)·κm·µ(n−m)
This leads to expressions in terms of the raw moments (first 4 cumulants shown):

κ1 = µ1
κ2 = µ2 − µ1²
κ3 = µ3 − 3µ2µ1 + 2µ1³
κ4 = µ4 − 4µ3µ1 − 3µ2² + 12µ2µ1² − 6µ1⁴
Alternatively, derived from the central moments (for r>1): κ2 = µ2, κ3 = µ3, κ4 = µ4 − 3µ2² (here µr denote the central moments).
The Cornish-Fisher expansion for the approximate determination of a quantile xq builds on the variable X's mean µ, standard deviation σ and cumulants κr, with the help of the quantiles zq of the standard normal distribution N(0,1):

xq ≈ µ + σ·( zq + (zq² − 1)·S/6 + (zq³ − 3zq)·K/24 − (2zq³ − 5zq)·S²/36 )

where S = κ3/κ2^(3/2) is the skewness and K = κ4/κ2² the excess kurtosis.
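A minimal R sketch of the expansion with terms up to the fourth cumulant (cf_quantile is a name chosen here for illustration, not a library function):

```r
# Cornish-Fisher quantile from mean, standard deviation,
# skewness (skew) and excess kurtosis (ekurt)
cf_quantile <- function(q, mu, sigma, skew=0, ekurt=0) {
  z <- qnorm(q)                                    # quantile of N(0,1)
  w <- z + (z^2 - 1)*skew/6 + (z^3 - 3*z)*ekurt/24 - (2*z^3 - 5*z)*skew^2/36
  mu + sigma * w
}

cf_quantile(0.01, mu=0, sigma=1)                   # normal case: equals qnorm(0.01)
cf_quantile(0.01, mu=0, sigma=1, skew=-0.5, ekurt=1)  # fat left tail: a more extreme 1% quantile
```

With zero skewness and excess kurtosis the expansion collapses to the normal quantile; negative skew and positive excess kurtosis push the 1% VaR quantile further into the left tail.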

Monday, October 30, 2017


Moments of a Distribution

(Remember the definition of the probability density function f(x): the probability P(a ≤ X ≤ b) equals the integral of f(x) from a to b, and the integral over the whole domain equals 1.)

Moments are a measure of the probability density (in general, a measure of the “shape of a set of points”). The n-th moment µn of a probability density function f(x) about a value c is defined as:

µn = ∫ (x − c)ⁿ f(x) dx
The value c is typically set to the mean of the distribution; the moments are then called “central” moments. If c=0, the moment is called a “raw” moment and is marked as µn.

The zeroth (raw) moment is equal to 1 (the total area under the probability density function f(x)).
The first (raw) moment is the mean.
The second (central) moment is the variance.
For the higher moments, standardized variants are typically shown (divided by σⁿ) and are marked µ̃.

The third (standardized central) moment is the skewness.
The fourth (standardized central) moment is the kurtosis.
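The moments above can be estimated directly from a sample; for a standard normal sample the skewness should come out near 0 and the kurtosis near 3:

```r
set.seed(42)
x <- rnorm(1e5)                      # sample from N(0,1)
m  <- mean(x)                        # first raw moment (mean)
v  <- mean((x - m)^2)                # second central moment (variance)
sk <- mean((x - m)^3) / v^(3/2)      # third standardized central moment (skewness), ~0
ku <- mean((x - m)^4) / v^2          # fourth standardized central moment (kurtosis), ~3
c(mean=m, variance=v, skewness=sk, kurtosis=ku)
```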

Thursday, May 11, 2017

Conditional probability, Bayes Theorem (and how it’s derived)

Given two events A and B with their respective probabilities P(A) and P(B), the conditional probability of A given B (the probability of event A given the knowledge that event B occurred) is defined as:

P(A|B) = P(A∩B) / P(B)

The intuition behind this is: we already know that event B occurred, therefore we can no longer consider all possible outcomes – our outcome space reduces to just B. In this reduced outcome space we must consider event B to be a sure thing (i.e. P*(B)=100%), so we have to scale all theoretical probabilities by dividing by P(B). Therefore the probability of the intersection A∩B is also divided by P(B).

Two events A and B are independent (by definition) if:

P(A|B) = P(A)

It follows that if two events are independent, then the probability of the intersection A∩B is equal to the product of the two probabilities:

P(A∩B) = P(A)·P(B)

Note that independence is symmetric:

P(A|B) = P(A) ⟺ P(B|A) = P(B)

If two events A and B are independent (and P(A),P(B)>0), they can still occur simultaneously in one trial. If the two events cannot happen simultaneously in one trial, they are said to be mutually exclusive (disjoint); then:

P(A∩B) = 0
The Bayes theorem puts the conditional probabilities P(A|B) and P(B|A) into relation:

P(A|B) = P(B|A)·P(A) / P(B)

It can be derived from the conditional probabilities, since P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A). It is also useful to rewrite the theorem with the complement rule (total probability over A and its complement Ac):

P(A|B) = P(B|A)·P(A) / ( P(B|A)·P(A) + P(B|Ac)·P(Ac) )

A typical example of Bayes theorem application is the cancer screening case. Event A is defined as having cancer and event B is defined as a positive result of the screening test. Suppose that the prevalence of cancer in the population is P(A)=1%. Suppose also that for a person with cancer the screening test is positive with P(B|A)=80%, and for a healthy person it tends to be positive with P(B|Ac)=9.6%. The question is: what is the chance of having cancer if the test was positive?
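The arithmetic for this example, using the complement-rule form of the theorem:

```r
pA    <- 0.01    # prevalence P(A)
pBgA  <- 0.80    # sensitivity P(B|A)
pBgAc <- 0.096   # false positive rate P(B|Ac)

pB   <- pBgA*pA + pBgAc*(1 - pA)   # total probability of a positive test
pAgB <- pBgA*pA / pB               # Bayes theorem: P(A|B)
pAgB                               # ~0.078, i.e. only about a 7.8% chance
```

Despite the positive test, the chance of cancer is under 8%, because the many false positives among the healthy 99% of the population dominate the true positives.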

Tuesday, May 2, 2017

Large data in R (the ff package)

Since the R runtime environment stores all data structures in RAM and also performs all operations there, it may consume all available free memory and stop execution with a memory-related error (e.g.: Error: cannot allocate vector of size…). This is sometimes the case when large data needs to be processed.
To see the current consumption of memory by R:
i=1:1e9 # try to create a one billion long integer vector (an integer is stored in 4 bytes)
format(object.size(x=i),units="Mb") # (approximate) size of the object (in Mb)
memory.size(max=FALSE) # memory amount currently in use by R (Mb)
memory.size(max=TRUE) # maximum memory amount used during R session (Mb)
The ff package enables working with large data. At its core stands the ff class, which represents large data (a vector or array/matrix…). It is implemented in such a way that the raw data are stored as a flat file on disk and the R object contains merely metadata (dimensions, virtual storage modes etc.). When operating on an ff object, data are exchanged between disk and memory via “memory pages”, whose size depends on the setup (the pagesize option).
To see and change the size of memory page:
getOption("ffpagesize") # see page size (in b)
options(ffpagesize=2^16) # set page size (in b)
The ff class object can be of one of the standard atomic types (logical, integer, double, raw), but some nonstandard types are also supported via the virtual storage mode functionality (vmode).
To see currently supported virtual modes:
In addition, also following compound data types are supported:
  • factor
  • ordered
  • custom compounds
  • Date
  • POSIXct

To create a one billion long ff vector which contains the sequence of numbers 1:1e9, one can try:

f <- ff(initdata=1:1e9,vmode="integer") # try creating ff vector

However, the above statement will probably fail due to memory constraints, because generating the initdata sequence 1,2,…,1e9 is a standard R operation on integers and may not fit into memory at once. One has to first create an empty one billion long ff vector of integers and then populate it in batches. The size of a batch can be set either with the parameter BATCHSIZE (maximum number of elements processed at once) or with the parameter BATCHBYTES (maximum number of bytes processed at once):

f <- ff(length=1e9,vmode="integer",caching="mmeachflush") # define an empty one billion long ff vector of integers; the "each flush" caching ensures that memory pages are flushed at each step
ffvecapply(f[i1:i2] <- i1:i2, X=f, BATCHSIZE=1e5, VERBOSE=TRUE) # i1 and i2 are standard variables supplied by the ffvecapply function and mark the start and end of the records processed in each batch
class(f) # "ff_vector" "ff"
class(f[1:10]) # but when using [ ] the result is a standard R object (i.e. a vector of integers)

To define an ff matrix (array) and populate it by columns or by rows:

m <- ff(length=3e8,vmode="quad",caching="mmeachflush",levels=c("A","B","C")) # define a long ff vector of quads (up to 4 values) and define levels
class(m) # "ff_vector" "ff"
dim(m)<-c(1e8,3) # define matrix 1e8x3
class(m) # "ff_matrix" "ff_array" "ff"
class(m[1:10,]) # but if using [] result is standard R object (i.e. matrix of factors)
ffrowapply(m[i1:i2,,bydim=c(2,1)] <- c("A","B","C"), X=m, BATCHSIZE=3e5, VERBOSE=TRUE) # i1 and i2 are standard variables supplied by the ffrowapply function and mark the first and last row of each batch; use bydim to set row-based ordering

To see the attributes of a ff class object:

physical(m) # shows the physical attributes of the R object
virtual(m) # shows the virtual attributes of the R object

To coerce the ff object to a ram object (if enough memory):

rm <- as.ram(m) # standard R object held fully in memory

And to coerce a ram object to an ff object:

mf <- as.ff(rm)

To iterate through the ff object's elements and perform some operation, the range index approach can be exploited, as the chunk() function returns a range index for each chunk. Either a for loop or lapply can be used:

chunk(f, BATCHBYTES=1024^2) # range indexes by 1MB chunks

for (i in chunk(f, BATCHBYTES=10*1024^2)) f[i] <- rnorm(n=sum(as.bit(i))) # generate random numbers, by chunks

l <- lapply(chunk(f, BATCHBYTES=1024^2), function(i) max(f[i])) # get the maximum of each chunk
print(do.call(rbind, l)) # bind the results of all chunks together by rows

It is possible (and even desirable) that multiple ff objects use the same flat file.
To find out which flat file is attached to an R object:

b1 <- as.bit(sample(c(FALSE,TRUE), 1000, TRUE))
class(b1) # bit
f1 <- as.ff(b1)
class(f1) # "ff_vector" "ff"
b2 <- as.bit(f1)
f2 <- as.ff(b2)
filename(f2) # yet another object attached to the same flat file
f1[1]=T # set the first element of the first ff to TRUE
f2[1]=F # set the first element of the second ff to FALSE
f1[1]==f2[1] # TRUE – since both use the same file, the last write wins and both read FALSE

If there is a need to physically copy a whole ff object, use clone instead:

f2 <- clone(f1) # physical copy with its own flat file
f1[1]=T
f2[1]=F
f1[1]==f2[1] # returns FALSE since they use different files

Similar to a data.frame is the ffdf class, which is constructed as a container of ff vectors:

m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
v <- 1:3
ffm <- as.ff(m)
ffv <- as.ff(v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))

The storage of the underlying vectors can be controlled: either split them into more flat files or join them into fewer.

ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)

To sort the dataframe:

d2 <- ffdfsort(ffd) # sort directly

i <- ffdforder(ffd) # or sort through an order index

To save ff objects to disk:

ffsave(ffd, add=TRUE, file="d:/tmp/y", rootpath="d:/tmp")