Tuesday, October 31, 2017

Cornish-Fisher Expansion

It is an approximate method for deriving quantiles of a random variable's distribution based on its cumulants. The Cornish-Fisher expansion can be used, for example, to calculate various Value at Risk (VaR) quantiles.

Cumulants κr are an alternative expression of the distribution moments µr. For a cumulant κr of order r it holds for all real t (where µr denotes the raw moment):
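\sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!} = \log \mathrm{E}\left[e^{tX}\right] = \log\left(1 + \sum_{r=1}^{\infty} \mu_r \frac{t^r}{r!}\right)

(the cumulant generating function; the cumulants are the coefficients of its Taylor expansion in t)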
The cumulants are not trivial to derive in general, but they can be expressed by recursion:
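\kappa_n = \mu_n - \sum_{m=1}^{n-1} \binom{n-1}{m-1} \kappa_m \, \mu_{n-m}

(a standard recursion expressing κn through lower-order cumulants and raw moments)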
This leads to expressions in terms of raw moments (the first four cumulants are shown below):
Alternatively, they can be derived from central moments (for r > 1):
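In terms of raw moments µr:

\kappa_1 = \mu_1, \quad \kappa_2 = \mu_2 - \mu_1^2, \quad \kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3, \quad \kappa_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4

In terms of central moments m_r = \mathrm{E}[(X - \mu_1)^r] (the symbol m_r is introduced here to distinguish them from the raw moments), for r > 1:

\kappa_2 = m_2, \quad \kappa_3 = m_3, \quad \kappa_4 = m_4 - 3 m_2^2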
The Cornish-Fisher expansion for the approximate determination of the quantile xq builds on the mean µ, standard deviation σ and cumulants κr of the variable X, with the help of the quantiles of the standard normal distribution N(0,1):
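x_q \approx \mu + \sigma \left[ z_q + \frac{z_q^2 - 1}{6} S + \frac{z_q^3 - 3 z_q}{24} K - \frac{2 z_q^3 - 5 z_q}{36} S^2 \right]

where z_q is the q-quantile of N(0,1), S = \kappa_3 / \kappa_2^{3/2} is the skewness and K = \kappa_4 / \kappa_2^2 is the excess kurtosis. This is the usual expansion truncated after the fourth cumulant; the symbols z_q, S and K are introduced here for brevity.

As an illustration of the VaR use case mentioned above, a minimal R sketch (the helper cf_quantile and the simulated series ret are illustrative, not part of the original post):

cf_quantile <- function(ret, q=0.01) { # approximate q-quantile via Cornish-Fisher
  mu <- mean(ret); sigma <- sd(ret); z <- qnorm(q)
  S <- mean((ret-mu)^3)/sigma^3 # sample skewness
  K <- mean((ret-mu)^4)/sigma^4 - 3 # sample excess kurtosis
  mu + sigma*(z + (z^2-1)*S/6 + (z^3-3*z)*K/24 - (2*z^3-5*z)*S^2/36)
}
set.seed(1)
ret <- rnorm(1e4) + 0.3*(rexp(1e4)-1) # hypothetical skewed return series
cf_quantile(ret, q=0.01) # approximate 1% quantile of the returns (VaR level)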

Monday, October 30, 2017

Moments

(Recall the definition of a probability density function f(x): f(x) is non-negative, P(a ≤ X ≤ b) is the integral of f(x) from a to b, and the integral of f(x) over the whole real line equals 1.)

Moments are a measure of the probability density (in general, a measure of the "shape of a set of points"). The n-th moment µn of a probability density function f(x) about a value c is defined as:
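\mu_n(c) = \int_{-\infty}^{\infty} (x - c)^n f(x)\, dx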
The value c is typically set to the mean of the distribution; the moments are then called "central" moments. If c = 0, the moment is called a "raw" moment and is denoted µn.

The zeroth (raw) moment is equal to 1 (the total area under the probability density function f(x))
The first (raw) moment is the mean
The second (central) moment is the variance
For the higher moments, standardized variants are typically shown (divided by σ^n) and are denoted µ̃n.

The third (standardized central) moment is skewness
The fourth (standardized central) moment is kurtosis
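A minimal R sketch illustrating these quantities on a simulated sample (the sample x and the helper functions are illustrative, not part of the original post):

set.seed(1)
x <- rnorm(1e5, mean=2, sd=3) # simulated sample
raw_moment <- function(x, n) mean(x^n) # moment about c = 0
central_moment <- function(x, n) mean((x - mean(x))^n) # moment about c = mean
raw_moment(x, 1) # ~2, the mean
central_moment(x, 2) # ~9, the variance
central_moment(x, 3)/sd(x)^3 # ~0, skewness
central_moment(x, 4)/sd(x)^4 # ~3, kurtosis (equals 3 for a normal distribution)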

Thursday, May 11, 2017

Conditional probability, Bayes Theorem (and how it’s derived)

Given two events A and B, with their respective probabilities P(A) and P(B), the conditional probability of A given B (the probability of event A with the knowledge that event B occurred) is defined as:
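P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0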

The intuition behind it: we already know that event B occurred, therefore we can no longer consider all possible outcomes - the outcome space reduces to just B. In this reduced outcome space, event B must be a sure thing (i.e. P*(B) = 100%), so all theoretical probabilities have to be scaled down by dividing by P(B). Therefore the probability of the intersection A∩B is also divided by P(B).

Two events A and B are independent (by definition) if:
It follows that if two events are independent, then the probability of the intersection A∩B is equal to the product of the two probabilities:
Note that independence is symmetric:
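In symbols (the three statements above, in the same order):

P(A \mid B) = P(A), \qquad P(A \cap B) = P(A \mid B)\,P(B) = P(A)\,P(B), \qquad P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A)\,P(B)}{P(A)} = P(B)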
If two events A and B are independent (and P(A), P(B) > 0), they can still occur simultaneously in one trial. If, on the other hand, the two events cannot happen simultaneously in one trial, they are said to be mutually exclusive (disjoint); then P(A∩B) = 0, and such events cannot be independent (unless one of them has zero probability).
The Bayes theorem puts into relation conditional probabilities P(A|B) and P(B|A):
Bayes theorem can be derived from the conditional probabilities:
It is also useful to rewrite the theorem using the complement rule (the law of total probability applied to the denominator P(B)):
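P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

The derivation: writing the intersection both ways, P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A), and dividing by P(B) gives the theorem. Expanding the denominator over A and its complement A^c:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}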


A typical example of a Bayes theorem application is the cancer screening case. Event A is defined as having the cancer disease and event B is defined as a positive result of the screening test. Suppose that the prevalence of cancer in the population is P(A)=1%. Suppose further that for a person with cancer the screening test is positive with probability P(B|A)=80%, and for a healthy person it tends to be positive with probability P(B|Ac)=9.6%. The question is: what is the chance of having cancer if the test was positive?
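Plugging the numbers into the last form of the theorem, a quick check in R:

p_A <- 0.01 # P(A), prevalence of cancer
p_B_A <- 0.80 # P(B|A), positive test given cancer
p_B_Ac <- 0.096 # P(B|Ac), positive test given no cancer
p_B_A*p_A / (p_B_A*p_A + p_B_Ac*(1-p_A)) # P(A|B) = 0.0776, i.e. only about 7.8 %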

Tuesday, May 2, 2017

Large data in R (the ff package)

Since the R runtime environment stores all data structures in RAM and also performs all operations there, it can consume all available free memory and stop execution with a memory-related error (e.g. Error: cannot allocate vector of length...). This is sometimes the case when large data needs to be processed.
 
To see the current consumption of memory by R:
 
i=1:1e9 # try to create a one-billion-element integer vector (an integer is stored in 4 bytes)
format(object.size(x=i),units="Mb") # (approximate) size of the object (in Mb)
memory.size(max=FALSE) # memory amount currently in use by R (Mb, Windows only)
memory.size(max=TRUE) # maximum memory amount used during the R session (Mb, Windows only)
 
The ff package enables working with large data. At its core stands the ff class, which represents large data (a vector or array/matrix etc.). It is implemented in such a way that the raw data are stored in a flat file on disk, while the R object contains merely metadata (dimensions, virtual storage modes etc.). When operating on an ff object, data are exchanged between disk and memory via "memory pages", whose size depends on the setup (the ffpagesize option).
 
To see and change the size of memory page:
 
library("ff")
getOption("ffpagesize") # see the page size (in bytes)
options(ffpagesize=2^16) # set the page size (in bytes)
 
An ff object can be of one of the standard atomic types (logical, integer, double, raw), but some nonstandard types are also supported via the virtual storage mode functionality (vmode).
 
To see currently supported virtual modes:
 
data.frame(MODE=names(.vmode),
           IMPLEMENTED=.vimplemented[names(.vmode)],
           STORAGE=.rammode[names(.vmode)],
           MIN=.vmin[names(.vmode)],
           MAX=.vmax[names(.vmode)],
           NAs=ifelse(is.na(.vNA[names(.vmode)]),"Yes","No"),
           RAMbytes=.rambytes[names(.vmode)],
           FFbytes=.ffbytes[names(.vmode)])
 
In addition, the following compound data types are also supported:
  • factor
  • ordered
  • custom compounds
  • Date
  • POSIXct


To create a one-billion-element ff vector containing the sequence of numbers 1:1e9, one can try:


f <- ff(initdata=1:1e9,vmode="integer") # try creating ff vector


However, the above statement will probably fail due to memory constraints, because generating the initdata sequence 1, 2, ..., 1e9 is a standard R operation on integers and it may not fit into memory at once. One has to first create an empty one-billion-element ff vector of integers and then populate it in batches. The size of a batch can be set either with the parameter BATCHSIZE (maximum number of elements processed at once) or with the parameter BATCHBYTES (maximum number of bytes processed at once):


f <- ff(length=1e9,vmode="integer",caching="mmeachflush") # define an empty one-billion-element ff vector of integers; the "each flush" caching ensures that memory pages are flushed at each step
ffvecapply(f[i1:i2]<-i1:i2, X=f, BATCHSIZE=1e5, VERBOSE=TRUE) # i1 and i2 are standard variables supplied by ffvecapply and mark the start and end of the records processed in each batch
class(f) # "ff_vector" "ff"
class(f[1:10]) # but when subsetting with [] the result is a standard R object (i.e. an integer vector)


To define an ff matrix (array) and populate it by rows or by columns:


m <- ff(length=3e8,vmode="quad",caching="mmeachflush",levels=c("A","B","C")) # define a long ff vector of quads (up to 4 values) and define its levels
class(m) # "ff_vector" "ff"
dim(m)<-c(1e8,3) # turn it into a 1e8 x 3 matrix
class(m) # "ff_matrix" "ff_array" "ff"
class(m[1:10,]) # but when subsetting with [] the result is a standard R object (i.e. a matrix of factors)
ffrowapply(m[i1:i2,,bydim=c(2,1)]<-c("A","B","C"), X=m, BATCHSIZE=3e5, VERBOSE=TRUE) # i1 and i2 are standard variables supplied by ffrowapply and mark the first and last row of the batch; use bydim to set row-based ordering


To see the attributes of a ff class object:


physical(m) # shows the physical attributes of the R object
virtual(m) # shows the virtual attributes of the R object


To coerce the ff object to a ram object (if enough memory):


ramclass(m) # the class the data will have once moved into RAM
as.ram(m) # copy the ff object into a standard RAM object


And to coerce ram object to an ff object:


as.ff(i) # create an ff object from the RAM vector i


To iterate through the elements of an ff object and perform some operation, the range-index approach can be exploited, as the chunk.bit()/chunk() functions return range indexes for each chunk. Either a for loop or the lapply functionality can be used:


chunk.bit(f,BATCHBYTES = 1024^2) # range indexes by 1MB chunks


for (i in chunk.bit(f,BATCHBYTES = 10*1024^2)) f[i] <- rnorm(n=sum(as.bit(i))) # generate random numbers, by chunks


l=lapply(chunk(f,BATCHBYTES = 1024^2),function(i) max(f[i])) # get the maximum of each chunk
print(do.call(rbind, l)) # bind the results of all chunks by rows together


It is possible (and sometimes even desirable) that multiple ff objects use the same flat file.
To find out which flat file is attached to which R object:


b1 <- as.bit(sample(c(FALSE,TRUE), 1000, TRUE))
class(b1) # bit
f1 <- as.ff(b1)
class(f1) # "ff_vector" "ff"
b2 <- as.bit(f1)
f2 <- as.ff(b2)
filename(f1)
filename(f2) # yet another object attached to the same flat file
f1[1]=TRUE # set the first element of the first ff to TRUE
f2[1]=FALSE # set the first element of the second ff to FALSE
f1[1]==f2[1] # since both use the same file, both now read FALSE, so the comparison returns TRUE


If there is a need to physically copy the whole ff object, use clone() instead:


f2=clone(f1)
f1[1]=T
f2[1]=F
f1[1]==f2[1] # returns FALSE since they use different files


Similar to a data.frame is the ffdf class, which is constructed as a container of ff vectors:


m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
v <- 1:3
ffm <- as.ff(m)
ffv <- as.ff(v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
filename(ffm)
filename(ffv)
filename(ffd)
ffd
physical(ffd)


The physical storage of the component vectors can be controlled: they can either be split into more flat files or joined into fewer.


ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)






To sort the ffdf:


d2 <- ffdfsort(ffd) # sort directly


i <- ffdforder(ffd) # or compute an order index
ffd[i,] # and use it to retrieve the rows in sorted order






To save ff objects:


ffsave(ffd, add=TRUE, file="d:/tmp/y", rootpath="d:/tmp") # save the ffdf and its component ff vectors
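To load the saved objects back into a later R session, the companion ffload() function can be used (a minimal sketch, assuming the same paths as above):

ffload(file="d:/tmp/y") # restores the ff objects saved by ffsave into the current environment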

Wednesday, April 12, 2017

The Johansen cointegration approach

The Johansen solution to cointegration is based on vector autoregression (VAR) and allows simultaneous analysis of n time series. With an (n x 1) vector Xt the VAR(p) model is defined as (for simplicity, neither an intercept vector c nor deterministic regressors CDt are considered):
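X_t = \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \dots + \Pi_p X_{t-p} + \varepsilon_t

where \Pi_1, \dots, \Pi_p are (n x n) coefficient matrices and \varepsilon_t is an (n x 1) error term (a standard statement of the VAR(p) model; the notation in the original figure may differ slightly).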
This can be rewritten as a multivariate generalization of the Augmented Dickey-Fuller test:
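\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t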
This is the "transitory" form (because it includes Xt-1, as opposed to the "long-run" form, which includes Xt-p), and therefore it holds that:
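\Pi = \sum_{i=1}^{p} \Pi_i - I, \qquad \Gamma_i = -\sum_{j=i+1}^{p} \Pi_j, \quad i = 1, \dots, p-1

(the standard relations for the transitory VECM specification)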
The rank r of the (n x n) matrix Π determines the number of cointegration vectors (0 ≤ r < n). If r = 0, there is no cointegration. The matrix Π (the impact matrix) can be rewritten as the product Π = αβ' of an (n x r) matrix of adjustment parameters α and an (r x n) matrix of cointegrating vectors β'.
For example, with two time series the VAR(1) and VECM(1) models become:
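For n = 2 and r = 1 this can be written out as follows (the component notation x_{1,t}, x_{2,t}, \alpha_i, \beta_i is introduced here for illustration):

X_t = \begin{pmatrix} x_{1,t} \\ x_{2,t} \end{pmatrix} = \Pi_1 X_{t-1} + \varepsilon_t, \qquad \Delta X_t = (\Pi_1 - I)\, X_{t-1} + \varepsilon_t = \alpha \beta' X_{t-1} + \varepsilon_t, \qquad \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \; \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}

so that, for instance, \Delta x_{1,t} = \alpha_1 (\beta_1 x_{1,t-1} + \beta_2 x_{2,t-1}) + \varepsilon_{1,t}.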
To find out r (the number of cointegrating vectors), we can perform the "trace" test for:
H0: There are r (linearly-independent) cointegrating vectors (and n-r common stochastic trends)
H1: There are more than r cointegrating vectors
We usually start with r = 0, perform the test, and in case of rejection increase r by 1 and test again, until the null hypothesis is no longer rejected.
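For reference, the trace statistic is computed from the estimated eigenvalues \hat{\lambda}_i of the Π matrix (T denotes the sample size):

\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_i\right)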


An example in R code (taken from Bernhard Pfaff: Tutorial - Analysis of Integrated and Cointegrated Time Series):


library(urca) # ca.jo, cajorls
library(vars) # vec2var, predict, fanchart, irf, fevd and the diagnostic tests
set.seed(12345)
e1 <- rnorm(250, 0, 0.5)
e2 <- rnorm(250, 0, 0.5)
e3 <- rnorm(250, 0, 0.5)
u1.ar1 <- arima.sim(model = list(ar=0.75), innov = e1, n = 250)
u2.ar1 <- arima.sim(model = list(ar=0.3), innov = e2, n = 250)
y3 <- cumsum(e3)
y1 <- 0.8 * y3 + u1.ar1
y2 <- -0.3 * y3 + u2.ar1
ymax <- max(c(y1, y2, y3))
ymin <- min(c(y1, y2, y3))
plot(y1, ylab = "", xlab = "", ylim = c(ymin, ymax))
lines(y2, col = "red")
lines(y3, col = "blue")
y.mat <- data.frame(y1, y2, y3)
vecm1 <- ca.jo(y.mat, type = "eigen", spec = "transitory") # Johansen procedure, maximal eigenvalue statistic
vecm2 <- ca.jo(y.mat, type = "trace", spec = "transitory") # Johansen procedure, trace statistic
vecm.r2 <- cajorls(vecm1, r = 2) # restricted VECM estimated with cointegration rank r = 2
vecm.level <- vec2var(vecm1, r = 2) # transform the VECM into its level-VAR representation
vecm.pred <- predict(vecm.level, n.ahead = 10) # 10-step-ahead forecast
fanchart(vecm.pred)
vecm.irf <- irf(vecm.level, impulse = 'y3',response = 'y1', boot = FALSE)
vecm.fevd <- fevd(vecm.level)
vecm.norm <- normality.test(vecm.level)
vecm.arch <- arch.test(vecm.level)
vecm.serial <- serial.test(vecm.level)

The Engle-Granger cointegration approach

The two steps of the Engle-Granger approach to cointegration, in the case of two time series yt and xt, are:

1st step:
Test for the integration order of both variables (as it is a condition for cointegration that all variables are integrated of the same order I(d)). This can be achieved (for instance) via the Augmented Dickey-Fuller test (any of the three variants can be used, according to judgment). If both variables are integrated of the same order, the least squares cointegrating regression can be estimated:
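y_t = \beta_0 + \beta_1 x_t + \varepsilon_t

(a standard form of the static cointegrating regression; \hat{\varepsilon}_t below denotes its estimated residuals)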
The residuals ε̂t are then tested for stationarity (Augmented Dickey-Fuller test without drift and without trend; however, since the residuals are themselves estimates, the MacKinnon critical values have to be used). If the time series of residuals is found to be stationary, the two time series are said to be cointegrated.


2nd step:
Estimate the error correction model with the residuals ε̂t from step 1. As the estimate β̂ is superconsistent, the residuals can be considered "known" and ordinary least squares applies:
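In the form matching the R example below (the coefficient symbols \gamma_i and the error u_t are illustrative notation; \gamma_1 is the error-correction adjustment coefficient and is expected to be negative):

\Delta y_t = \gamma_0 + \gamma_1 \hat{\varepsilon}_{t-1} + \gamma_2 \Delta y_{t-1} + \gamma_3 \Delta x_{t-1} + u_t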


An example in R code (taken from Bernhard Pfaff: Tutorial - Analysis of Integrated and Cointegrated Time Series):




library(dynlm) # dynlm() and the lag operator L() used below
set.seed(12345)
e1 <- rnorm(250, mean = 0, sd = 0.5)
e2 <- rnorm(250, mean = 0, sd = 0.5)
u.ar3 <- arima.sim(model = list(ar = c(0.6, -0.2, 0.1)), n = 250, innov = e1)
y2 <- cumsum(e2)
y1 <- u.ar3 + 0.5*y2
ymax <- max(c(y1, y2))
ymin <- min(c(y1, y2))
layout(matrix(1:2, nrow = 2, ncol = 1))
plot(y1, xlab = "", ylab = "", ylim =c(ymin, ymax), main ="Cointegrated System")
lines(y2, col = "green")
plot(u.ar3, ylab = "", xlab = "", main = "Cointegrating Residuals")
abline(h = 0, col = "red")
lr <- lm(y1 ~ y2)
ect <- resid(lr)[1:249]
dy1 <- diff(y1)
dy2 <- diff(y2)
ecmdat <- cbind(dy1, dy2, ect)
ecm <- dynlm(dy1 ~ L(ect, 1) + L(dy1, 1)+ L(dy2, 1) , data = ecmdat)

Cointegration and error correction model

If two (or more) time series are integrated of the same order (e.g. of order one, I(1)), but their linear combination is integrated of a lower order (e.g. of order zero, I(0)), then these series are cointegrated.


In vector notation: if yt is an (n x 1) vector of time series (y1,t, y2,t, ..., yn,t)', all integrated of order d, and β is an (n x 1) cointegrating vector (β1, β2, ..., βn)', then if the linear combination zt = β'yt is integrated of order (d-b), with b > 0, the vector of time series is said to be cointegrated of order (d, b).


In fact, there can be more than one cointegrating vector; then we are dealing with an (n x r) matrix β, where 0 < r < n is the number of cointegrating vectors. It follows that there are n-r common stochastic trends.



One advantage of cointegration is that the problem of "spurious regression" is avoided in this case, as the least squares regression on the levels gives superconsistent estimates of the cointegrating relationship.


Another advantage is that it is a long-term model (e.g. it captures the long-term relationship between the levels of two time series y and x), in contrast to a model fitted on differenced data, which would be a short-term model (e.g. it captures the effect of a change in x on the change in y). The combination of both approaches (short- and long-term models) is known as the error correction model (ECM), which is a self-regulating model that allows for short-term autoregressive behavior but over time always pulls the time series back into the equilibrium given by the cointegrating vectors.


To test for cointegration and to create an error correction model, several approaches are possible; the most common are the Engle-Granger two-step estimation and Johansen's vector autoregressive (VAR) technique. While the Engle-Granger approach is easier to perform, it allows at most one cointegrating vector (and one has to decide which of the series is dependent and which are independent).