Let X₁, X₂, …,X_n be iid. variables (e.g. account balances, market returns, cash-flows, profit/loss results…), with distribution function

If we expect that extreme values of X might occur occasionally (e.g. bank-run withdrawals, extraordinary losses, asset values slump…), we can approximate (and extrapolate) the tail of such distribution with Generalized Pareto distribution.

Tail of a distribution is to be determined by a specific threshold µ, hence we are looking for “peaks-over-threshold” of µ. The choice of µ is arbitrary, however critical, as on side we shall fit the Generalized Pareto distribution on the (heavy) tail only, on other side sufficient number of observations is necessary to estimate its parameters.

In general, the distribution of exceedances y=(X-µ) is given:

for each y>0 by the function

or expressed alternatively as complementary cumulative distribution

otherwise (y≤0)

In case of all X>µ we denote

And we can approximate Y by Generalized Pareto distribution with parameter ξ (shape) and β (scale)

where y≥0 if ξ≥0 and 0≤y≤-β/ξ if ξ<0.

Under certain conditions (if F is in the domain of attraction of the Fréchet, Gumbel or Weibull distributions) and for µ large enough (close to some maximal observation x_max) we can use the Generalized Pareto Distribution to model the exceedance distribution

which holds for complementary distribution functions too

which leads to expression through multiplication

Any quantile x can then be found with quantile function Q(p) at probability p:

With estimated parameters of Generalized Pareto Distribution β and ξ and estimated ζ probability of occurrence over threshold µ we estimate the distribution

Example in R

library(POT)

par(mfrow=c(3,3))

# generate normal distribution with some extremes

n=1e3

ext=runif(20,10,50)

x=c(rnorm(n = n-length(c),mean = 10,sd = 5),ext)

hist(x,n=100)

# find threshold

mrlplot(data=x, nt=20)

tcplot(data=x,nt=20)

# fit gpd on right tail

loc=20

mle=fitgpd(data=x, threshold=loc, est="mle")

print(mle$param)

plot(mle,which = c(1,3)) # show p-p plot and histogram

# generate random from this distribution

hist(rgpd(n=1e3,loc=loc,scale=mle$param[1],shape=mle$param[2]),n=100)

# calculate gdp 99% quantile and compare with empirical quantile

qgpd(p=0.999, loc=loc,scale=mle$param[1],shape=mle$param[2],lambda=mean(x<=loc))

quantile(x,0.999)

Applied statistics and risk management

Friday, December 20, 2019

The Generalized Pareto Distribution / Peaks-over-threshold method