Friday, December 20, 2019

The Generalized Pareto Distribution / Peaks-over-threshold method

Let X1, X2, …,Xn be iid. variables (e.g. account balances, market returns, cash-flows, profit/loss results…), with distribution function

If we expect that extreme values of X might occur occasionally (e.g. bank-run withdrawals, extraordinary losses, asset values slump…), we can approximate (and extrapolate) the tail of such distribution with Generalized Pareto distribution.

Tail of a distribution is to be determined by a specific threshold µ, hence we are looking for “peaks-over-threshold” of µ. The choice of µ is arbitrary, however critical, as on side we shall fit the Generalized Pareto distribution on the (heavy) tail only, on other side sufficient number of observations is necessary to estimate its parameters.

In general, the distribution of exceedances y=(X-µ) is given:


  • for each y>0 by the function
or expressed alternatively as complementary cumulative distribution
  • otherwise (y≤0)

In case of all X>µ we denote

And we can approximate Y by Generalized Pareto distribution with parameter ξ (shape) and β (scale)

where y≥0 if ξ≥0 and 0≤y≤-β/ξ if ξ<0.


Under certain conditions (if F is in the domain of attraction of the Fréchet, Gumbel or Weibull distributions) and for µ large enough (close to some maximal observation xmax) we can use the Generalized Pareto Distribution to model the exceedance distribution
which holds for complementary distribution functions too
which leads to expression through multiplication
Any quantile x can then be found with quantile function Q(p) at probability p:

With estimated parameters of Generalized Pareto Distribution β and ξ and estimated ζ probability of occurrence over threshold µ we estimate the distribution
Example in R

library(POT)
par(mfrow=c(3,3))

# generate normal distribution with some extremes
n=1e3
ext=runif(20,10,50)
x=c(rnorm(n = n-length(c),mean = 10,sd = 5),ext)
hist(x,n=100)

# find threshold
mrlplot(data=x, nt=20)
tcplot(data=x,nt=20)

# fit gpd on right tail
loc=20
mle=fitgpd(data=x, threshold=loc, est="mle")
print(mle$param)
plot(mle,which = c(1,3)) # show p-p plot and histogram

# generate random from this distribution
hist(rgpd(n=1e3,loc=loc,scale=mle$param[1],shape=mle$param[2]),n=100)

# calculate gdp 99% quantile and compare with empirical quantile
qgpd(p=0.999, loc=loc,scale=mle$param[1],shape=mle$param[2],lambda=mean(x<=loc))
quantile(x,0.999)