Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables (e.g. account balances, market returns, cash flows, profit/loss results), with distribution function
$$F(x) = P(X \le x).$$
If we expect that extreme values of X might occur occasionally (e.g. bank-run withdrawals, extraordinary losses, slumps in asset values), we can approximate (and extrapolate) the tail of the distribution with the Generalized Pareto distribution.
The tail of a distribution is determined by a specific threshold µ, hence we are looking for “peaks over threshold” µ. The choice of µ is arbitrary but critical: on the one hand, we should fit the Generalized Pareto distribution to the (heavy) tail only; on the other hand, a sufficient number of observations is needed to estimate its parameters.
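The trade-off can be made concrete by counting how many observations remain above a few candidate thresholds. This is only a sketch in base R: the sample, the seed and the threshold grid are assumptions chosen for illustration, not values prescribed by the method.

```r
# Hypothetical sample: mostly normal values plus a few uniform "extremes"
set.seed(1)  # assumed seed, only for reproducibility
x <- c(rnorm(980, mean = 10, sd = 5), runif(20, 10, 50))

# Candidate thresholds (arbitrary grid chosen for this sketch)
thresholds <- c(15, 20, 25, 30)

# Number of observations strictly above each threshold
n_exceed <- sapply(thresholds, function(u) sum(x > u))

# The higher the threshold, the fewer points are left to fit the tail
print(data.frame(threshold = thresholds, exceedances = n_exceed))
```

A higher threshold isolates the tail better but leaves fewer exceedances, so the parameter estimates become less stable.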
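In general, the distribution of exceedances y = X − µ is given, for each y > 0, by the function
$$F_\mu(y) = P(X - \mu \le y \mid X > \mu) = \frac{F(y + \mu) - F(\mu)}{1 - F(\mu)},$$
or expressed alternatively as the complementary cumulative distribution (with $\bar F = 1 - F$)
$$\bar F_\mu(y) = 1 - F_\mu(y) = P(X - \mu > y \mid X > \mu) = \frac{\bar F(y + \mu)}{\bar F(\mu)}.$$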
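The conditional definition of the exceedance distribution can be checked on data. The sketch below assumes the standard form P(X − µ ≤ y | X > µ) = (F(y + µ) − F(µ)) / (1 − F(µ)) and evaluates both sides with the empirical distribution function; the exponential sample, the threshold and the value of y are hypothetical.

```r
set.seed(1)         # assumed seed, only for reproducibility
x  <- rexp(1000)    # hypothetical sample
mu <- 1             # hypothetical threshold
y  <- 0.5           # exceedance level at which the distribution is evaluated

Fn <- ecdf(x)       # empirical distribution function of X

# Left-hand side: share of observations above mu that do not exceed mu + y
direct <- mean(x[x > mu] <= mu + y)

# Right-hand side: the same probability expressed through F
via_F <- (Fn(y + mu) - Fn(mu)) / (1 - Fn(mu))

print(c(direct = direct, via_F = via_F))
```

Both numbers agree because they count the same observations, once directly and once through the empirical distribution function.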
For all observations with X > µ we denote the exceedance by
$$Y = X - \mu.$$
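We can approximate the distribution of Y by the Generalized Pareto distribution with parameters ξ (shape) and β (scale):
$$G_{\xi,\beta}(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\beta}\right)^{-1/\xi}, & \xi \ne 0, \\ 1 - e^{-y/\beta}, & \xi = 0, \end{cases}$$
where β > 0, y ≥ 0 if ξ ≥ 0, and 0 ≤ y ≤ −β/ξ if ξ < 0.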
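The Generalized Pareto distribution function is short enough to write out by hand. The following base R sketch assumes the standard parameterization, 1 − (1 + ξy/β)^(−1/ξ) for ξ ≠ 0 and 1 − exp(−y/β) for ξ = 0; the function name `gpd_cdf` is a hypothetical helper and not part of any package.

```r
# GPD distribution function for exceedances y (hypothetical helper)
gpd_cdf <- function(y, scale, shape) {
  stopifnot(scale > 0)
  if (shape == 0) {
    p <- 1 - exp(-y / scale)  # exponential limit case
  } else {
    # pmax() clips the base at 0, so values beyond the upper endpoint
    # -scale/shape (only finite for shape < 0) get probability 1
    p <- 1 - pmax(1 + shape * y / scale, 0)^(-1 / shape)
  }
  p[y < 0] <- 0               # no probability mass below the threshold
  p
}

gpd_cdf(c(0, 1, 5), scale = 2, shape = 0.3)    # heavy tail, unbounded support
gpd_cdf(c(1, 4, 10), scale = 2, shape = -0.5)  # bounded support, endpoint 4
```

The second call illustrates the support restriction for ξ < 0: every y at or beyond −β/ξ has probability 1.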
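Under certain conditions (if F is in the domain of attraction of the Fréchet, Gumbel or Weibull distribution) and for µ large enough (close to some maximal observation $x_{max}$), we can use the Generalized Pareto distribution to model the exceedance distribution
$$F_\mu(y) \approx G_{\xi,\beta}(y),$$
which holds for the complementary distribution functions too,
$$\bar F_\mu(y) = 1 - F_\mu(y) \approx 1 - G_{\xi,\beta}(y) = \bar G_{\xi,\beta}(y),$$
which, through multiplication by $\bar F(\mu) = P(X > \mu)$, leads to the expression
$$\bar F(\mu + y) = \bar F(\mu)\,\bar F_\mu(y) \approx \bar F(\mu)\,\bar G_{\xi,\beta}(y).$$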
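Any quantile x can then be found with the quantile function Q(p) at probability p (for ξ ≠ 0 and p ≥ 1 − ζ, where ζ = P(X > µ) is the probability of exceeding the threshold):
$$Q(p) = \mu + \frac{\beta}{\xi}\left[\left(\frac{1 - p}{\zeta}\right)^{-\xi} - 1\right].$$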
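As a sketch, the tail quantile can be computed directly in base R. The code assumes the usual peaks-over-threshold form Q(p) = µ + β/ξ · (((1 − p)/ζ)^(−ξ) − 1) with ζ = P(X > µ); the function name and all parameter values are hypothetical and not fitted to any data.

```r
# Quantile of X in the tail region, for shape xi != 0 (hypothetical helper)
gpd_tail_quantile <- function(p, mu, beta, xi, zeta) {
  # only probabilities that fall into the tail above mu are meaningful
  stopifnot(all(p >= 1 - zeta), all(p < 1), xi != 0, beta > 0)
  mu + beta / xi * (((1 - p) / zeta)^(-xi) - 1)
}

# Illustrative values only: threshold 20, scale 5, shape 0.2,
# and 5% of the observations above the threshold
q999 <- gpd_tail_quantile(p = 0.999, mu = 20, beta = 5, xi = 0.2, zeta = 0.05)
print(q999)
```

Plugging the result back into the tail formula ζ · (1 + ξ(x − µ)/β)^(−1/ξ) should return 1 − p, which is a quick consistency check on any implementation.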
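With the estimated parameters of the Generalized Pareto distribution, $\hat\beta$ and $\hat\xi$, and the estimated probability $\hat\zeta$ of exceeding the threshold µ,
$$\hat\zeta = \frac{N_\mu}{n},$$
where $N_\mu$ is the number of observations above µ, we estimate the distribution for x > µ as
$$\hat F(x) = 1 - \hat\zeta \left(1 + \hat\xi\,\frac{x - \mu}{\hat\beta}\right)^{-1/\hat\xi}.$$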
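A minimal sketch of this tail estimator in base R follows. It assumes the common form 1 − ζ · (1 + ξ(x − µ)/β)^(−1/ξ) for x above the threshold, with ζ estimated as the share of observations above µ. The sample is hypothetical, and the scale and shape values are placeholders rather than fitted estimates.

```r
set.seed(1)  # assumed seed, only for reproducibility
x  <- c(rnorm(980, mean = 10, sd = 5), runif(20, 10, 50))  # hypothetical sample
mu <- 20     # hypothetical threshold

# Estimated probability of exceeding the threshold
zeta_hat <- mean(x > mu)

# Placeholder parameters, NOT fitted; in practice they come from a GPD fit
beta_hat <- 8
xi_hat   <- 0.1

# Estimated distribution function of X in the tail (valid for q >= mu)
F_hat <- function(q) {
  1 - zeta_hat * (1 + xi_hat * (q - mu) / beta_hat)^(-1 / xi_hat)
}

print(F_hat(c(20, 30, 50)))
```

At the threshold itself the estimator returns 1 − ζ, so it joins the empirical distribution of the body of the data at µ.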
Example in R
library(POT)
par(mfrow = c(3, 3))
# generate a normal sample with some extremes added
n <- 1e3
ext <- runif(20, 10, 50)
x <- c(rnorm(n = n - length(ext), mean = 10, sd = 5), ext)
hist(x, breaks = 100)
# find threshold: mean residual life plot and threshold choice plot
mrlplot(data = x, nt = 20)
tcplot(data = x, nt = 20)
# fit GPD on the right tail
loc <- 20
mle <- fitgpd(data = x, threshold = loc, est = "mle")
print(mle$param)
plot(mle, which = c(1, 3)) # show P-P plot and density plot
# generate random values from the fitted distribution
hist(rgpd(n = 1e3, loc = loc, scale = mle$param[1], shape = mle$param[2]), breaks = 100)
# calculate the GPD 99.9% quantile and compare it with the empirical quantile
# lambda = share of observations at or below the threshold
qgpd(p = 0.999, loc = loc, scale = mle$param[1], shape = mle$param[2], lambda = mean(x <= loc))
quantile(x, 0.999)