Risk Lab · VaR backtest · CQF Module 2.3

🌾 Wheat · ZW

Black-Sea geopolitical premium and the cleanest jump-diffusion case in commodities.

Last price: 647.00¢/bu (2026-06-01)

Daily change: -0.42%

Day range (session): 644.00 – 661.50

Method: FHS (recommended)

Asset thesis · why this asset is different

0.Jump-diffusion lives here

Wheat is the canonical example for jump-diffusion in commodity returns. Russia and Ukraine together export roughly 30% of global wheat, and headline risk produces overnight gaps that no GARCH process can absorb. The 2022 invasion of Ukraine is the textbook case: CBOT wheat rallied roughly 40% in the two weeks after 24 February, largely through limit-up sessions and overnight gaps rather than continuous trading. FHS picks this up from the realised distribution; the other methods understate the geopolitical tail.

  • Ethiopia imports ~2 million tonnes annually; CBOT wheat is a direct inflation input for the East African basket.
  • The collapse of the Black Sea Grain Initiative (July 2023) is a clean event-study case.
  • Russian export tax thresholds and Australian harvest competition set the floor on Q3 prices.

Recommended VaR method

FHS · The geopolitical tail is fat and skewed; FHS lets realised shocks such as the 2022 and 2023 Black Sea events stay in the historical sample whenever the estimation window covers them.

Part I · the daily log-return series

1.From prices to returns

Everything that follows works on the daily log return $r_t = \log(P_t / P_{t-1})$. Log returns are time-additive, well-behaved under aggregation, and the natural input for any vol / VaR model. The series below is exactly that, computed over 567 trading days from Supabase data.

[Figure: daily log return r_t, 2024-04-17 to 2026-06-01, y-axis -7.0% to 7.0%; green up, red down]

μ (daily): 0.017%

σ (daily): 1.59%

σ (annualised, current EWMA estimate): 30.5%

n: 567 observations
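A minimal sketch of how the return series and the pooled summary figures can be computed, using only the Python standard library. The price list is hypothetical (not the Supabase series), and 252 trading days per year is an assumed annualisation convention.

```python
import math
from statistics import mean, stdev

def log_returns(prices):
    """Daily log returns r_t = log(P_t / P_{t-1}); one element shorter than prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def summary(r, periods_per_year=252):
    """Sample mean, sample sigma (n - 1 denominator) and sqrt-time annualised sigma."""
    mu, sigma = mean(r), stdev(r)
    return {"mu": mu, "sigma": sigma,
            "sigma_annual": sigma * math.sqrt(periods_per_year), "n": len(r)}

# Hypothetical closing prices, for illustration only.
prices = [640.0, 652.0, 647.0, 655.5, 649.0]
r = log_returns(prices)
stats = summary(r)

# Time-additivity: the daily log returns sum to the log of the total price ratio.
assert math.isclose(sum(r), math.log(prices[-1] / prices[0]))
```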

The annualised σ shown above is the current value from the EWMA filter, not a pooled sample estimate; the pooled daily 1.59% would annualise to only about 25% (1.59% × √252). Pooled σ across the entire history would mask the regime structure CQF Module 2.5 cares about.
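The EWMA filter is a one-line recursion. The sketch below assumes the RiskMetrics form $\sigma^2_{t+1} = \lambda\sigma^2_t + (1-\lambda) r_t^2$, a placeholder λ = 0.94 (the page estimates λ by MLE, which is not reproduced here) and a seed equal to the mean squared return; the seed choice is also an assumption.

```python
import math

def ewma_vol(returns, lam=0.94, seed_var=None):
    """One-step-ahead EWMA daily vols.

    vols[t] is the vol for day t using returns up to t-1; the final element
    is the forecast for the day after the sample (the "current" filter value).
    lam=0.94 is a RiskMetrics-style placeholder, not the page's MLE estimate.
    """
    var = seed_var if seed_var is not None else sum(x * x for x in returns) / len(returns)
    vols = [math.sqrt(var)]
    for x in returns:
        var = lam * var + (1.0 - lam) * x * x  # sigma2_{t+1} = lam*sigma2_t + (1-lam)*r_t^2
        vols.append(math.sqrt(var))
    return vols

def current_annualised_vol(returns, lam=0.94, periods_per_year=252):
    """Latest EWMA vol scaled by sqrt(252), the analogue of the 'current' filter value."""
    return ewma_vol(returns, lam)[-1] * math.sqrt(periods_per_year)
```

Because the recursion forgets old squared returns geometrically, the final element tracks the latest regime, which is exactly why it differs from the pooled estimate.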

Part II · stylized facts (CQF Module 2.4)

2.The four moments

Returns are described by their first four sample moments. The first two get most of the airtime; the third and fourth are what break Gaussian VaR.

$$\mu = \frac{1}{n}\sum_t r_t, \qquad \sigma^2 = \frac{1}{n-1}\sum_t (r_t - \mu)^2$$
$$\gamma_1 = \frac{1}{n}\sum_t \left(\frac{r_t - \mu}{\sigma}\right)^3 \quad (\text{skewness})$$
$$\gamma_2 = \frac{1}{n}\sum_t \left(\frac{r_t - \mu}{\sigma}\right)^4 - 3 \quad (\text{excess kurtosis})$$

The Gaussian benchmark has $\gamma_1 = 0$ and $\gamma_2 = 0$. The empirical values on this asset, computed on the live series above, are $\gamma_1 = 0.449$ and $\gamma_2 = 0.55$. Both numbers feed directly into the Cornish-Fisher VaR adjustment in Part III.
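The three formulas translate directly into code. This sketch follows the conventions as written above ($n-1$ in $\sigma^2$, $1/n$ in the standardised third and fourth moments); other libraries apply different small-sample corrections, so their skewness and kurtosis can differ slightly.

```python
import math

def four_moments(r):
    """Return (mu, sigma, skewness gamma_1, excess kurtosis gamma_2) as defined above."""
    n = len(r)
    mu = sum(r) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in r) / (n - 1))  # n - 1 denominator
    z = [(x - mu) / sigma for x in r]                            # standardised returns
    gamma1 = sum(v ** 3 for v in z) / n
    gamma2 = sum(v ** 4 for v in z) / n - 3.0                    # 0 for a Gaussian benchmark
    return mu, sigma, gamma1, gamma2
```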

3.The empirical density vs normal

Plotting the histogram of $r_t$ against a normal fitted to the same $(\mu, \sigma)$ makes the failure of the Gaussian assumption visible. Bars below $\mu - 2\sigma$ are coloured red; bars above $\mu + 2\sigma$ are green. A normal puts about 2.3% of its mass in each of those tails (a 2.5% tail would need a 1.96σ cut-off); if the empirical bars overshoot the curve in those regions, you have fat tails.

[Figure: histogram of r_t, range -4.5% to 6.4%, with the fitted normal(μ, σ) overlaid]
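A direct way to check the tails without eyeballing the histogram is to count the share of returns beyond $\mu \pm 2\sigma$ and compare it with the normal reference mass. A minimal sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist, mean, stdev

def tail_fractions(r, k=2.0):
    """Share of returns below mu - k*sigma and above mu + k*sigma, plus the normal reference."""
    mu, sigma = mean(r), stdev(r)
    n = len(r)
    lower = sum(x < mu - k * sigma for x in r) / n
    upper = sum(x > mu + k * sigma for x in r) / n
    normal_each_tail = NormalDist().cdf(-k)  # about 0.0228 for k = 2
    return lower, upper, normal_each_tail
```

Empirical fractions clearly above the normal value are the numerical counterpart of bars overshooting the curve.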

4.The Q-Q diagnostic

The Q-Q plot pairs the $i$-th empirical order statistic with the corresponding quantile of $\mathcal{N}(\mu, \sigma^2)$. Under normality the points sit close to the 45-degree gold reference line. Departures at the lower left say the left tail is heavier than normal (extra crash risk); departures at the upper right say the right tail is heavier (extra rally risk). The two ends are where Gaussian VaR actually fails.

[Figure: Q-Q plot, theoretical (normal) quantile vs observed; ▼ left tail (extra losses), ▲ right tail (extra gains)]
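The point pairs behind a Q-Q plot can be generated as below. The plotting position $(i - 0.5)/n$ is one common convention and an assumption here; the page does not say which one it uses.

```python
from statistics import NormalDist, mean, stdev

def qq_pairs(r):
    """(theoretical, observed) pairs: i-th order statistic vs the N(mu, sigma^2) quantile."""
    n = len(r)
    fitted = NormalDist(mean(r), stdev(r))
    observed = sorted(r)
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))
```

Observed values below the theoretical ones at the low end mark a heavier left tail; observed values above them at the high end mark a heavier right tail.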

5.Autocorrelation: efficient-market & ARCH

Three autocorrelation panels tell three different stories. The ACF of $r_t$ tests whether returns themselves are predictable; efficient-market theory says they shouldn't be, and they typically aren't. The ACFs of $|r_t|$ and $r_t^2$ test for volatility clustering: even when returns are unpredictable, their magnitude is typically autocorrelated. That is the textbook motivation for any ARCH-family model.

ฯk=โˆ‘t=k+1n(xtโˆ’xห‰)(xtโˆ’kโˆ’xห‰)โˆ‘t=1n(xtโˆ’xห‰)2,95%ย band:ย โˆฃฯkโˆฃ<1.96n.\rho_k = \frac{\sum_{t=k+1}^{n}(x_t - \bar x)(x_{t-k} - \bar x)}{\sum_{t=1}^{n}(x_t - \bar x)^2}, \quad \text{95\% band: } |\rho_k| < \frac{1.96}{\sqrt n}.

[Figure: ACF of r_t (efficient-market check); ρ_k against lag k with the ±1.96/√N significance band; bars should stay inside the gold band]

[Figure: ACF of |r_t| (volatility clustering); bars outside the band indicate persistent vol]

[Figure: ACF of r_t² (ARCH effects); bars outside the band confirm a non-constant variance]
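The estimator and band above translate into a short function; applying it to $r_t$, $|r_t|$ and $r_t^2$ gives the three panels. A dependency-free sketch:

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag with the full-sample denominator."""
    n = len(x)
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    return [
        sum((x[t] - xbar) * (x[t - k] - xbar) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def significance_band(n):
    """Approximate 95% band under the white-noise null."""
    return 1.96 / math.sqrt(n)

def three_panels(r, max_lag=20):
    """ACFs of r_t, |r_t| and r_t^2, one list per panel."""
    return {
        "r": acf(r, max_lag),
        "abs_r": acf([abs(v) for v in r], max_lag),
        "r_sq": acf([v * v for v in r], max_lag),
    }
```

For n = 567 the band works out to about ±0.082.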

6.Stationarity in the second moment

The rolling 60-day annualised mean and standard deviation make the regime structure visible. A stationary series has both lines hovering around their long-run averages; a non-stationary one has the volatility wandering across regimes. For commodities, the latter is the rule.

[Figure: rolling 60-day mean and rolling σ (both annualised) against observation index; y-axis -130% to 120%]
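The rolling curves can be produced as follows. Annualising the mean by 252 and σ by √252 is an assumed convention; the page only says "annualised".

```python
import math
from statistics import mean, stdev

def rolling_annualised(r, window=60, periods_per_year=252):
    """(annualised mean, annualised sigma) for each full trailing window of returns."""
    out = []
    for end in range(window, len(r) + 1):
        w = r[end - window:end]
        out.append((mean(w) * periods_per_year, stdev(w) * math.sqrt(periods_per_year)))
    return out
```

With 567 returns and a 60-day window this yields 508 points, one per complete window.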

7.Findings

Six tests, each with its own per-asset value and verdict. The point is not to catch the asset failing a stylized fact (most facts hold on most assets) but to document that each one was checked.

Fat tails (excess kurtosis > 0)

0.55

Mildly leptokurtic.

Gain/loss asymmetry (skewness ≠ 0)

0.449

Right-skewed: the upside tail dominates.

Volatility clustering (|r| autocorrelation)

0.049

Mild persistence.

Squared returns autocorr

0.063

Weak ARCH evidence.

Weak return autocorrelation (efficient market)

0.077

Borderline: 0.077 sits just inside the ±1.96/√567 ≈ 0.082 band, so any short-term predictability is not significant at 5%.

Returns stationary (ADF t-stat < -2.89)

-22.08

Returns are stationary at 5%. Levels would not be.
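For reference, the simplest form of the stationarity test regresses $\Delta r_t$ on a constant and $r_{t-1}$ and reports the t-statistic on the slope. The sketch below is that plain Dickey-Fuller regression with no augmentation lags; the page does not state its lag choice, so treat that as an assumption. The input series is simulated, not the page's data.

```python
import math
import random

def df_tstat(y):
    """Dickey-Fuller t-stat: OLS of dy_t = a + b * y_{t-1} + e_t, returns b / se(b)."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    m = len(x)
    xbar, dbar = sum(x) / m, sum(dy) / m
    sxx = sum((v - xbar) ** 2 for v in x)
    b = sum((xi - xbar) * (di - dbar) for xi, di in zip(x, dy)) / sxx
    a = dbar - b * xbar
    resid = [di - a - b * xi for xi, di in zip(x, dy)]
    s2 = sum(e * e for e in resid) / (m - 2)   # residual variance, 2 fitted parameters
    return b / math.sqrt(s2 / sxx)

# Simulated i.i.d. returns: b should sit near -1 and the t-stat far below -2.89.
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.016) for _ in range(567)]
t_stat = df_tstat(noise)
```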

Part III · VaR method comparison

8.Gaussian (the baseline)

The textbook VaR at level $\alpha$ under a Gaussian return assumption is $\mathrm{VaR}_\alpha = -(\mu + z_\alpha \sigma)$, where $z_\alpha = \Phi^{-1}(1-\alpha)$ (so $z_{0.95} \approx -1.645$). Quick, closed-form, and wrong in the tails of most real-world return series.

$$\mathrm{VaR}_\alpha^{\text{G}} = -\left(\mu + z_\alpha\,\sigma\right)$$
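A sketch of the Gaussian VaR together with the matching Gaussian CVaR (expected shortfall), $\mathrm{CVaR}_\alpha = -\mu + \sigma\,\varphi(z_\alpha)/(1-\alpha)$. The CVaR closed form is the standard one but is an addition here, not a formula the page shows. With the page's rounded inputs (μ = 0.017%, σ = 1.59%) the results land within rounding of the Gaussian row of the comparison table.

```python
from statistics import NormalDist

def gaussian_var_cvar(mu, sigma, alpha):
    """Gaussian VaR and CVaR as positive loss fractions; alpha is the confidence, e.g. 0.95."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha)          # z_alpha = Phi^{-1}(1 - alpha), negative for alpha > 0.5
    var = -(mu + z * sigma)
    cvar = -mu + sigma * std.pdf(z) / (1.0 - alpha)
    return var, cvar

var95, cvar95 = gaussian_var_cvar(0.00017, 0.0159, 0.95)
var99, cvar99 = gaussian_var_cvar(0.00017, 0.0159, 0.99)
```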

9.Cornish-Fisher (Edgeworth correction)

Cornish-Fisher replaces the Gaussian quantile $z_\alpha$ with a polynomial adjustment that absorbs skewness and excess kurtosis.

$$z_\alpha^{CF} = z_\alpha + \frac{z_\alpha^2 - 1}{6}\gamma_1 + \frac{z_\alpha^3 - 3 z_\alpha}{24}\gamma_2 - \frac{2 z_\alpha^3 - 5 z_\alpha}{36}\gamma_1^2$$

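For a left-skewed, heavy-tailed return series the correction pushes the loss quantile further into the tail than Gaussian, so $\mathrm{VaR}_\alpha^{CF} > \mathrm{VaR}_\alpha^{\text{G}}$. This asset goes the other way: with right skew ($\gamma_1 = 0.449$) and only mild excess kurtosis ($\gamma_2 = 0.55$), the skewness term pulls $z_\alpha$ toward zero, and Cornish-Fisher VaR comes out below Gaussian at both confidence levels (2.38% vs 2.60% at 95%, 3.25% vs 3.69% at 99% in the comparison table).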
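Plugging the page's rounded inputs (μ = 0.017%, σ = 1.59%, γ1 = 0.449, γ2 = 0.55) into the expansion reproduces that ordering. A minimal sketch:

```python
from statistics import NormalDist

def cornish_fisher_var(mu, sigma, gamma1, gamma2, alpha):
    """Cornish-Fisher VaR (positive loss fraction) at confidence alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    z_cf = (z
            + (z**2 - 1) / 6 * gamma1
            + (z**3 - 3 * z) / 24 * gamma2
            - (2 * z**3 - 5 * z) / 36 * gamma1**2)
    return -(mu + z_cf * sigma)

def gaussian_var(mu, sigma, alpha):
    """Gaussian VaR for comparison: the same formula with gamma1 = gamma2 = 0."""
    return -(mu + NormalDist().inv_cdf(1.0 - alpha) * sigma)

for a in (0.95, 0.99):
    cf = cornish_fisher_var(0.00017, 0.0159, 0.449, 0.55, a)
    g = gaussian_var(0.00017, 0.0159, a)
    print(f"alpha={a}: CF {cf:.2%} vs Gaussian {g:.2%}")
```

Because the inputs are rounded, the outputs sit within about 0.01 percentage points of the table rather than matching it exactly.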

10.Filtered Historical Simulation

FHS sits between parametric and non-parametric. The recipe:

  1. Estimate a volatility model: here EWMA with λ from MLE.
  2. Compute standardised returns $\tilde r_t = r_t / \hat\sigma_t$.
  3. Read the empirical $(1-\alpha)$ quantile $q_{1-\alpha}$ from the standardised distribution.
  4. Scale back: $\mathrm{VaR}_\alpha^{\text{FHS}} = -q_{1-\alpha} \cdot \hat\sigma_T$.

The non-parametric quantile lets the actual tail shape speak; the EWMA scale lets the vol model react to recent days. Glasserman calls this “the right compromise between Gauss and history.”
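The four steps fit in a short function. This is a sketch under stated assumptions: λ is passed in (placeholder 0.94) rather than estimated by MLE, the EWMA is seeded with the mean squared return, the empirical quantile uses linear interpolation between order statistics (one convention among several), and the scale in step 4 is taken as the EWMA forecast made after the last return, which is one reading of $\hat\sigma_T$.

```python
import math

def ewma_sigmas(r, lam, seed_var=None):
    """sigmas[t] uses returns up to t-1; sigmas[-1] is the forecast after the last return."""
    var = seed_var if seed_var is not None else sum(x * x for x in r) / len(r)
    sigmas = [math.sqrt(var)]
    for x in r:
        var = lam * var + (1.0 - lam) * x * x
        sigmas.append(math.sqrt(var))
    return sigmas

def empirical_quantile(values, p):
    """Linear-interpolation quantile of a sample, 0 <= p <= 1."""
    s = sorted(values)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def fhs_var(r, alpha=0.95, lam=0.94, seed_var=None):
    """Filtered Historical Simulation VaR as a positive loss fraction."""
    sigmas = ewma_sigmas(r, lam, seed_var)          # step 1: volatility model
    z = [x / s for x, s in zip(r, sigmas)]          # step 2: standardised returns
    q = empirical_quantile(z, 1.0 - alpha)          # step 3: empirical (1 - alpha) quantile
    return -q * sigmas[-1]                          # step 4: rescale by the latest vol
```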

11.All three, on the same histogram

The cleanest way to see how the three methods disagree is to draw all three thresholds on the same returns distribution and slide the confidence level around. At low $\alpha$ the three almost coincide. As $\alpha$ approaches 99% the gap widens; being the most conservative does not by itself make a method right, and the backtest in Part IV is what shows which one reads this asset's tail correctly.

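[Interactive figure: returns histogram (-4.5% to 6.4%) with the fitted normal(μ, σ) and the three VaR thresholds; confidence slider from 50% to 99.5%. At α = 95% (z_α = -1.645): G 2.6%, CF 2.4%, FHS 2.4%]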

Each dashed line is a VaR threshold at the chosen α. Drag the slider toward 99% and watch how the three methods diverge; that gap is the skew-and-kurtosis effect.

12.Numerical comparison

Method              VaR 95% CVaR 95%  VaR 99% CVaR 99%
Gaussian              2.60%    3.27%    3.69%    4.23%
Cornish-Fisher        2.38%    3.74%    3.25%    5.40%
FHS (recommended)     2.94%    3.47%    3.84%    4.51%

Part IV · VaR backtest (Kupiec + Christoffersen)

13.Why backtest a VaR at all

Reporting $\mathrm{VaR}_{0.95}$ without checking how often the realised loss exceeded it is the cardinal sin of risk reporting. A correct VaR model breaches at the unconditional rate $p = 1 - \alpha$ on average, and its breaches should be independent in time. Kupiec (1995) and Christoffersen (1998) gave us likelihood-ratio tests for both properties.

14.Kupiec POF (unconditional coverage)

With $x$ breaches in $n$ days, the empirical breach rate is $\hat p = x/n$. Under the null that the true breach probability equals the target $p$, the LR statistic is asymptotically $\chi^2(1)$.

$$\mathrm{LR}_{\text{uc}} = -2\log\left[\frac{(1-p)^{n-x}\,p^x}{(1-\hat p)^{n-x}\,\hat p^x}\right] \sim \chi^2(1)$$

A high p-value means we cannot reject correct coverage; a low p-value means the breach rate differs from the target by more than sampling noise would plausibly explain.
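The POF statistic and its p-value need nothing beyond the standard library: for one degree of freedom the $\chi^2$ survival function is $\operatorname{erfc}(\sqrt{x/2})$. Fed the 95% counts reported in the results section (22 breaches in 507 days), this sketch gives LR ≈ 0.49 and p ≈ 0.485.

```python
import math

def _bernoulli_loglik(x, n, q):
    """log[(1-q)^(n-x) * q^x], with 0 * log(0) treated as 0."""
    ll = 0.0
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - q)
    if x > 0:
        ll += x * math.log(q)
    return ll

def kupiec_pof(x, n, p):
    """Kupiec unconditional-coverage LR and its chi2(1) p-value."""
    p_hat = x / n
    lr = -2.0 * (_bernoulli_loglik(x, n, p) - _bernoulli_loglik(x, n, p_hat))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))  # P(chi2_1 > lr)
    return lr, p_value
```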

15.Christoffersen (independence)

Even when the total breach count is right, breaches that cluster signal a vol model too slow to react. Christoffersen builds a 2×2 transition table over the breach indicator and tests whether $P(\text{breach}_t \mid \text{breach}_{t-1}) = P(\text{breach}_t \mid \text{no breach}_{t-1})$.

$$\mathrm{LR}_{\text{ind}} = -2\log\frac{L(\pi)}{L(\pi_{01}, \pi_{11})} \sim \chi^2(1)$$

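with $\pi_{ij}$ the empirical transition rate from state $i$ to state $j$ (0 = no breach, 1 = breach) and $\pi$ the unconditional breach rate. The combined conditional-coverage test adds the two statistics, $\mathrm{LR}_{\text{cc}} = \mathrm{LR}_{\text{uc}} + \mathrm{LR}_{\text{ind}}$, and tests both properties at once against $\chi^2(2)$.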
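Counting the four transitions is the only bookkeeping the independence test needs. The sketch below takes a 0/1 breach series, returns $\mathrm{LR}_{\text{ind}}$, and combines it with a given $\mathrm{LR}_{\text{uc}}$ for the conditional-coverage test, whose $\chi^2(2)$ p-value is simply $e^{-x/2}$.

```python
import math

def _xlogy(k, q):
    """k * log(q), with the 0 * log(0) = 0 convention."""
    return 0.0 if k == 0 else k * math.log(q)

def christoffersen_ind(hits):
    """LR_ind and its chi2(1) p-value for a 0/1 breach series."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(hits, hits[1:]):
        n[(prev, cur)] += 1
    n00, n01, n10, n11 = n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, math.erfc(math.sqrt(lr / 2.0))

def conditional_coverage(lr_uc, lr_ind):
    """LR_cc = LR_uc + LR_ind and its chi2(2) p-value exp(-LR_cc / 2)."""
    lr = lr_uc + lr_ind
    return lr, math.exp(-lr / 2.0)
```

A run of consecutive breaches inflates $\pi_{11}$ relative to $\pi_{01}$, which is what drives $\mathrm{LR}_{\text{ind}}$ up.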

16.Results on this asset

Run on a rolling 60-day window with the recommended VaR method:

95% confidence · n = 507

Breaches: 22 (expected 25.4)

Breach rate: 4.34% (target 5%)

  • Kupiec POF (unconditional coverage): LR = 0.49, p = 0.485
  • Christoffersen (breach independence): LR = 0.00, p = 0.963
  • Combined CC (conditional coverage): LR = 0.49, p = 0.783

99% confidence · n = 507

Breaches: 6 (expected 5.1)

Breach rate: 1.18% (target 1%)

  • Kupiec POF (unconditional coverage): LR = 0.16, p = 0.687
  • Christoffersen (breach independence): LR = 0.14, p = 0.704
  • Combined CC (conditional coverage): LR = 0.31, p = 0.858
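The breach series behind these counts comes from an out-of-sample loop. The sketch below assumes the 60 days are the estimation window for each day's VaR, which is consistent with n = 507 = 567 - 60; `var_fn` is a hypothetical placeholder for whichever VaR method is being backtested.

```python
def rolling_backtest(returns, var_fn, window=60):
    """0/1 breach indicators for a rolling out-of-sample VaR backtest.

    For each day t >= window, VaR is estimated from returns[t - window:t] only
    and compared with the realised return on day t (VaR is a positive loss).
    """
    hits = []
    for t in range(window, len(returns)):
        var_t = var_fn(returns[t - window:t])
        hits.append(1 if -returns[t] > var_t else 0)
    return hits
```

The resulting list is what the Kupiec and Christoffersen statistics consume: `sum(hits)` is $x$ and `len(hits)` is $n$.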

17.Breach timeline

Each blue tick is a day with no breach; gold dots trace the negative of the VaR threshold; red ticks are breaches. Clustered red ticks are what Christoffersen catches.

[Figure: breach timeline, 2024-07-12 to 2026-06-01; vertical axis: loss →]

18.Year-by-year breach distribution

Counting breaches per calendar year against the expected rate makes regime episodes obvious. Bars in green sit at or below the gold expected-rate tick; bars in red overshoot.

[Figure: breaches per calendar year at α = 95% against the expected rate: 2024: 6, 2025: 10, 2026: 6; ▮ at or below expected, ▮ exceeds expected]
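Grouping the breach series by calendar year is a one-pass count. In this sketch the expected count is simply the number of backtest days in each year times $1-\alpha$; the `dates` list is a hypothetical input aligned with the breach indicators.

```python
from collections import Counter
from datetime import date

def breaches_by_year(dates, hits, alpha=0.95):
    """{year: (observed breaches, expected breaches)} for aligned dates and 0/1 hits."""
    days = Counter(d.year for d in dates)
    observed = Counter(d.year for d, h in zip(dates, hits) if h)
    return {y: (observed[y], days[y] * (1.0 - alpha)) for y in sorted(days)}

# Tiny illustrative input, not the page's data.
example = breaches_by_year(
    [date(2024, 12, 30), date(2024, 12, 31), date(2025, 1, 2)], [1, 0, 1]
)
```

A year is drawn red when its observed count exceeds the expected one.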

Data: Supabase + GitHub Actions ETL · last update 2026-06-01