ORATS SPY Options Chain EDA¶

This notebook aims to provide additional context on the quality checks implemented in the library, beyond what is covered in the docstrings and code structure.

We load the qc_summary.json generated after running the QC pipeline and complement it with exploratory analysis (EDA) to provide a clearer, visual understanding of the results and their practical implications.

The notebook is structured as follows:

  1. Read SPY Options data
  2. Load QC summary artifact
  3. GLOBAL vs ROI Interpretation Policy
  4. Basic Checks
  5. Days-to-expiry check
  6. Quote sanity checks
  7. Volume & Open Interest Checks
  8. Spot price sanity checks
  9. Dividend Yield checks
  10. Risk-free Rate checks
  11. Implied Volatility Quality Checks
  12. Greeks Sanity Checks
  13. Put-call parity diagnostics
  14. Price-bounds diagnostics (calls and puts)
  15. Monotonicity diagnostics
  • 15.1. Vertical spread arbitrage (strike monotonicity)
  • 15.2. Calendar arbitrage (maturity monotonicity)
  16. Conclusion
In [10]:
%load_ext autoreload
%autoreload 2

import json
from datetime import date

import pandas as pd
import polars as pl
import yfinance as yf

from volatility_trading.config.paths import PROC_ORATS_OPTIONS_CHAIN
from volatility_trading.datasets import (
    options_chain_wide_to_long,
    read_daily_features,
    scan_options_chain,
)

try:
    from notebooks.qc_eda.helpers import QCSummaryHelper
    from notebooks.qc_eda.plotting import (
        plot_avg_volume_by_delta,
        plot_greeks_vs_strike,
        plot_iv_time_series_with_slope,
        plot_liquidity_by_dte,
        plot_term_structure_samples,
        plot_smiles_by_delta,
        plot_spot_vs_yahoo,
        plot_term_structures_by_delta,
    )
except ModuleNotFoundError:
    from helpers import QCSummaryHelper
    from plotting import (
        plot_avg_volume_by_delta,
        plot_greeks_vs_strike,
        plot_iv_time_series_with_slope,
        plot_liquidity_by_dte,
        plot_term_structure_samples,
        plot_smiles_by_delta,
        plot_spot_vs_yahoo,
        plot_term_structures_by_delta,
    )

1. Read SPY Options data & Daily Features¶

We analyze the full SPY options chain from 2007-01-01 to 2025-12-05 (the START and END values set below), keeping contracts within a broad tradable region.

We also import the Daily Features dataset, which contains complementary metrics used for the analysis over the same period.

In [11]:
TICKER = "SPY"

START = date(2007, 1, 1)
END = date(2025, 12, 5)

DELTA_MIN = 0.01
DELTA_MAX = 0.99
DTE_MIN = 5
DTE_MAX = 252
In [12]:
lf = scan_options_chain(TICKER)
lf = lf.filter(
    pl.col("trade_date").is_between(START, END),
    pl.col("call_delta").abs().is_between(DELTA_MIN, DELTA_MAX),
    pl.col("put_delta").abs().is_between(DELTA_MIN, DELTA_MAX),
    pl.col("dte").is_between(DTE_MIN, DTE_MAX),
)

df = lf.collect()
df_long = options_chain_wide_to_long(df).collect()

df
Out[12]:
shape: (5_210_136, 37)
tickertrade_dateexpiry_datedteyteunderlying_pricespot_pricestrikecall_volumeput_volumecall_open_interestput_open_interestcall_bid_pricecall_mid_pricecall_model_pricecall_ask_pricecall_rel_spreadput_bid_priceput_mid_priceput_model_priceput_ask_priceput_rel_spreadsmoothed_ivcall_mid_ivput_mid_ivcall_deltacall_gammacall_thetacall_vegacall_rhoput_deltaput_gammaput_thetaput_vegaput_rhorisk_free_ratedividend_yield
strdatedatei64f64f64f64f64i64i64i64i64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64
"SPY"2007-01-032007-01-20170.04658141.37141.37132.00111237185799.69.79.749.80.0206190.00.050.030.12.00.1568130.094320.096130.9833010.008686-0.0254840.0123150.060232-0.0166990.008686-0.0254840.0123150.0602320.04840.0
"SPY"2007-01-032007-01-20170.04658141.37141.37133.00701921154438.78.758.778.80.0114290.050.0750.060.10.6666670.1585240.085510.165210.9700730.014062-0.0291740.0223210.059813-0.0299270.014062-0.0291740.0223210.0598130.04840.0
"SPY"2007-01-032007-01-20170.04658141.37141.37134.02201447220327.77.87.87.90.0256410.050.10.090.151.00.1553240.090830.156150.9550520.020038-0.0325580.0315460.059273-0.0449480.020038-0.0325580.0315460.0592730.04840.0
"SPY"2007-01-032007-01-20170.04658141.37141.37135.0635432785350036.76.86.856.90.0294120.10.1250.130.150.40.15110.081240.148180.9347970.02765-0.0363150.0372230.058383-0.0652030.02765-0.0363150.0372230.0583830.04840.0
"SPY"2007-01-032007-01-20170.04658141.37141.37136.01817411681400355.85.855.915.90.0170940.150.1750.190.20.2857140.1456140.130730.14160.9085340.037099-0.0401460.049430.0570904-0.0914660.037099-0.0401460.049430.05709040.04840.0
…………………………………………………………………………………………………
"SPY"2025-12-052026-06-302070.56712685.61685.61850.00020200.480.490.490.50.040816163.16164.63164.39166.10.0178580.1265290.126810.121210.0205040.000756-0.0089030.2656880.07697-0.9794960.000756-0.0089030.2656880.076970.0360.013382
"SPY"2025-12-052026-06-302070.56712685.61685.61855.0905000.420.4250.430.430.023529168.15169.72169.39171.290.0185010.1270160.127350.124840.0179740.000673-0.0079680.2122740.067501-0.9820260.000673-0.0079680.2122740.0675010.0360.013382
"SPY"2025-12-052026-06-302070.56712685.61685.61860.0004900.370.3750.370.380.026667173.14174.61174.39176.080.0168380.1274510.128030.12590.0157660.000599-0.0071230.2146440.059233-0.9842340.000599-0.0071230.2146440.0592330.0360.013382
"SPY"2025-12-052026-06-302070.56712685.61685.61865.0002500.320.3250.330.330.030769178.13179.71179.39181.290.0175840.1279070.128680.129630.0137090.00053-0.0063250.1707810.051526-0.9862910.00053-0.0063250.1707810.0515260.0360.013382
"SPY"2025-12-052026-06-302070.56712685.61685.61870.010236100.280.290.290.30.068966183.13184.695184.39186.260.0169470.1282650.129570.13180.0119360.000468-0.0056110.17270440.044882-0.9880640.000468-0.0056110.17270440.0448820.0360.013382

We also load the Daily Features dataset, which contains useful metrics for the analysis over the same period.

In [13]:
daily_features = read_daily_features(TICKER)
daily_features = daily_features.filter(
    pl.col("trade_date").is_between(START, END)
)
daily_features = daily_features.to_pandas().set_index("trade_date")

daily_features
Out[13]:
ticker iv_10d iv_20d iv_30d iv_60d iv_90d iv_6m iv_1y iv_dlt25_10d iv_dlt25_20d ... hv_intra_1d hv_intra_5d hv_intra_10d hv_intra_20d hv_intra_30d hv_intra_60d hv_intra_90d hv_intra_100d hv_intra_120d hv_intra_252d
trade_date
2007-01-03 SPY 0.101635 0.102469 0.104870 0.117420 0.124077 0.120905 0.119971 0.091543 0.092130 ... 0.1550 0.0966 0.0886 0.0835 0.0884 0.0871 0.1161 0.1138 0.1179 0.1213
2007-01-04 SPY 0.092746 0.094150 0.097141 0.109366 0.122466 0.120715 0.122350 0.085077 0.085406 ... 0.0993 0.1002 0.0905 0.0825 0.0899 0.0878 0.1165 0.1138 0.1168 0.1212
2007-01-05 SPY 0.107158 0.107713 0.108645 0.121309 0.121143 0.123434 0.102399 0.095921 0.096610 ... 0.1029 0.1042 0.0915 0.0845 0.0913 0.0886 0.1168 0.1141 0.1167 0.1212
2007-01-08 SPY 0.104130 0.104633 0.105134 0.118536 0.119770 0.120721 0.122504 0.093455 0.091858 ... 0.0901 0.1093 0.0948 0.0864 0.0926 0.0891 0.1168 0.1140 0.1168 0.1213
2007-01-09 SPY 0.107480 0.107693 0.107879 0.114870 0.126901 0.121019 0.110381 0.101403 0.099474 ... 0.0816 0.1088 0.0947 0.0860 0.0934 0.0891 0.1168 0.1135 0.1163 0.1211
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2025-12-01 SPY 0.131834 0.147691 0.131681 0.144273 0.150576 0.161976 0.170348 0.115265 0.129748 ... 0.0738 0.1336 0.1873 0.1658 0.1476 0.1343 0.1247 0.1206 0.1173 0.1930
2025-12-02 SPY 0.132815 0.139302 0.132822 0.143952 0.149011 0.159100 0.166644 0.116351 0.125871 ... 0.0652 0.1089 0.1788 0.1654 0.1464 0.1344 0.1248 0.1206 0.1172 0.1930
2025-12-03 SPY 0.126530 0.133172 0.130965 0.142168 0.147754 0.155886 0.169044 0.113659 0.118831 ... 0.0800 0.0774 0.1737 0.1635 0.1469 0.1346 0.1250 0.1208 0.1172 0.1930
2025-12-04 SPY 0.125817 0.130076 0.131896 0.141580 0.147061 0.159405 0.167945 0.111157 0.119376 ... 0.0567 0.0692 0.1698 0.1625 0.1453 0.1346 0.1249 0.1205 0.1165 0.1930
2025-12-05 SPY 0.112177 0.123156 0.124830 0.137596 0.145259 0.155761 0.165781 0.100875 0.110574 ... 0.0559 0.0670 0.1243 0.1590 0.1447 0.1341 0.1247 0.1202 0.1162 0.1930

4763 rows × 32 columns

2. Load QC summary results¶

The checks below are read from the quality checks summary after running orats-api-download --config config/orats_api_download.yml

In [14]:
qc_summary_path = (
     PROC_ORATS_OPTIONS_CHAIN / f"underlying={TICKER}" / "qc_summary.json"
)

with qc_summary_path.open(encoding="utf-8") as f:
    qc_summary = json.load(f)

qc_helpers = QCSummaryHelper(qc_summary)

len(qc_summary), qc_summary[3]
Out[14]:
(68,
 {'name': 'negative_quotes',
  'severity': 'HARD',
  'grade': 'OK',
  'passed': True,
  'n_rows': 15073890,
  'n_units': None,
  'n_viol': 0,
  'viol_rate': 0.0,
  'details': {}})
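
Each entry of the summary is a plain dictionary with the fields shown above, so it can be triaged with ordinary Python before reaching for the helper class. The sketch below is only an illustration: `failed_checks` is a hypothetical function (not part of the library), and `sample_summary` is a hand-written list that mimics the artifact's fields rather than the real file.

```python
from typing import Any


def failed_checks(summary: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the checks with passed == False, highest violation rate first."""
    failed = [check for check in summary if not check.get("passed", False)]
    return sorted(failed, key=lambda c: c.get("viol_rate") or 0.0, reverse=True)


# Hand-written records that mimic the fields of qc_summary.json (illustrative only).
sample_summary = [
    {"name": "negative_quotes", "severity": "HARD", "grade": "OK",
     "passed": True, "n_rows": 15073890, "n_viol": 0, "viol_rate": 0.0},
    {"name": "GLOBAL_one_sided_quotes_C", "severity": "SOFT", "grade": "WARN",
     "passed": False, "n_rows": 15073890, "n_viol": 539559, "viol_rate": 0.035794},
]

for check in failed_checks(sample_summary):
    print(check["name"], check["severity"], check["grade"], check["viol_rate"])
```

Sorting by violation rate is one possible triage order; sorting by severity first would be an equally reasonable choice.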

3. GLOBAL vs ROI Interpretation Policy¶

We use this interpretation policy throughout the notebook for every QC family.

  • GLOBAL: the full options universe after broad filters.
  • ROI: our practical trading region of interest, roughly:
    • moneyness around 10-90 delta,
    • maturity around 10-60 DTE.

Why this split matters:

  • Violations in far wings or extreme maturities can be real but less relevant to tradable workflows.
  • Persistent violations inside ROI are more likely to affect execution quality, strategy risk sizing, and signal reliability.

Severity policy used in sections below:

  1. HARD checks: structural invalid data (drop-candidate rows).
  2. SOFT checks: investigate rate + location first (GLOBAL vs ROI).
  3. INFO diagnostics: descriptive metrics, not pass/fail by themselves.
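
The GLOBAL vs ROI split can be sketched as a simple box in (delta, DTE) space. The bounds below are taken from the approximate description above ("around 10-90 delta", "around 10-60 DTE"); the exact thresholds and their inclusiveness in the QC pipeline are assumptions, and `in_roi` and `scope_label` are hypothetical names used only for illustration.

```python
# Approximate ROI bounds from the policy text; treat them as assumptions.
ROI_DELTA_RANGE = (0.10, 0.90)
ROI_DTE_RANGE = (10, 60)


def in_roi(delta: float, dte: int) -> bool:
    """True when a contract falls inside the approximate ROI box."""
    abs_delta = abs(delta)  # put deltas are negative, so compare magnitudes
    delta_ok = ROI_DELTA_RANGE[0] <= abs_delta <= ROI_DELTA_RANGE[1]
    dte_ok = ROI_DTE_RANGE[0] <= dte <= ROI_DTE_RANGE[1]
    return delta_ok and dte_ok


def scope_label(delta: float, dte: int) -> str:
    """Every row belongs to GLOBAL; only some rows also belong to ROI."""
    return "GLOBAL + ROI" if in_roi(delta, dte) else "GLOBAL only"


print(scope_label(0.45, 30))   # near-the-money, one month out
print(scope_label(-0.02, 30))  # far out-of-the-money put
print(scope_label(0.50, 200))  # long-dated contract
```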

Liquidity by moneyness: volume by $\Delta$¶

In [15]:
plot_avg_volume_by_delta(df_long)
[Figure: average option volume by delta]

Deep OTM puts are usually much more traded than symmetric OTM calls, largely reflecting structural hedging demand.

Liquidity by maturity: volume/open interest by DTE¶

In [16]:
plot_liquidity_by_dte(df_long)
[Figure: volume and open interest by DTE]

Short maturities often carry higher traded volume relative to open interest, while longer maturities tend to accumulate open interest with lower turnover.

4. Basic Checks¶

Hard structural checks + calendar-level dataset checks from the QC summary.

In [17]:
df.describe(percentiles=(0.25, 0.5, 0.75, 0.9))
Out[17]:
shape: (10, 38)
statistictickertrade_dateexpiry_datedteyteunderlying_pricespot_pricestrikecall_volumeput_volumecall_open_interestput_open_interestcall_bid_pricecall_mid_pricecall_model_pricecall_ask_pricecall_rel_spreadput_bid_priceput_mid_priceput_model_priceput_ask_priceput_rel_spreadsmoothed_ivcall_mid_ivput_mid_ivcall_deltacall_gammacall_thetacall_vegacall_rhoput_deltaput_gammaput_thetaput_vegaput_rhorisk_free_ratedividend_yield
strstrstrstrf64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64
"count""5210136""5210136""5210136"5.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.209951e65.210136e65.210136e65.210136e65.210136e65.210018e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210136e65.210097e6
"null_count""0""0""0"0.00.00.00.00.00.00.00.00.00.00.00.00.0185.00.00.00.00.0118.00.00.00.00.00.00.00.00.00.00.00.00.00.00.039.0
"mean"null"2019-09-18 16:23:26.956593""2019-12-13 23:21:35.402346"86.2903760.236412351.807057351.807057338.69139504.990066835.6066313604.0815936367.60648725.6897425.84651625.80768826.0032910.03404411.94379312.0608112.06888412.1778270.0263950.2115130.2119120.2055590.5971340.013469-0.0737430.3686230.377437-0.4028660.013469-0.0737430.3686230.3774370.0191040.019355
"std"nullnullnull70.9640360.194422149.991977149.991977149.2724482934.8915534246.78698312448.23732419997.19559530.79103730.96367330.95129231.137310.09657617.74319317.93007317.9345918.1184070.0523730.0944230.096280.0996860.3237840.0152280.0710.3282580.4487730.3237840.0152280.0710.3282580.4487730.0200730.026415
"min""SPY""2007-01-03""2007-01-20"5.00.013768.1168.1148.00.00.00.00.00.00.00.00.00.00.00.00.010.00.00.010.00.00.010.000116-1.4018560.00.0-0.990.000116-1.4018560.00.00.00.0
"25%"null"2017-01-13""2017-03-31"25.00.06849226.66226.66216.00.00.021.044.03.183.213.213.250.0069561.551.5651.581.580.0083060.1435010.142360.139120.3082030.004159-0.0986670.1269520.074896-0.6917970.004159-0.0986670.1269520.0748960.0010.0
"50%"null"2020-10-16""2020-12-31"64.00.17534344.45344.45327.05.020.0342.0609.014.6114.71514.6814.810.0121345.45.4355.425.480.014990.1946780.19360.193940.6970720.0084486-0.0529770.2655090.203721-0.3029280.0084486-0.0529770.2655090.2037210.01080.0163723
"75%"null"2023-02-24""2023-05-19"140.00.38356447.5447.5442.0100.0227.02004.03472.036.1236.33536.2236.550.02375314.8114.9414.9415.060.0271740.257940.259490.257260.8865520.017199-0.0263230.50853640.518745-0.1134480.017199-0.0263230.50853640.5187450.04150.025041
"90%"null"2024-12-20""2025-03-21"200.00.54795572.09572.09550.0713.01280.07803.014721.068.6869.0669.069.440.06451630.4330.7530.8231.070.0495870.3278980.33240.32430.9553630.030165-0.0138810.84981710.973513-0.0446370.030165-0.0138810.84981710.9735130.05150.0407233
"max""SPY""2025-12-05""2026-06-30"252.00.69041687.68687.68870.0500544.0532902.0565321.0704449.0216.94218.5218.64220.062.0199.7201.09200.62202.482.01.4315361.493221.431270.990.550450.02.2498183.0882256-0.010.550450.02.2498183.08822560.06020.682527

Hard key integrity checks (non-negotiable)¶

Policy:

  • Keys required to identify one contract observation should never be null.
  • Any material violation is a data-integrity blocker.
In [18]:
df.with_columns(qT = pl.col("yte") * pl.col("dividend_yield")).filter(pl.col("qT") > 0.05)
Out[18]:
shape: (5_926, 38)
tickertrade_dateexpiry_datedteyteunderlying_pricespot_pricestrikecall_volumeput_volumecall_open_interestput_open_interestcall_bid_pricecall_mid_pricecall_model_pricecall_ask_pricecall_rel_spreadput_bid_priceput_mid_priceput_model_priceput_ask_priceput_rel_spreadsmoothed_ivcall_mid_ivput_mid_ivcall_deltacall_gammacall_thetacall_vegacall_rhoput_deltaput_gammaput_thetaput_vegaput_rhorisk_free_ratedividend_yieldqT
strdatedatei64f64f64f64f64i64i64i64i64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64
"SPY"2009-02-232009-09-192080.5698674.6574.6553.00011340423.323.6523.4624.00.0295982.612.672.732.730.0449440.52510.539310.52060.8541990.007928-0.0257730.1235320.221401-0.1458010.007928-0.0257730.1235320.2214010.00690.0907360.051707
"SPY"2009-02-232009-09-192080.5698674.6574.6554.00010291322.522.82522.723.150.0284782.82.8652.922.930.0453750.5199090.529260.515890.8449030.008345-0.026310.1343020.22164-0.1550970.008345-0.026310.1343020.221640.00690.0907360.051707
"SPY"2009-02-232009-09-192080.5698674.6574.6555.001310285421.722.02521.9422.350.0295123.03.0753.123.150.048780.5146170.520420.511260.8352010.008769-0.0268140.1361640.222001-0.1647990.008769-0.0268140.1361640.2220010.00690.0907360.051707
"SPY"2009-02-232009-09-192080.5698674.6574.6556.00010256920.9521.2521.1921.550.0282353.23.2753.333.350.0458020.509550.512870.505760.8252830.009205-0.0273210.1465260.2219-0.1747170.009205-0.0273210.1465260.22190.00690.0907360.051707
"SPY"2009-02-232009-09-192080.5698674.6574.6557.00010170020.1520.4520.4620.750.029343.43.53.563.60.0571430.5042710.503260.500610.8147340.009639-0.0277620.1485310.221782-0.1852660.009639-0.0277620.1485310.2217820.00690.0907360.051707
……………………………………………………………………………………………………
"SPY"2009-06-182009-12-311960.5369992.2292.22115.000193726070.360.4150.390.470.2650624.224.5527.4224.90.0285130.1938850.195890.00.0728840.010857-0.0060750.0963020.03292-0.9271160.010857-0.0060750.0963020.032920.00520.09847890.052882
"SPY"2009-06-182009-12-311960.5369992.2292.22116.00060112760.30.350.340.40.28571425.325.57528.4225.850.0215050.1926020.194240.00.0636680.009837-0.0054020.0821430.028769-0.9363320.009837-0.0054020.0821430.0287690.00520.09847890.052882
"SPY"2009-06-182009-12-311960.5369992.2292.22117.00034326890.240.290.280.340.34482826.226.52529.4226.850.0245050.1912920.192090.00.0549850.008835-0.0047570.0696390.024863-0.9450150.008835-0.0047570.0696390.0248630.00520.09847890.052882
"SPY"2009-06-182009-12-311960.5369992.2292.22118.0005467920.190.240.240.290.41666727.1527.47530.4227.80.0236580.1901330.190240.00.0474630.007907-0.0041830.0720110.0215663-0.9525370.007907-0.0041830.0720110.02156630.00520.09847890.052882
"SPY"2009-06-182009-12-311960.5369992.2292.22119.00063811500.150.2050.20.260.53658528.128.42531.4228.750.0228670.1892170.189440.00.0412360.007082-0.0036930.0600610.01873-0.9587640.007082-0.0036930.0600610.018730.00520.09847890.052882
In [19]:
basic_checks = qc_helpers.qc_table(
    [
        "keys_not_null",
    ]
)
basic_checks
Out[19]:
shape: (1, 8)
name             severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str              str       str    bool    i64       null     i64     f64
"keys_not_null"  "HARD"    "OK"   true    15073890  null     0       0.0

Calendar-level dataset checks (GLOBAL diagnostics)¶

Policy:

  • Missing exchange sessions and non-trading dates are dataset-level diagnostics.
  • Small, explainable exceptions are investigated before escalation.
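
Both calendar diagnostics reduce to set differences between the dates present in the dataset and the sessions the exchange calendar expects. The sketch below uses a tiny hand-written calendar rather than a real XNYS calendar, and `calendar_diagnostics` is a hypothetical helper, not the library's implementation.

```python
from datetime import date


def calendar_diagnostics(
    observed: set[date], expected: set[date]
) -> tuple[list[date], list[date]]:
    """Return (missing sessions, non-trading dates present)."""
    missing = sorted(expected - observed)  # sessions with no data
    extra = sorted(observed - expected)    # data on a non-session day
    return missing, extra


# Toy inputs: a real check would take `expected` from an exchange calendar.
expected_sessions = {date(2018, 12, 3), date(2018, 12, 4), date(2018, 12, 6)}
observed_dates = {
    date(2018, 12, 3),
    date(2018, 12, 4),
    date(2018, 12, 5),
    date(2018, 12, 6),
}

missing, extra = calendar_diagnostics(observed_dates, expected_sessions)
print("Missing sessions:", missing)
print("Non-trading dates present:", extra)
```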
In [20]:
basic_checks = qc_helpers.qc_table(
    [
        "keys_not_null",
        "trade_date_leq_expiry_date",
        "GLOBAL_missing_sessions_xnys",
        "GLOBAL_non_trading_dates_present_xnys",
    ]
)
basic_checks
Out[20]:
shape: (4, 8)
name                               severity  grade   passed  n_rows    n_units  n_viol  viol_rate
str                                str       str     bool    i64       i64      i64     f64
"keys_not_null"                    "HARD"    "OK"    true    15073890  null     0       0.0
"trade_date_leq_expiry_date"       "HARD"    "OK"    true    15073890  null     0       0.0
"GLOBAL_missing_sessions_xnys"     "SOFT"    "OK"    true    15073890  4763     0       0.0
"GLOBAL_non_trading_dates_prese…   "SOFT"    "MILD"  true    15073890  4764     1       0.00021
In [21]:
missing_sessions = qc_helpers.qc_details("GLOBAL_missing_sessions_xnys").get("missing_dates", [])
non_trading = qc_helpers.qc_details("GLOBAL_non_trading_dates_present_xnys").get("extra_dates", [])

print("Missing XNYS sessions:", len(missing_sessions))
print("Non-trading dates present:", len(non_trading))
print("Sample missing sessions:", missing_sessions[:5])
print("Sample non-trading dates:", non_trading[:5])
Missing XNYS sessions: 0
Non-trading dates present: 1
Sample missing sessions: []
Sample non-trading dates: ['2018-12-05']

After investigation, the trade date 2018-12-05 corresponds to the National Day of Mourning for George H. W. Bush: the NYSE was closed, but the options chain still contains quotes and volume for that date.

In [22]:
df.filter(pl.col("trade_date") == pl.date(2018, 12, 5))
Out[22]:
shape: (1_248, 37)
tickertrade_dateexpiry_datedteyteunderlying_pricespot_pricestrikecall_volumeput_volumecall_open_interestput_open_interestcall_bid_pricecall_mid_pricecall_model_pricecall_ask_pricecall_rel_spreadput_bid_priceput_mid_priceput_model_priceput_ask_priceput_rel_spreadsmoothed_ivcall_mid_ivput_mid_ivcall_deltacall_gammacall_thetacall_vegacall_rhoput_deltaput_gammaput_thetaput_vegaput_rhorisk_free_ratedividend_yield
strdatedatei64f64f64f64f64i64i64i64i64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64f64
"SPY"2018-12-052018-12-1050.0137270.32270.32240.006824144230.1130.4730.530.830.023630.110.1150.060.120.0869570.4592150.311840.502440.988110.0021333-0.067930.0092740.032421-0.011890.0021333-0.067930.0092740.0324210.02370.0
"SPY"2018-12-052018-12-1050.0137270.32270.32245.00134014135625.1425.525.5725.860.0282350.140.1450.130.150.0689660.4374420.271530.443930.975150.004205-0.103580.0176650.032616-0.024850.004205-0.103580.0176650.0326160.02370.0
"SPY"2018-12-052018-12-1050.0137270.32270.32246.00410024.1524.50524.624.860.0289740.140.1450.160.150.0689660.4348210.262580.428520.9708620.004833-0.1144830.0214710.032591-0.0291380.004833-0.1144830.0214710.0325910.02370.0
"SPY"2018-12-052018-12-1050.0137270.32270.32247.00510123.1523.5123.6223.870.0306250.150.1550.180.160.0645160.4286440.255040.417580.967140.005421-0.1227140.0257430.0325869-0.032860.005421-0.1227140.0257430.03258690.02370.0
"SPY"2018-12-052018-12-1050.0137270.32270.32248.00660022.1622.5222.6522.880.0319720.150.160.210.170.1250.4240850.247420.404350.9620740.006156-0.133810.0258060.032532-0.0379260.006156-0.133810.0258060.0325320.02370.0
…………………………………………………………………………………………………
"SPY"2018-12-052019-06-282050.56164270.32270.32320.0110638110.390.430.550.470.18604749.9850.69553.6951.410.0282080.1326390.126680.00.0505250.003903-0.0077270.2067320.072968-0.9494750.003903-0.0077270.2067320.0729680.02710.0
"SPY"2018-12-052019-06-282050.56164270.32270.32325.00029800.270.3050.350.340.22950854.9755.68558.6956.40.025680.1312780.12810.00.0345920.002902-0.0055880.1450470.050072-0.9654080.002902-0.0055880.1450470.0500720.02710.0
"SPY"2018-12-052019-06-282050.56164270.32270.32330.000251050.20.2350.220.270.29787259.9660.67563.6961.390.0235680.1302020.131110.00.0233250.002105-0.0039630.1214920.0338733-0.9766750.002105-0.0039630.1214920.03387330.02710.0
"SPY"2018-12-052019-06-282050.56164270.32270.32335.000940200.160.20.140.240.464.9565.66568.6966.380.0217770.1294650.135850.00.0155370.001497-0.002770.07987130.022593-0.9844630.001497-0.002770.07987130.0225930.02710.0
"SPY"2018-12-052019-06-282050.56164270.32270.32340.000152900.130.1650.090.20.42424269.9570.66573.6971.380.0202360.1287170.139810.00.0100220.00103-0.0018740.0503560.01459-0.9899780.00103-0.0018740.0503560.014590.02710.0

As shown above, contracts were traded that day, so we keep it in our backtest.

5. Days-to-expiry check¶

A HARD error here means that trade_date is later than expiry_date, which is impossible: at the latest, a contract can trade on its expiry date, never after it.

We also check the distribution of the dte column, expecting values within the range implied by the tradable filters we applied.
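
As a rough illustration of the hard rule, the sketch below recomputes DTE as a calendar-day difference and flags rows where the trade date falls after expiry. The calendar-day convention is an assumption, although it matches the sample rows shown earlier (2007-01-03 to 2007-01-20 gives dte = 17); `calendar_dte` and `dte_violations` are hypothetical names.

```python
from datetime import date


def calendar_dte(trade_date: date, expiry_date: date) -> int:
    """Days to expiry as a calendar-day difference (assumed convention)."""
    return (expiry_date - trade_date).days


def dte_violations(rows: list[tuple[date, date]]) -> list[tuple[date, date]]:
    """Return the (trade_date, expiry_date) pairs that break trade_date <= expiry_date."""
    return [(trade, expiry) for trade, expiry in rows if trade > expiry]


rows = [
    (date(2007, 1, 3), date(2007, 1, 20)),   # first row of the chain shown above
    (date(2025, 12, 5), date(2026, 6, 30)),  # last expiry shown above
    (date(2020, 3, 21), date(2020, 3, 20)),  # made-up invalid row for illustration
]

for trade, expiry in rows:
    print(trade, expiry, calendar_dte(trade, expiry))
print("violations:", dte_violations(rows))
```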

In [23]:
dte_checks = qc_helpers.qc_table(["trade_date_leq_expiry_date"])
dte_checks
Out[23]:
shape: (1, 8)
name                          severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str                           str       str    bool    i64       null     i64     f64
"trade_date_leq_expiry_date"  "HARD"    "OK"   true    15073890  null     0       0.0

5.2. DTE distribution sanity (INFO diagnostics)¶

In [24]:
global_dte_stats = qc_helpers.info_stats_metric("GLOBAL_core_numeric_stats", "dte")
print("GLOBAL DTE stats")
display(global_dte_stats)
GLOBAL DTE stats
metric null_rate n_nonnull min max mean std median q_0.00 q_0.01 q_0.05 q_0.50 q_0.95 q_0.99 q_1.00
0 dte 0.0 15073890 1.0 252.0 67.379465 68.617457 36.0 1.0 1.0 3.0 36.0 217.0 245.0 252.0

6. Quote sanity checks¶

Here we separate quote checks into 3 groups:

  1. Hard data errors (drop-candidate rows)

    • negative_quotes: bid or ask below zero (impossible market quotes)
    • crossed_market: bid above ask (invalid quote state)
    • bid_ask_sane: hard guardrail summary for bid/ask consistency
  2. Suspicious but often explainable microstructure cases (investigate first)

    • locked_market: bid equals ask
    • one_sided_quotes: no bid with positive ask
    These can happen, especially in low-liquidity wings or near the close.
  3. Spread quality diagnostics

    • wide_spread and very_wide_spread
    Not always a data error, but a trading-quality warning. In practice, we care whether they cluster outside the tradable ROI.
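
The three groups above can be sketched as a per-quote classifier. The relative spread is taken as (ask - bid) / mid, which reproduces the call_rel_spread and put_rel_spread values in the chain shown earlier. The `wide_threshold` value is a hypothetical placeholder, since the thresholds used by the QC pipeline are not stated here, and treating a locked market as bid == ask with a positive price is also an assumption.

```python
import math


def relative_spread(bid: float, ask: float) -> float:
    """(ask - bid) / mid, or NaN when the mid price is not positive."""
    mid = 0.5 * (bid + ask)
    return (ask - bid) / mid if mid > 0 else math.nan


def classify_quote(bid: float, ask: float, wide_threshold: float = 0.5) -> list[str]:
    """Return the quote flags raised by one bid/ask pair."""
    flags = []
    if bid < 0 or ask < 0:
        flags.append("negative_quotes")   # hard error
    if bid > ask:
        flags.append("crossed_market")    # hard error
    if bid == ask and ask > 0:
        flags.append("locked_market")     # investigate
    if bid == 0 and ask > 0:
        flags.append("one_sided_quotes")  # investigate
    spread = relative_spread(bid, ask)
    # Only grade the spread of structurally valid quotes.
    if bid <= ask and not math.isnan(spread) and spread > wide_threshold:
        flags.append("wide_spread")       # execution-quality warning
    return flags


print(classify_quote(9.6, 9.8))  # liquid quote
print(classify_quote(0.0, 0.1))  # no bid, positive ask
print(classify_quote(1.2, 1.0))  # bid above ask
```

Note that a one-sided quote always has a relative spread of 2.0 under this definition, so with this placeholder threshold it is also flagged as wide; a real pipeline may treat the two conditions separately.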

Hard quote errors (non-negotiable)¶

In [25]:
hard_quote_checks = qc_helpers.qc_table(
    [
        "bid_ask_sane",
        "negative_quotes",
        "crossed_market",
    ]
)
hard_quote_checks
Out[25]:
shape: (3, 8)
name               severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str                str       str    bool    i64       null     i64     f64
"bid_ask_sane"     "HARD"    "OK"   true    15073890  null     0       0.0
"crossed_market"   "HARD"    "OK"   true    15073890  null     0       0.0
"negative_quotes"  "HARD"    "OK"   true    15073890  null     0       0.0

Locked and one-sided quotes (investigate, then decide)¶

In [26]:
microstructure_quote_checks = qc_helpers.qc_table(
    [
        "GLOBAL_locked_market_C",
        "GLOBAL_locked_market_P",
        "ROI_locked_market_C",
        "ROI_locked_market_P",
        "GLOBAL_one_sided_quotes_C",
        "GLOBAL_one_sided_quotes_P",
        "ROI_one_sided_quotes_C",
        "ROI_one_sided_quotes_P",
    ]
)
microstructure_quote_checks
Out[26]:
shape: (8, 8)
name                         severity  grade   passed  n_rows    n_units  n_viol  viol_rate
str                          str       str     bool    i64       null     i64     f64
"GLOBAL_locked_market_C"     "SOFT"    "OK"    true    15073890  null     29788   0.001976
"GLOBAL_locked_market_P"     "SOFT"    "OK"    true    15073890  null     24731   0.001641
"GLOBAL_one_sided_quotes_C"  "SOFT"    "WARN"  false   15073890  null     539559  0.035794
"GLOBAL_one_sided_quotes_P"  "SOFT"    "MILD"  true    15073890  null     204344  0.013556
"ROI_locked_market_C"        "SOFT"    "OK"    true    2678530   null     63      0.000024
"ROI_locked_market_P"        "SOFT"    "OK"    true    2678530   null     103     0.000038
"ROI_one_sided_quotes_C"     "SOFT"    "OK"    true    2678530   null     22      0.000008
"ROI_one_sided_quotes_P"     "SOFT"    "OK"    true    2678530   null     10      0.000004
In [27]:
qc_helpers.qc_top_buckets("GLOBAL_one_sided_quotes_P")
Out[27]:
shape: (5, 7)
dte_bucket  delta_bucket  n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str         str           i64     i64     f64               f64         f64
"(0, 10]"   "(-inf, 0]"   143548  310018  0.463031          0.702482    0.041133
"(10, 30]"  "(-inf, 0]"   14386   111212  0.129357          0.070401    0.014756
"(0, 10]"   "(0, 0.05]"   29371   386997  0.075895          0.143733    0.051347
"(30, 60]"  "(-inf, 0]"   592     21540   0.027484          0.002897    0.002858
"(30, 60]"  "(0, 0.05]"   5684    251712  0.022581          0.027816    0.033397
In [28]:
qc_helpers.qc_top_buckets("ROI_one_sided_quotes_P")
Out[28]:
shape: (5, 7)
dte_bucket  delta_bucket  n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str         str           i64     i64     f64               f64         f64
"(10, 30]"  "(0.3, 0.7]"  5       275072  0.000018          0.5         0.205391
"(10, 30]"  "(0.1, 0.3]"  2       285255  0.000007          0.2         0.212994
"(10, 30]"  "(0.7, 0.9]"  1       149096  0.000007          0.1         0.111327
"(30, 60]"  "(0.1, 0.3]"  1       233854  0.000004          0.1         0.174614
"(30, 60]"  "(0.3, 0.7]"  1       242656  0.000004          0.1         0.181186

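The bucket columns in the tables above can be read as simple ratios. The sketch below recomputes them for the first row of the GLOBAL_one_sided_quotes_P buckets. The formulas for viol_rate_bucket and viol_share follow directly from the column names and match the reported values; the denominator used for row_share (half of the 15,073,890 rows, i.e. one option type) is an inference from the numbers, not something stated by the library, and `bucket_metrics` is a hypothetical helper.

```python
def bucket_metrics(
    n_viol_bucket: int, n_rows_bucket: int, n_viol_total: int, n_rows_total: int
) -> dict[str, float]:
    """Recompute the three ratio columns of a top-buckets table."""
    return {
        # How dirty is this bucket?
        "viol_rate_bucket": n_viol_bucket / n_rows_bucket,
        # How much of the check's total violations sit in this bucket?
        "viol_share": n_viol_bucket / n_viol_total,
        # How big is this bucket relative to the scope?
        "row_share": n_rows_bucket / n_rows_total,
    }


metrics = bucket_metrics(
    n_viol_bucket=143_548,         # bucket (0, 10] x (-inf, 0]
    n_rows_bucket=310_018,
    n_viol_total=204_344,          # n_viol of GLOBAL_one_sided_quotes_P
    n_rows_total=15_073_890 // 2,  # assumed: rows of one option type only
)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

A bucket whose viol_share is far above its row_share, as here, concentrates violations well beyond its weight in the data.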
Spread diagnostics (execution quality)¶

In [29]:
spread_quote_checks = qc_helpers.qc_table(
    [
        "GLOBAL_wide_spread_C",
        "GLOBAL_wide_spread_P",
        "ROI_wide_spread_C",
        "ROI_wide_spread_P",
        "GLOBAL_very_wide_spread_C",
        "GLOBAL_very_wide_spread_P",
        "ROI_very_wide_spread_C",
        "ROI_very_wide_spread_P",
    ]
)
spread_quote_checks
Out[29]:
shape: (8, 8)
name                         severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str                          str       str    bool    i64       null     i64     f64
"GLOBAL_very_wide_spread_C"  "SOFT"    "OK"   true    15073890  null     0       0.0
"GLOBAL_very_wide_spread_P"  "SOFT"    "OK"   true    15073890  null     0       0.0
"GLOBAL_wide_spread_C"       "SOFT"    "OK"   true    15073890  null     149886  0.009943
"GLOBAL_wide_spread_P"       "SOFT"    "OK"   true    15073890  null     41760   0.00277
"ROI_very_wide_spread_C"     "SOFT"    "OK"   true    2678530   null     0       0.0
"ROI_very_wide_spread_P"     "SOFT"    "OK"   true    2678530   null     0       0.0
"ROI_wide_spread_C"          "SOFT"    "OK"   true    2678530   null     79      0.000029
"ROI_wide_spread_P"          "SOFT"    "OK"   true    2678530   null     234     0.000087
In [30]:
qc_helpers.qc_top_buckets("GLOBAL_wide_spread_P")
Out[30]:
shape: (5, 7)
dte_bucket  delta_bucket  n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str         str           i64     i64     f64               f64         f64
"(10, 30]"  "(-inf, 0]"   6665    111212  0.059931          0.159602    0.014756
"(30, 60]"  "(-inf, 0]"   747     21540   0.03468           0.017888    0.002858
"(0, 10]"   "(-inf, 0]"   9916    310018  0.031985          0.237452    0.041133
"(30, 60]"  "(0, 0.05]"   7670    251712  0.030471          0.183669    0.033397
"(10, 30]"  "(0, 0.05]"   9991    393513  0.025389          0.239248    0.052211
In [31]:
qc_helpers.qc_top_buckets("ROI_wide_spread_P")
Out[31]:
shape: (5, 7)
dte_bucket  delta_bucket  n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str         str           i64     i64     f64               f64         f64
"(30, 60]"  "(0.1, 0.3]"  118     233854  0.000505          0.504274    0.174614
"(10, 30]"  "(0.1, 0.3]"  74      285255  0.000259          0.316239    0.212994
"(10, 30]"  "(0.3, 0.7]"  21      275072  0.000076          0.089744    0.205391
"(0, 10]"   "(0.3, 0.7]"  1       13349   0.000075          0.004274    0.009967
"(0, 10]"   "(0.1, 0.3]"  1       14087   0.000071          0.004274    0.010518

7. Volume & Open Interest Checks¶

Here we separate volume/OI checks into 3 groups:

  1. Hard data errors (drop-candidate rows)

    • negative_vol_oi: negative traded volume or open interest (invalid values)
  2. Soft consistency diagnostics (investigate first)

    • zero_vol_pos_oi: positive OI with zero volume
    • pos_vol_zero_oi: positive volume with zero OI
    These are often explainable by microstructure/timing but can cluster in weak quality regions.
  3. INFO liquidity diagnostics

    • volume_oi_metrics: summaries for GLOBAL and ROI scopes
    Not a pass/fail rule; used to profile tradability and market depth.
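
The hard and soft volume/OI rules above translate into a small per-row classifier. This is a minimal sketch of the conditions as described in the list, with `classify_volume_oi` as a hypothetical name; the library's own implementation may differ, for example in how it handles nulls.

```python
def classify_volume_oi(volume: int, open_interest: int) -> list[str]:
    """Return the volume/OI flags raised by one contract observation."""
    flags = []
    if volume < 0 or open_interest < 0:
        flags.append("negative_vol_oi")  # hard error: invalid sign
    if volume == 0 and open_interest > 0:
        flags.append("zero_vol_pos_oi")  # soft: open positions, no trades today
    if volume > 0 and open_interest == 0:
        flags.append("pos_vol_zero_oi")  # soft: trades today, no reported OI
    return flags


print(classify_volume_oi(120, 3400))  # ordinary liquid contract
print(classify_volume_oi(0, 250))     # stale but open contract
print(classify_volume_oi(15, 0))      # traded with zero reported OI
print(classify_volume_oi(-1, 10))     # invalid row
```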

Hard volume/OI sign errors (non-negotiable)¶

In [32]:
hard_vol_oi_checks = qc_helpers.qc_table(["negative_vol_oi"])
hard_vol_oi_checks
Out[32]:
shape: (1, 8)
name               severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str                str       str    bool    i64       null     i64     f64
"negative_vol_oi"  "HARD"    "OK"   true    15073890  null     0       0.0

Soft volume/OI mismatch diagnostics¶

In [33]:
soft_vol_oi_checks = qc_helpers.qc_table(
    [
        "GLOBAL_zero_vol_pos_oi_C",
        "GLOBAL_zero_vol_pos_oi_P",
        "ROI_zero_vol_pos_oi_C",
        "ROI_zero_vol_pos_oi_P",
        "GLOBAL_pos_vol_zero_oi_C",
        "GLOBAL_pos_vol_zero_oi_P",
        "ROI_pos_vol_zero_oi_C",
        "ROI_pos_vol_zero_oi_P",
    ]
)
soft_vol_oi_checks
Out[33]:
shape: (8, 8)
name                        severity  grade   passed  n_rows    n_units  n_viol   viol_rate
str                         str       str     bool    i64       null     i64      f64
"GLOBAL_pos_vol_zero_oi_C"  "SOFT"    "MILD"  true    15073890  null     193694   0.01285
"GLOBAL_pos_vol_zero_oi_P"  "SOFT"    "MILD"  true    15073890  null     242990   0.01612
"GLOBAL_zero_vol_pos_oi_C"  "SOFT"    "MILD"  true    15073890  null     1873603  0.124295
"GLOBAL_zero_vol_pos_oi_P"  "SOFT"    "MILD"  true    15073890  null     1260351  0.083612
"ROI_pos_vol_zero_oi_C"     "SOFT"    "MILD"  true    2678530   null     73999    0.027627
"ROI_pos_vol_zero_oi_P"     "SOFT"    "MILD"  true    2678530   null     80056    0.029888
"ROI_zero_vol_pos_oi_C"     "SOFT"    "OK"    true    2678530   null     104126   0.038874
"ROI_zero_vol_pos_oi_P"     "SOFT"    "OK"    true    2678530   null     51196    0.019113

INFO liquidity metrics (GLOBAL vs ROI)¶

In [34]:
vol_oi_metrics = pd.DataFrame(
    [
        qc_helpers.qc_details("GLOBAL_volume_oi_metrics"),
        qc_helpers.qc_details("ROI_volume_oi_metrics"),
    ],
    index=["GLOBAL_volume_oi_metrics", "ROI_volume_oi_metrics"],
)
vol_oi_metrics
Out[34]:
n_rows volume_null_rate volume_zero_rate oi_null_rate oi_zero_rate
GLOBAL_volume_oi_metrics 15073890 0.0 0.361542 0.0 0.182605
ROI_volume_oi_metrics 2678530 0.0 0.116327 0.0 0.115854
In [35]:
qc_helpers.qc_top_buckets("GLOBAL_zero_vol_pos_oi_P")
Out[35]:
shape: (5, 7)
dte_bucket    delta_bucket   n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str           str            i64     i64     f64               f64         f64
"(180, inf]"  "(0.1, 0.3]"   105080  259346  0.405173          0.083374    0.03441
"(60, 180]"   "(-inf, 0]"    4016    10030   0.400399          0.003186    0.001331
"(180, inf]"  "(0.7, 0.9]"   36599   93024   0.393436          0.029039    0.012342
"(180, inf]"  "(0, 0.05]"    8803    22831   0.385572          0.006985    0.003029
"(180, inf]"  "(0.05, 0.1]"  21604   58636   0.368443          0.017141    0.00778
In [36]:
qc_helpers.qc_top_buckets("GLOBAL_pos_vol_zero_oi_P")
Out[36]:
shape: (5, 7)
dte_bucket  delta_bucket   n_viol  n_rows  viol_rate_bucket  viol_share  row_share
str         str            i64     i64     f64               f64         f64
"(30, 60]"  "(0.3, 0.7]"   21339   242656  0.087939          0.087818    0.032196
"(0, 10]"   "(0.95, 1]"    34470   459052  0.07509           0.141858    0.060907
"(30, 60]"  "(0.1, 0.3]"   15410   233854  0.065896          0.063418    0.031028
"(0, 10]"   "(0.9, 0.95]"  1982    31697   0.06253           0.008157    0.004206
"(30, 60]"  "(0.7, 0.9]"   6867    118251  0.058071          0.02826     0.01569

8. Spot price sanity checks¶

We validate spot data in two steps: 8.1. structural consistency checks from the QC summary, 8.2. external cross-check versus Yahoo Finance close (EDA context).

Structural spot consistency checks (QC summary)¶

We check that spot_price is constant across the chain for each trading day. For SPY (equity ETF options), spot_price is also expected to match underlying_price at the day level.
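
A minimal sketch of these two day-level checks, in plain Python, is shown below. It groups rows by trade date, flags dates with more than one distinct spot value, and flags dates where spot and underlying disagree. Exact float equality is an assumption (the pipeline may use a tolerance), `spot_inconsistencies` is a hypothetical helper, and the rows are made up for illustration.

```python
from collections import defaultdict


def spot_inconsistencies(
    rows: list[tuple[str, float, float]]
) -> tuple[list[str], list[str]]:
    """rows holds (trade_date, spot_price, underlying_price) per contract.

    Returns (dates with non-constant spot, dates where spot != underlying).
    """
    spots_by_date: dict[str, set[float]] = defaultdict(set)
    mismatched_dates: set[str] = set()
    for trade_date, spot, underlying in rows:
        spots_by_date[trade_date].add(spot)
        if spot != underlying:
            mismatched_dates.add(trade_date)
    non_constant = sorted(d for d, spots in spots_by_date.items() if len(spots) > 1)
    return non_constant, sorted(mismatched_dates)


# Made-up rows: the first date is clean, the other two each break one rule.
toy_rows = [
    ("2020-01-02", 100.0, 100.0),
    ("2020-01-02", 100.0, 100.0),
    ("2020-01-03", 101.0, 101.0),
    ("2020-01-03", 101.5, 101.5),  # spot changes within the day
    ("2020-01-06", 102.0, 102.3),  # spot differs from underlying
]

non_constant, mismatched = spot_inconsistencies(toy_rows)
print("non-constant spot:", non_constant)
print("spot != underlying:", mismatched)
```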

In [37]:
spot_checks = qc_helpers.qc_table(
    [
        "GLOBAL_spot_constant_per_trade_date",
        "GLOBAL_spot_equals_underlying_per_trade_date",
    ]
)
spot_checks
Out[37]:
shape: (2, 8)
name                               severity  grade  passed  n_rows    n_units  n_viol  viol_rate
str                                str       str    bool    i64       i64      i64     f64
"GLOBAL_spot_constant_per_trade…   "SOFT"    "OK"   true    15073890  4764     0       0.0
"GLOBAL_spot_equals_underlying_…   "SOFT"    "OK"   true    15073890  4764     0       0.0

ORATS SPY vs Yahoo Finance non-adjusted close¶

ORATS options snapshots are taken close to end-of-day, so ORATS spot should be close to a reference close series. We use Yahoo Finance non-adjusted close as a practical benchmark for SPY.

In [38]:
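# Use the START/END constants defined above (lowercase start/end are not defined in this notebook).
spx_yf = yf.download(TICKER, start=START, end=END, auto_adjust=False)["Close"]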
spx_yf = spx_yf.squeeze()
spx_yf.name = "spy_yf_close"

spx_orats = (
    df.group_by("trade_date")
    .agg(pl.col("spot_price").first().alias("spy_orats_spot"))
    .sort("trade_date")
    .to_pandas()
    .set_index("trade_date")
)

spx = pd.concat([spx_yf, spx_orats], axis=1).dropna()

diff = spx["spy_orats_spot"] - spx["spy_yf_close"]
rel_diff = diff / spx["spy_yf_close"]
corr = spx.corr().loc["spy_yf_close", "spy_orats_spot"]

print("Correlation (ORATS spot vs Yahoo close):", corr)
display(
    pd.DataFrame({"abs_diff": diff.abs(), "rel_diff": rel_diff}).describe(
        percentiles=[0.5, 0.9, 0.99]
    )
)
Correlation (ORATS spot vs Yahoo close): 0.9999912820675189
abs_diff rel_diff
count 4762.000000 4.762000e+03
mean 0.299533 2.114660e-05
std 0.543123 1.775585e-03
min 0.000000 -2.775000e-02
50% 0.100007 -2.772942e-09
90% 0.830002 1.340869e-03
99% 2.487791 4.921786e-03
max 7.410001 2.151342e-02
In [39]:
plot_spot_vs_yahoo(spx)
[Figure: ORATS SPY spot vs Yahoo Finance non-adjusted close]

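ORATS spot tracks the Yahoo close closely: the correlation is above 0.9999, the median absolute difference is about 0.10, and the 99th percentile is about 2.49 (in price units). The largest gaps reach 7.41 in absolute terms and about 2.8% in relative terms. This supports the quality of ORATS spot inputs for downstream diagnostics.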

9. Dividend Yield checks¶

Dividend yield enters several model-based diagnostics (notably parity bounds through the carry term qT). We first inspect summary stats, then inspect the cross-DTE shape on sample days.

In [40]:
qc_helpers.info_stats_metric("GLOBAL_core_numeric_stats", "dividend_yield")
Out[40]:
metric null_rate n_nonnull min max mean std median q_0.00 q_0.01 q_0.05 q_0.50 q_0.95 q_0.99 q_1.00
0 dividend_yield 0.000846 15061138 0.0 0.98337 0.020372 0.04519 0.012595 0.0 0.0 0.0 0.012595 0.072937 0.187362 0.98337
In [41]:
qc_helpers.info_stats_metric("ROI_core_numeric_stats", "dividend_yield")
Out[41]:
metric null_rate n_nonnull min max mean std median q_0.00 q_0.01 q_0.05 q_0.50 q_0.95 q_0.99 q_1.00
0 dividend_yield 0.0 2678530 0.0 0.568204 0.019147 0.031293 0.0 0.0 0.0 0.0 0.0 0.080451 0.131615 0.568204

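Mean levels near 2% are plausible for SPY (GLOBAL mean 2.04%, ROI mean 1.91%). The medians differ: about 1.26% in GLOBAL but 0 in ROI, so at least half of the ROI rows carry a zero dividend yield. The upper tail is large in unconditional stats (99th percentile of 18.7% in GLOBAL and 13.2% in ROI, with maxima of 98% and 57%). The term-structure view below is used to separate structural ex-dividend effects from true outlier/noise behavior.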

Term-structure of dividend yield¶

We overlay Yahoo Finance ex-dividend dates as dashed vertical markers in DTE space for each sample day. This helps check whether sharp jumps in the dividend_yield curve align with upcoming ex-dividend events.

In [42]:
sample_days = [
    date(2007, 1, 3),
    date(2012, 6, 15),
    date(2018, 12, 24),
    date(2025, 1, 3),
]

spy_dividends = yf.Ticker(TICKER).dividends
ex_div_dates = sorted(
    {
        pd.Timestamp(ts).date()
        for ts in spy_dividends.index
        if start <= pd.Timestamp(ts).date() <= end
    }
)

plot_term_structure_samples(
    df,
    sample_days=sample_days,
    value_col="dividend_yield",
    ex_div_dates=ex_div_dates,
)
[Figure: dividend-yield term structure on the sample days, with ex-dividend dates as dashed vertical markers]

In these sample days, the main dividend-yield jumps align with ex-dividend anchors, which supports a structural carry interpretation rather than random data corruption.
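
To make the "markers in DTE space" idea concrete, here is a minimal sketch of how calendar ex-dividend dates can be mapped to days-to-expiry offsets for one sample day. The helper name `ex_div_dte_markers`, the `max_dte` cut-off and the toy dates are assumptions for illustration; the plotting helper used above may do this differently.

```python
from datetime import date


def ex_div_dte_markers(trade_day, ex_div_dates, max_dte=None):
    """Calendar days from trade_day to each strictly future ex-dividend date."""
    markers = []
    for ex_date in sorted(ex_div_dates):
        dte = (ex_date - trade_day).days
        # Past (or same-day) ex-dividend dates have no marker on a DTE axis.
        if dte > 0 and (max_dte is None or dte <= max_dte):
            markers.append(dte)
    return markers


# Toy ex-dividend dates (illustrative, not the Yahoo Finance series).
toy_ex_div_dates = [date(2025, 3, 21), date(2025, 6, 20), date(2024, 12, 20)]
markers = ex_div_dte_markers(date(2025, 1, 3), toy_ex_div_dates, max_dte=365)
print(markers)
```

A jump in the `dividend_yield` curve located just after one of these offsets is what the overlay is meant to reveal.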

10. Risk-free Rate checks¶

Risk-free rates drive discounting in price bounds and parity diagnostics, so we validate both structural consistency and level behavior.

10.1. Structural uniqueness check (per day-expiry bucket)¶

We expect one consistent risk-free rate per (trade_date, expiry_date) slice. In this run, the check passes with 0 violations, which supports internal consistency of rate assignment.

In [43]:
rf_checks = qc_helpers.qc_table(
    [
        "GLOBAL_unique_risk_free_rate_per_day_expiry",
    ]
)
rf_checks
Out[43]:
shape: (1, 8)
name                               severity   grade   passed   n_rows     n_units   n_viol   viol_rate
str                                str        str     bool     i64        i64       i64      f64
"GLOBAL_unique_risk_free_rate_p…"  "SOFT"     "OK"    true     15073890   79950     0        0.0

10.2. INFO risk-free metrics (GLOBAL vs ROI)¶

The metrics indicate:

  • no missing values (r_null_rate = 0.0) in both GLOBAL and ROI,
  • plausible range (r_min = 0.0, r_max = 0.0602),
  • similar central levels across scopes (GLOBAL mean/median around 1.97%/1.23%, ROI around 1.85%/1.10%).

This suggests rates are well-covered and in a realistic magnitude range for the sample horizon.

In [44]:
pd.DataFrame(
    [
        qc_helpers.qc_details("GLOBAL_risk_free_rate_metrics"),
        qc_helpers.qc_details("ROI_risk_free_rate_metrics"),
    ],
    index=["GLOBAL_risk_free_rate_metrics", "ROI_risk_free_rate_metrics"],
)
Out[44]:
n_rows r_null_rate r_min r_max r_mean r_median
GLOBAL_risk_free_rate_metrics 15073890 0.0 0.0 0.0602 0.019690 0.0123
ROI_risk_free_rate_metrics 2678530 0.0 0.0 0.0602 0.018491 0.0110

Term-structure sanity on sample days¶

We inspect several dates to verify that cross-DTE rate curves are smooth and economically coherent across rate regimes.

In [45]:
sample_days = [
    date(2007, 1, 3),
    date(2012, 6, 15),
    date(2018, 12, 24),
    date(2025, 1, 3),
]

plot_term_structure_samples(df, sample_days=sample_days, value_col="risk_free_rate")
[Figure: risk-free-rate term structure on the sample days]

The step-like profile is consistent with ORATS using a small set of tenor anchors (piecewise term-structure) rather than a fully free curve at every DTE. Across sample days, levels shift with macro regimes while preserving coherent monotone/near-monotone shape by maturity.

11. Implied Volatility Quality Checks¶

We split IV diagnostics into one HARD validity check and two SOFT tail checks.

  • iv_non_negative is a HARD check: implied volatility should never be negative in clean data.

  • GLOBAL_high_iv uses a 100% IV threshold (smoothed_iv > 1.0). This can occur during extreme stress regimes (for example, crisis windows).

  • GLOBAL_very_high_iv uses a 200% IV threshold (smoothed_iv > 2.0). This is much stricter and should be rare.

In [46]:
iv_checks = qc_helpers.qc_table(
    [
        "iv_non_negative",
        "GLOBAL_high_iv",
        "GLOBAL_very_high_iv",
    ]
)
iv_checks
Out[46]:
shape: (3, 8)
name                    severity   grade   passed   n_rows     n_units   n_viol   viol_rate
str                     str        str     bool     i64        null      i64      f64
"iv_non_negative"       "HARD"     "OK"    true     15073890   null      0        0.0
"GLOBAL_high_iv"        "SOFT"     "OK"    true     15073890   null      9378     0.000622
"GLOBAL_very_high_iv"   "SOFT"     "OK"    true     15073890   null      0        0.0
In [47]:
qc_helpers.info_stats_metric("GLOBAL_core_numeric_stats", "smoothed_iv")
Out[47]:
metric null_rate n_nonnull min max mean std median q_0.00 q_0.01 q_0.05 q_0.50 q_0.95 q_0.99 q_1.00
0 smoothed_iv 0.0 15073890 0.01 1.95156 0.206395 0.098493 0.188129 0.01 0.072138 0.092514 0.188129 0.378108 0.520645 1.95156
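
The three IV checks reduce to simple threshold counts. The sketch below shows that logic on a toy list of IV values; the function name `iv_flags`, the returned keys and the toy values are assumptions for this example, while the thresholds (negative, above 1.0, above 2.0) are the ones stated above.

```python
def iv_flags(iv_values, high=1.0, very_high=2.0):
    """Count IV threshold breaches and express the soft ones as rates."""
    n = len(iv_values)
    n_negative = sum(1 for iv in iv_values if iv < 0.0)
    n_high = sum(1 for iv in iv_values if iv > high)
    n_very_high = sum(1 for iv in iv_values if iv > very_high)
    return {
        "iv_non_negative_passed": n_negative == 0,  # HARD: any negative IV fails
        "high_iv_rate": n_high / n,                 # SOFT: judged as a rate
        "very_high_iv_rate": n_very_high / n,       # SOFT: judged as a rate
    }


# Toy IV values in decimal form (1.25 means 125%).
toy_iv = [0.12, 0.19, 0.35, 1.25, 0.52, 0.08, 0.21, 1.95]
flags = iv_flags(toy_iv)
print(flags)
```

This is consistent with the tables above: the maximum `smoothed_iv` is 1.95156, below the 2.0 threshold, so `GLOBAL_very_high_iv` has no violations while `GLOBAL_high_iv` has a small positive rate.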

Smile Shapes¶

We use delta as the moneyness measure so that calls and puts can be placed on a single, continuous implied-volatility curve.

ORATS provides a smoothed implied-volatility surface (SMV) that is shared across calls and puts. This lets us analyse the smile consistently across option types and maturities (e.g., 10, 30, and 60 DTE).

In [48]:
picked_dates = [
    date(2008, 10, 10),
    date(2010, 12, 2),
    date(2013, 6, 13),
    date(2015, 8, 24),
    date(2018, 2, 5),
    date(2018, 9, 12),
    date(2020, 3, 16),
    date(2022, 6, 16),
    date(2025, 3, 3),
]

event_labels = {
    date(2008, 10, 10): "GFC stress",
    date(2015, 8, 24): "China / flash crash",
    date(2018, 2, 5): "Volmageddon",
    date(2020, 3, 16): "Covid crash",
    date(2022, 6, 16): "Rates/Inflation stress",
}

plot_smiles_by_delta(df, picked_dates=picked_dates, event_labels=event_labels)
[Figure: implied-volatility smiles by delta on the picked dates]

Because the IV values we plot are already SMV-smoothed, we do not apply any additional smoothing. A simple interpolation across reported points is sufficient when we want to visualise a continuous curve. In practice, if we need IV at a specific target (e.g., a strike or delta bucket), we either:

  • select the closest available quote (nearest neighbour), or
  • linearly interpolate between adjacent quotes (if the target lies between them).
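
The two lookup options above can be sketched as follows. This is a minimal, library-free illustration: the function name `iv_at_delta`, the flat handling outside the quoted range and the toy smile are assumptions, not part of the ORATS data or the project code.

```python
from bisect import bisect_left


def iv_at_delta(deltas, ivs, target, method="linear"):
    """IV at a target delta from reported points.

    deltas must be strictly increasing and aligned with ivs.
    Outside the reported range the nearest end point is returned (assumption).
    """
    if not deltas or len(deltas) != len(ivs):
        raise ValueError("deltas and ivs must be non-empty and the same length")
    i = bisect_left(deltas, target)
    if i == 0:
        return ivs[0]
    if i == len(deltas):
        return ivs[-1]
    lo_d, hi_d = deltas[i - 1], deltas[i]
    lo_iv, hi_iv = ivs[i - 1], ivs[i]
    if method == "nearest":
        # Ties go to the lower-delta neighbour.
        return lo_iv if (target - lo_d) <= (hi_d - target) else hi_iv
    # Linear interpolation between the two adjacent quotes.
    weight = (target - lo_d) / (hi_d - lo_d)
    return lo_iv + weight * (hi_iv - lo_iv)


# Toy smile: three reported (delta, IV) points.
toy_deltas = [0.25, 0.50, 0.75]
toy_ivs = [0.20, 0.18, 0.22]
iv_linear = iv_at_delta(toy_deltas, toy_ivs, 0.40)
iv_nearest = iv_at_delta(toy_deltas, toy_ivs, 0.40, method="nearest")
print(iv_linear, iv_nearest)
```

Nearest neighbour always returns a value that was actually reported, while linear interpolation gives a continuous curve.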

For details on the ORATS smoothing methodology (SMV system), see: https://orats.com/blog/smoothing-options-implied-volatilities-using-orats-smv-system

Term-Structure Shapes¶

This figure plots implied-volatility term structures (Smoothed IV vs DTE) for a few delta buckets across multiple trade dates (one facet per date).

In [49]:
plot_term_structures_by_delta(df, picked_dates=picked_dates, event_labels=event_labels)
[Figure: implied-volatility term structures (smoothed IV vs DTE) by delta bucket, one facet per picked date]

On crash / stress dates, the term structure often becomes inverted (backwardation): short-dated IV rises relative to longer-dated IV. In more normal regimes, it is typically upward sloping (contango), with longer-dated IV above short-dated IV.

Implied Volatility Time-Series¶

The figure shows the evolution of smoothed implied volatility over time for multiple maturities (e.g., 10D, 30D, 90D, 1Y).

Each series reflects the market’s forward-looking risk expectations over a different horizon:

  • short-dated IV (10–30D) captures near-term uncertainty
  • medium maturities (90D) reflect quarterly risk
  • long-dated IV (1Y) embeds structural / long-run expectations
In [50]:
plot_iv_time_series_with_slope(daily_features, event_labels=event_labels)
[Figure: smoothed IV time series by maturity (upper panel) and 10D - 1Y slope (lower panel)]

Short-dated IV is more reactive than longer maturities, especially during crises, as markets price immediate uncertainty over the next weeks.

Longer maturities (e.g., 1Y) also rise but more smoothly since they embed expectations over a full year, including the anticipated post-shock recovery.

The lower panel (10D - 1Y) is a compact slope diagnostic:

  • positive values indicate short-dated stress dominance (backwardation),
  • negative values indicate a more normal contango-like term shape.
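
The slope diagnostic is a plain difference of two IV levels. A minimal sketch, assuming IVs in decimal form; the function name `term_slope_regime` and the toy inputs are hypothetical:

```python
def term_slope_regime(iv_10d, iv_1y):
    """Return the 10D - 1Y slope and a label for its sign."""
    slope = iv_10d - iv_1y
    if slope > 0:
        regime = "backwardation"  # short-dated IV above long-dated IV
    elif slope < 0:
        regime = "contango"       # long-dated IV above short-dated IV
    else:
        regime = "flat"
    return slope, regime


# Toy inputs: a stressed day and a calm day.
stress_slope, stress_regime = term_slope_regime(0.45, 0.30)
calm_slope, calm_regime = term_slope_regime(0.12, 0.17)
print(stress_regime, calm_regime)
```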

12. Greeks Sanity Checks¶

We split Greeks QC into HARD data errors and SOFT diagnostics.

  1. HARD sign errors (drop-candidate rows)

    • gamma_non_negative
    • vega_non_negative

    These are treated as structural issues in a clean options chain. In the pipeline they are HARD checks with a tiny numeric tolerance: a row is in violation if gamma < -1e-8 or vega < -1e-8.

  2. SOFT diagnostics (investigate rate and location first)

    • *_delta_bounds_sane_* for calls/puts, GLOBAL and ROI
    • *_theta_positive_* for calls/puts, GLOBAL and ROI

    Delta theoretical bounds are: $$ 0 \le \Delta_C \le 1,\qquad -1 \le \Delta_P \le 0. $$ We still allow small numeric noise at row level (eps=1e-5), then judge the violation rate with soft thresholds.

    Positive theta is also SOFT (row tolerance eps=1e-8) because it can be legitimate in some cases, for example:

    • dividend/carry effects (especially American options),
    • deep ITM contracts with early-exercise effects,
    • very short-maturity edge cases and vendor-Greeks approximation noise.
In [51]:
greeks_checks_cols = [
    "gamma_non_negative",
    "vega_non_negative",
    "GLOBAL_delta_bounds_sane_C",
    "GLOBAL_delta_bounds_sane_P",
    "ROI_delta_bounds_sane_C",
    "ROI_delta_bounds_sane_P",
    "GLOBAL_theta_positive_C",
    "GLOBAL_theta_positive_P",
    "ROI_theta_positive_C",
    "ROI_theta_positive_P",
]

qc_helpers.qc_table(greeks_checks_cols)
Out[51]:
shape: (10, 8)
name                           severity   grade   passed   n_rows     n_units   n_viol   viol_rate
str                            str        str     bool     i64        null      i64      f64
"gamma_non_negative"           "HARD"     "OK"    true     15073890   null      0        0.0
"vega_non_negative"            "HARD"     "OK"    true     15073890   null      0        0.0
"GLOBAL_delta_bounds_sane_C"   "SOFT"     "OK"    true     15073890   null      0        0.0
"GLOBAL_delta_bounds_sane_P"   "SOFT"     "OK"    true     15073890   null      0        0.0
"GLOBAL_theta_positive_C"      "SOFT"     "OK"    true     15073890   null      0        0.0
"GLOBAL_theta_positive_P"      "SOFT"     "OK"    true     15073890   null      0        0.0
"ROI_delta_bounds_sane_C"      "SOFT"     "OK"    true     2678530    null      0        0.0
"ROI_delta_bounds_sane_P"      "SOFT"     "OK"    true     2678530    null      0        0.0
"ROI_theta_positive_C"         "SOFT"     "OK"    true     2678530    null      0        0.0
"ROI_theta_positive_P"         "SOFT"     "OK"    true     2678530    null      0        0.0
In [52]:
qc_helpers.qc_thresholds(greeks_checks_cols)
Out[52]:
shape: (8, 4)
name                           mild       warn      fail
str                            f64        f64       f64
"GLOBAL_delta_bounds_sane_C"   0.000001   0.00001   0.0001
"GLOBAL_delta_bounds_sane_P"   0.000001   0.00001   0.0001
"GLOBAL_theta_positive_C"      0.001      0.005     0.01
"GLOBAL_theta_positive_P"      0.001      0.005     0.01
"ROI_delta_bounds_sane_C"      0.000001   0.00001   0.0001
"ROI_delta_bounds_sane_P"      0.000001   0.00001   0.0001
"ROI_theta_positive_C"         0.001      0.005     0.01
"ROI_theta_positive_P"         0.001      0.005     0.01
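
One plausible reading of how a SOFT check's violation rate is turned into a grade from these three thresholds is sketched below. The tiering rule, the inclusive comparisons and the function name `grade_soft_check` are assumptions; only the threshold values come from the table above. The `passed` mapping mirrors what the QC tables in this notebook show (OK and MILD rows have `passed = true`, WARN and FAIL rows have `passed = false`).

```python
def grade_soft_check(viol_rate, mild, warn, fail):
    """Map a violation rate to a grade using ascending thresholds (assumed rule)."""
    if not (mild <= warn <= fail):
        raise ValueError("thresholds must be ascending: mild <= warn <= fail")
    if viol_rate >= fail:
        return "FAIL"
    if viol_rate >= warn:
        return "WARN"
    if viol_rate >= mild:
        return "MILD"
    return "OK"


def passed(grade):
    """OK and MILD count as passed, as observed in the QC tables."""
    return grade in ("OK", "MILD")


# Thresholds of the theta_positive checks; the rates below are hypothetical.
theta_thresholds = {"mild": 0.001, "warn": 0.005, "fail": 0.01}
grades = [grade_soft_check(rate, **theta_thresholds) for rate in (0.0, 0.003, 0.005, 0.02)]
print(grades)
```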

Greeks vs Strike¶

Here we investigate how option Greeks vary with strike for both calls and puts, highlighting the typical theoretical shapes observed around the ATM region and in the wings.

In [53]:
day = date(2024, 12, 16)
dte_target = 30

plot_greeks_vs_strike(df_long, day=day, dte_target=dte_target)
[Figure: Greeks vs strike for calls and puts on 2024-12-16, DTE near 30]

  • Delta should be monotonic in strike: from near +1 (deep ITM calls) toward 0 (deep OTM calls). For puts, delta typically lies in [-1, 0] under the standard sign convention.
  • Gamma and Vega usually peak near ATM and decay in the wings.
  • Theta is typically negative for long options near ATM, though localized positive-theta pockets can appear (carry/dividend effects and, for American options, early-exercise features).

Note on ORATS conventions: ORATS uses a consistent convention for quoting Greeks across calls and puts (e.g., shared formulations and sign conventions). When comparing call/put surfaces, interpret values according to this convention. See: ORATS – “Option Greeks are the same for calls and puts”.

13. Put-call parity diagnostics¶

Economic context (absence of arbitrage, AOA)¶

In frictionless markets, no-arbitrage implies a strict parity relation for European options:

$$ C_E - P_E = S_0 e^{-qT} - K e^{-rT}. $$

Tradable arbitrage context¶

In live markets, we cannot trade at mid and we pay bid/ask costs. So a small parity gap is often non-actionable. We therefore assess parity with a spread-aware tolerance rather than as an exact equality.

American parity check used in this QC¶

SPY options are American, so we use bounds (not equality):

$$ S_0 e^{-qT} - K \le C_A - P_A \le S_0 - K e^{-rT}. $$

Using mid prices:

$$ L = C_{\text{mid}} - P_{\text{mid}}. $$

Dynamic tolerance:

$$ \tau = \alpha\Big((C_{\text{ask}}-C_{\text{bid}}) + (P_{\text{ask}}-P_{\text{bid}})\Big) + \tau_0. $$

In this pipeline, $\alpha = 1.0$ and $\tau_0 = 0.01$. With lower and upper denoting the left and right sides of the American bounds above, we flag a violation when:

$$ L < \text{lower} - \tau \quad \text{or} \quad L > \text{upper} + \tau. $$

This is a SOFT data-quality diagnostic: high violations in ROI are more concerning than violations concentrated in illiquid wings.
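
The formulas above translate directly into a per-pair test. The sketch below is a minimal illustration under stated assumptions: the function name `pcp_american_violation`, its argument names and the toy quotes are hypothetical, and `t` is assumed to be the time to expiry in years. Only the bounds, the tolerance formula and the defaults $\alpha = 1.0$, $\tau_0 = 0.01$ come from the text.

```python
import math


def pcp_american_violation(c_bid, c_ask, p_bid, p_ask, spot, strike, r, q, t,
                           alpha=1.0, tau0=0.01):
    """True when the mid call-put gap falls outside the American parity bounds."""
    c_mid = 0.5 * (c_bid + c_ask)
    p_mid = 0.5 * (p_bid + p_ask)
    gap = c_mid - p_mid                            # L in the text
    lower = spot * math.exp(-q * t) - strike       # S0 e^{-qT} - K
    upper = spot - strike * math.exp(-r * t)       # S0 - K e^{-rT}
    tau = alpha * ((c_ask - c_bid) + (p_ask - p_bid)) + tau0
    return gap < lower - tau or gap > upper + tau


# Toy ATM pair: spot 100, strike 100, r = 5%, q = 2%, three months to expiry.
common = dict(spot=100.0, strike=100.0, r=0.05, q=0.02, t=0.25)
ok_pair = pcp_american_violation(3.00, 3.10, 2.50, 2.60, **common)
bad_pair = pcp_american_violation(5.00, 5.10, 2.50, 2.60, **common)
print(ok_pair, bad_pair)
```

In the toy case the bounds are roughly [-0.50, 1.24] and the tolerance is 0.21, so a gap of 0.50 passes and a gap of 2.50 is flagged. Because `lower` depends on `q`, a noisy dividend-yield input shifts the lower bound directly.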

In [54]:
pcp_checks_cols = [
    "GLOBAL_pcp_bounds_mid_am",
    "ROI_pcp_bounds_mid_am",
]

qc_helpers.qc_table(pcp_checks_cols)
Out[54]:
shape: (2, 8)
name                         severity   grade    passed   n_rows    n_units   n_viol   viol_rate
str                          str        str      bool     i64       null      i64      f64
"GLOBAL_pcp_bounds_mid_am"   "SOFT"     "OK"     true     7530569   null      317965   0.042223
"ROI_pcp_bounds_mid_am"      "SOFT"     "MILD"   true     1339261   null      125691   0.093851
In [55]:
qc_helpers.qc_top_buckets(pcp_checks_cols[0])
Out[55]:
shape: (5, 7)
dte_bucket   delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str          str            i64      i64      f64                f64          f64
"(0, 10]"    "(0.3, 0.7]"   14886    110331   0.134921           0.046816     0.014651
"(30, 60]"   "(0.7, 0.9]"   30014    233861   0.128341           0.094394     0.031055
"(30, 60]"   "(0.3, 0.7]"   26028    242655   0.107263           0.081858     0.032223
"(10, 30]"   "(0.3, 0.7]"   28947    275071   0.105235           0.091038     0.036527
"(10, 30]"   "(0.7, 0.9]"   29049    285256   0.101835           0.091359     0.03788
In [56]:
qc_helpers.qc_top_buckets(pcp_checks_cols[1])
Out[56]:
shape: (5, 7)
dte_bucket   delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str          str            i64      i64      f64                f64          f64
"(0, 10]"    "(0.3, 0.7]"   2204     13349    0.165106           0.017535     0.009967
"(0, 10]"    "(0.7, 0.9]"   1985     14087    0.14091            0.015793     0.010518
"(30, 60]"   "(0.7, 0.9]"   30014    233854   0.128345           0.238792     0.174614
"(30, 60]"   "(0.3, 0.7]"   26028    242655   0.107263           0.207079     0.181186
"(10, 30]"   "(0.3, 0.7]"   28947    275071   0.105235           0.230303     0.20539

A meaningful share of parity violations appears in the (0.3, 0.7] delta bucket, which is closer to the tradable region than far-wing-only issues.

However, this parity check is highly sensitive to the dividend-yield input (q). Because dividend-yield behavior is unstable near some expiries, parity alerts should be interpreted jointly with dividend diagnostics before acting.

14. Price-bounds diagnostics (calls and puts)¶

We monitor two no-arbitrage envelopes and treat violations as SOFT diagnostics.

American-spot bounds¶

Spot-based bounds used as an additional diagnostic:

$$ \max(0, S_0-K) \le C_{\text{mid}} \le S_0, $$ $$ \max(0, K-S_0) \le P_{\text{mid}} \le K. $$

Bounds are evaluated with spread-aware tolerance:

$$ \tau=\max\!\big(\tau_0,\alpha\,|\text{ask}-\text{bid}|\big), $$

with $\alpha=1.0$ and $\tau_0=0.01$ in this pipeline. A row is flagged when:

$$ \text{mid} < \text{lower} - \tau \quad \text{or} \quad \text{mid} > \text{upper} + \tau. $$

Interpretation: small rates outside ROI can be microstructure noise; persistent ROI violations are more concerning for tradable strategies.
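
A minimal sketch of the row-level test described above, for one quote. The function name `price_bounds_violation`, the `"C"`/`"P"` type codes and the toy quotes are assumptions for illustration; the bounds, the tolerance formula and the defaults $\alpha = 1.0$, $\tau_0 = 0.01$ follow the text.

```python
def price_bounds_violation(bid, ask, spot, strike, option_type,
                           alpha=1.0, tau0=0.01):
    """True when the mid price falls outside the spot-based American bounds."""
    mid = 0.5 * (bid + ask)
    if option_type == "C":
        lower, upper = max(0.0, spot - strike), spot
    elif option_type == "P":
        lower, upper = max(0.0, strike - spot), strike
    else:
        raise ValueError("option_type must be 'C' or 'P'")
    tau = max(tau0, alpha * abs(ask - bid))  # spread-aware tolerance
    return mid < lower - tau or mid > upper + tau


# Toy quotes with spot 100: an ITM call (strike 90) has intrinsic value 10.
below_intrinsic = price_bounds_violation(9.50, 9.70, 100.0, 90.0, "C")
above_intrinsic = price_bounds_violation(10.40, 10.60, 100.0, 90.0, "C")
itm_put = price_bounds_violation(10.20, 10.40, 100.0, 110.0, "P")
print(below_intrinsic, above_intrinsic, itm_put)
```

In the first toy quote the mid (9.60) sits more than one spread (0.20) below intrinsic value (10), so it is flagged; the other two quotes are inside the envelope.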

In [57]:
bounds_checks = qc_helpers.qc_table(
    [
        "GLOBAL_price_bounds_mid_am_C",
        "GLOBAL_price_bounds_mid_am_P",
        "ROI_price_bounds_mid_am_C",
        "ROI_price_bounds_mid_am_P",
    ]
)
bounds_checks
Out[57]:
shape: (4, 8)
name                             severity   grade    passed   n_rows     n_units   n_viol    viol_rate
str                              str        str      bool     i64        null      i64       f64
"GLOBAL_price_bounds_mid_am_C"   "SOFT"     "FAIL"   false    15073890   null      4095403   0.271689
"GLOBAL_price_bounds_mid_am_P"   "SOFT"     "WARN"   false    15073890   null      2678940   0.177721
"ROI_price_bounds_mid_am_C"      "SOFT"     "FAIL"   false    2678530    null      654689    0.244421
"ROI_price_bounds_mid_am_P"      "SOFT"     "WARN"   false    2678530    null      394119    0.14714

Price-bound violation rates are high for both calls and puts, in GLOBAL and in the tradable ROI alike (roughly 15% to 27%). To localise the issue, we break the ROI breaches down by DTE × delta bucket.

In [58]:
qc_helpers.qc_top_buckets("ROI_price_bounds_mid_am_C")
Out[58]:
shape: (5, 7)
dte_bucket   delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str          str            i64      i64      f64                f64          f64
"(30, 60]"   "(0.1, 0.3]"   233850   352106   0.664147           0.357192     0.131455
"(10, 30]"   "(0.1, 0.3]"   285249   434352   0.656723           0.435702     0.162161
"(0, 10]"    "(0.1, 0.3]"   14087    21728    0.648334           0.021517     0.008112
"(10, 30]"   "(0.3, 0.7]"   64525    550143   0.117288           0.098558     0.20539
"(0, 10]"    "(0.3, 0.7]"   3083     26698    0.115477           0.004709     0.009967
In [59]:
qc_helpers.qc_top_buckets("ROI_price_bounds_mid_am_P")
Out[59]:
shape: (5, 7)
dte_bucket   delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str          str            i64      i64      f64                f64          f64
"(0, 10]"    "(0.1, 0.3]"   7641     21728    0.351666           0.019388     0.008112
"(10, 30]"   "(0.1, 0.3]"   149097   434352   0.343263           0.378305     0.162161
"(30, 60]"   "(0.1, 0.3]"   118243   352106   0.335816           0.300019     0.131455
"(0, 10]"    "(0.3, 0.7]"   3097     26698    0.116001           0.007858     0.009967
"(30, 60]"   "(0.3, 0.7]"   54758    485311   0.112831           0.138938     0.181186

  • Violations are heavily concentrated in the low-delta (0.1, 0.3] bucket across the (0, 10], (10, 30] and (30, 60] DTE buckets, where bucket violation rates are roughly 65% to 66% for calls and 34% to 35% for puts.

  • By contrast, the (0.3, 0.7] region (near-ATM, more tradable) shows much lower violation rates (about 11% to 12%) and a smaller share of total breaches.

Overall, global violation metrics overstate the practical impact: most inconsistencies arise in low-delta, lower-liquidity regions, not in the core tradable surface.
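
The three ratio columns in the bucket tables can be reproduced from counts. The definitions below are inferred from the numbers shown (they match the first row of the ROI call table) rather than taken from the library, and the function name `bucket_metrics` is hypothetical.

```python
def bucket_metrics(n_viol, n_rows, total_viol, total_rows):
    """Ratios for one DTE x delta bucket of a given check."""
    return {
        "viol_rate_bucket": n_viol / n_rows,  # how bad the bucket is
        "viol_share": n_viol / total_viol,    # its share of all violations
        "row_share": n_rows / total_rows,     # its share of all checked rows
    }


# Bucket ("(30, 60]", "(0.1, 0.3]") of ROI_price_bounds_mid_am_C:
# n_viol and n_rows from the bucket table, totals from the check summary row.
metrics = bucket_metrics(
    n_viol=233850, n_rows=352106, total_viol=654689, total_rows=2678530
)
print({key: round(value, 6) for key, value in metrics.items()})
```

A bucket whose `viol_share` is well above its `row_share` (here about 0.36 versus 0.13) contributes disproportionately to the headline rate.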

15. Monotonicity diagnostics¶

We treat monotonicity checks as SOFT diagnostics and separate them into strike-based and maturity-based arbitrage interpretations.

15.1. Vertical spread arbitrage (strike monotonicity)¶

For fixed trade date and expiry, strike monotonicity conditions are:

$$ C(K_1, T) \ge C(K_2, T) \quad \text{for } K_1 < K_2, $$ $$ P(K_1, T) \le P(K_2, T) \quad \text{for } K_1 < K_2. $$

In practice (American exercise + quote noise), we treat violations as SOFT and judge impact using GLOBAL vs ROI concentration.
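
A minimal sketch of the strike-monotonicity conditions on one (trade date, expiry) slice. Comparing adjacent strikes only, the optional tolerance `tol`, and the function name `strike_monotonicity_violations` are assumptions for this example; the pipeline may define pairs and tolerances differently.

```python
def strike_monotonicity_violations(strikes, prices, option_type, tol=0.0):
    """Count adjacent-strike pairs that break monotonicity in strike."""
    if option_type not in ("C", "P"):
        raise ValueError("option_type must be 'C' or 'P'")
    pairs = sorted(zip(strikes, prices))
    n_viol = 0
    for (_, p_low_k), (_, p_high_k) in zip(pairs, pairs[1:]):
        if option_type == "C" and p_high_k > p_low_k + tol:
            n_viol += 1  # call price increased with strike
        elif option_type == "P" and p_high_k < p_low_k - tol:
            n_viol += 1  # put price decreased with strike
    return n_viol


# Toy slice: the 105 call is priced above the 100 call.
toy_strikes = [95.0, 100.0, 105.0]
call_viol = strike_monotonicity_violations(toy_strikes, [7.0, 3.5, 3.8], "C")
put_viol = strike_monotonicity_violations(toy_strikes, [1.0, 2.5, 6.0], "P")
print(call_viol, put_viol)
```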

In [60]:
strike_monotonicity_checks = qc_helpers.qc_table(
    [
        "GLOBAL_strike_monotonicity_C",
        "GLOBAL_strike_monotonicity_P",
        "ROI_strike_monotonicity_C",
        "ROI_strike_monotonicity_P",
    ]
)
strike_monotonicity_checks
Out[60]:
shape: (4, 8)
name                             severity   grade   passed   n_rows     n_units   n_viol   viol_rate
str                              str        str     bool     i64        null      i64      f64
"GLOBAL_strike_monotonicity_C"   "SOFT"     "OK"    true     15073890   null      17039    0.00113
"GLOBAL_strike_monotonicity_P"   "SOFT"     "OK"    true     15073890   null      13667    0.000907
"ROI_strike_monotonicity_C"      "SOFT"     "OK"    true     2678530    null      256      0.000096
"ROI_strike_monotonicity_P"      "SOFT"     "OK"    true     2678530    null      411      0.000153
In [61]:
qc_helpers.qc_top_buckets("GLOBAL_strike_monotonicity_P")
Out[61]:
shape: (5, 7)
dte_bucket   delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str          str            i64      i64      f64                f64          f64
"(10, 30]"   "(-inf, 0]"    1786     111212   0.016059           0.13068      0.014756
"(0, 10]"    "(-inf, 0]"    4538     310018   0.014638           0.332041     0.041133
"(30, 60]"   "(-inf, 0]"    171      21540    0.007939           0.012512     0.002858
"(0, 10]"    "(0, 0.05]"    2333     386997   0.006028           0.170703     0.051347
"(10, 30]"   "(0, 0.05]"    2130     393513   0.005413           0.15585      0.052211

All strike-monotonicity checks show a very small violation rate, and most violations are concentrated in the wings, which limits tradability impact.

15.2. Calendar arbitrage (maturity monotonicity)¶

At fixed strike, maturity monotonicity is:

$$ C(K, T_2) \ge C(K, T_1), \qquad P(K, T_2) \ge P(K, T_1) \quad \text{for } T_2 > T_1. $$

We monitor maturity-order violations as SOFT diagnostics, with emphasis on ROI where tradability impact is highest.
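
The calendar condition can be sketched the same way for a single strike: sort quotes by maturity and flag any later expiry priced below an earlier one. The function name `maturity_monotonicity_violations`, the adjacent-expiry comparison and the tolerance `tol` are assumptions for illustration. The same rule applies to calls and puts.

```python
def maturity_monotonicity_violations(dtes, prices, tol=0.0):
    """Return (shorter_dte, longer_dte) pairs where the longer expiry is cheaper."""
    pairs = sorted(zip(dtes, prices))
    return [
        (t_short, t_long)
        for (t_short, p_short), (t_long, p_long) in zip(pairs, pairs[1:])
        if p_long < p_short - tol
    ]


# Toy fixed-strike quotes: the 60-day option is cheaper than the 30-day one.
calendar_viol = maturity_monotonicity_violations([10, 30, 60], [1.2, 2.0, 1.9])
print(calendar_viol)
```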

In [62]:
maturity_monotonicity_checks = qc_helpers.qc_table(
    [
        "GLOBAL_maturity_monotonicity_C",
        "GLOBAL_maturity_monotonicity_P",
        "ROI_maturity_monotonicity_C",
        "ROI_maturity_monotonicity_P",
    ]
)
maturity_monotonicity_checks
Out[62]:
shape: (4, 8)
name                               severity   grade    passed   n_rows     n_units   n_viol   viol_rate
str                                str        str      bool     i64        null      i64      f64
"GLOBAL_maturity_monotonicity_C"   "SOFT"     "OK"     true     15073890   null      126491   0.008391
"GLOBAL_maturity_monotonicity_P"   "SOFT"     "MILD"   true     15073890   null      351111   0.023293
"ROI_maturity_monotonicity_C"      "SOFT"     "OK"     true     2678530    null      1750     0.000653
"ROI_maturity_monotonicity_P"      "SOFT"     "OK"     true     2678530    null      2917     0.001089

The violation rate is below 1% for every check except GLOBAL_maturity_monotonicity_P (about 2.3%). We therefore inspect where those violations sit using the DTE × delta buckets.

In [63]:
qc_helpers.qc_top_buckets("GLOBAL_maturity_monotonicity_P").head(10)
Out[63]:
shape: (5, 7)
dte_bucket     delta_bucket   n_viol   n_rows   viol_rate_bucket   viol_share   row_share
str            str            i64      i64      f64                f64          f64
"(10, 30]"     "(0.95, 1]"    107155   428723   0.24994            0.305188     0.056883
"(0, 10]"      "(0.95, 1]"    108660   459052   0.236705           0.309475     0.060907
"(30, 60]"     "(0.95, 1]"    49471    231335   0.21385            0.140898     0.030693
"(60, 180]"    "(0.95, 1]"    49639    278094   0.178497           0.141377     0.036897
"(180, inf]"   "(0.95, 1]"    6371     56771    0.112223           0.018145     0.007532

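All five top buckets sit in the (0.95, 1] delta bucket and together account for about 91% of the put maturity-monotonicity violations. The violations are therefore located in the extreme wings, so practical impact on the core tradable surface is limited.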

16. Conclusion¶

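Overall, the quality checks pass satisfactorily, with most violations concentrated in the wings, where liquidity is lower and quotes are inherently noisier. The main exception is the price-bounds family, graded FAIL for calls and WARN for puts in both GLOBAL and ROI.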

For the arbitrage checks, the remaining breaches are not treated as critical. The ORATS option chains are generally regarded as high-quality, and these discrepancies are more likely attributable to limitations in the specification of the checks for American options (despite incorporating American put–call inequalities and theoretical bounds) rather than to genuine pricing inconsistencies.