mc-order-execution

Jean-Luc Tchimbakala (jt3654) & Alexis Zecevic (agz2116)

Volatility/volume-based order management system for futures markets. Empirical PDFs of intrawindow price ranges, conditioned on market regime, are used to optimally place limit orders and minimize slippage. It ships a no-lookahead backtester, a Monte Carlo execution layer, an AI-agent execution-value evaluation across all provided markets (see Agentic value), and an interactive Streamlit dashboard for parameter tuning.

Installation

# Editable install with dev dependencies (pytest, ruff, jupyter)
pip install -e ".[dev]"
# Minimal install (runtime only):
pip install -r requirements.txt

Usage

Interactive dashboard (recommended)

streamlit run src/streamlit_app.py     # → http://localhost:8501

A Plotly dashboard with tabs for data quality, ranges, regimes, ePDFs, the no-lookahead backtest, a fill-target Pareto sweep, and an Agent value tab (see Agentic value below). All parameters (τ, half-life, M/N/K, j_start, fill-rate target) are live sidebar sliders — the parameter-tuning surface.

Legacy viewer

python src/app.py                      # → http://localhost:8000

Select a contract, adjust the τ slider, then click Plot.

Agent execution-value evaluation

python scripts/run_agent_eval.py --market Gold --market Nasdaq   # selected markets
python scripts/run_agent_all_assets.py                           # every market, consolidated

Scores the AI agent's fills (AIAgent_*.csv) against benchmarks. The all_assets runner auto-discovers every market with an agent series and writes a cross-market table (reports/agent_all_assets.csv) + figure. See Agentic value below.

End-to-end backtest demo

python scripts/run_v1.py

Loads each market's full contract history (rolled at daily-volume crossover), runs the regime-conditioned strategy on both sides (buy / sell), compares to TWAP and VWAP baselines, prints a slippage + fill-rate summary, and saves histograms to reports/figures/.

Tests

pytest -q

Math primitives (range identity R = R_U + R_D, EWMA recursion vs. brute-force reference, no-lookahead invariant) plus strategy and tick-table unit tests.

Interface

The viewer displays two plots:

Volume chart — 1-minute traded volume over the full contract history. Days are shaded green (kept) or red (discarded) based on the data quality filter described below.

Range distributions — histograms of R, R_U, R_D in ticks for the selected τ, computed only on the kept days.

Tick size is inferred automatically as the minimum non-zero difference between distinct prices observed in the contract history. This is a heuristic, not an exact rule — it works well in practice but can be overridden via the tick_size_override parameter if the inferred value is incorrect.
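The heuristic above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function name infer_tick_size and the rounding precision are assumptions made here; only tick_size_override is a name taken from the text.

```python
from typing import Iterable, Optional


def infer_tick_size(prices: Iterable[float],
                    tick_size_override: Optional[float] = None,
                    decimals: int = 10) -> float:
    """Smallest non-zero gap between distinct observed prices (a heuristic)."""
    if tick_size_override is not None:
        return tick_size_override          # caller knows better than the data
    # Round first so float noise does not create spurious tiny gaps.
    distinct = sorted({round(p, decimals) for p in prices})
    gaps = [round(b - a, decimals) for a, b in zip(distinct, distinct[1:])]
    gaps = [g for g in gaps if g > 0]
    if not gaps:
        raise ValueError("need at least two distinct prices to infer a tick")
    return min(gaps)
```

If a contract never trades at two adjacent price levels, the inferred value is a multiple of the true tick, which is the case the override exists for.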

Methodology

Day filtering

A day is kept if the number of 1-minute bars with at least one trade is at least 90% of the maximum observed across all days for that contract. Days below this threshold are discarded (shown in red) and excluded from all subsequent computations. The reference maximum is the single busiest day in the contract history.
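As a rough sketch of this filter, assuming the per-day count of active 1-minute bars has already been computed (the function name kept_days and the dict input are illustrative, not the project's API):

```python
def kept_days(active_bars_per_day: dict[str, int],
              threshold: float = 0.9) -> set[str]:
    """Days whose active-bar count is at least `threshold` of the busiest day."""
    if not active_bars_per_day:
        return set()
    reference = max(active_bars_per_day.values())   # single busiest day
    return {day for day, n in active_bars_per_day.items()
            if n >= threshold * reference}
```

Because the reference is a single day, one unusually long session raises the bar for every other day in the contract.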

Valid windows

For a given holding period τ (in minutes), the session is sliced into consecutive non-overlapping windows [t, t+τ), [t+τ, t+2τ), … starting from the first bar of the day. A window is valid if and only if:

it contains exactly τ 1-minute bars (no missing bar inside the window), and
it does not cross midnight (the window ends on the same calendar day it starts).

Incomplete windows — those at the end of a session or around intraday gaps — are discarded.
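The two validity rules can be illustrated on a list of bar timestamps for one session. This is a sketch under stated assumptions: the function name valid_windows is hypothetical, and "does not cross midnight" is interpreted here as the last bar of the window falling on the same calendar day as the first.

```python
from datetime import datetime, timedelta


def valid_windows(bar_times: list[datetime],
                  tau: int) -> list[tuple[datetime, datetime]]:
    """Return [start, end) for every valid tau-minute window of one session."""
    if not bar_times:
        return []
    times = sorted(bar_times)
    present = set(times)
    one_minute = timedelta(minutes=1)
    windows = []
    start = times[0]                                  # grid starts at first bar
    while start <= times[-1]:
        minutes = [start + i * one_minute for i in range(tau)]
        complete = all(m in present for m in minutes)   # exactly tau bars
        same_day = minutes[-1].date() == start.date()   # no midnight crossing
        if complete and same_day:
            windows.append((start, start + tau * one_minute))
        start += tau * one_minute                     # non-overlapping slices
    return windows
```

Note that the grid does not re-anchor after a gap: a single missing bar invalidates only the window that contains it.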

Range quantities

For each valid window the following quantities are computed in price space and in tick space (ℓ = round(value / tick)):

Symbol	Formula	Meaning
R	max(high) − min(low)	Full window range
R_U	max(high) − open₀	Upward half-range
R_D	open₀ − min(low)	Downward half-range

where open₀ is the open of the first 1-minute bar in the window. By construction R = R_U + R_D (up to rounding artefacts of ±1 tick).
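A small worked sketch of the three quantities for one window follows. The function name window_ranges and the (open, high, low, close) tuple layout are assumptions for illustration only.

```python
def window_ranges(bars: list[tuple[float, float, float, float]],
                  tick: float) -> dict[str, int]:
    """bars: (open, high, low, close) for each 1-minute bar of one valid window."""
    open0 = bars[0][0]                       # open of the first bar
    highest = max(b[1] for b in bars)
    lowest = min(b[2] for b in bars)

    def to_ticks(value: float) -> int:
        return round(value / tick)           # tick space: l = round(value / tick)

    return {
        "R": to_ticks(highest - lowest),
        "R_U": to_ticks(highest - open0),
        "R_D": to_ticks(open0 - lowest),
    }
```

In price space the identity is exact, since (max(high) − open₀) + (open₀ − min(low)) = max(high) − min(low); only the independent rounding of each term can break it in tick space.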

Volatility proxy

Two options are available for estimating the current volatility regime:

EWMA of range — the exponentially weighted moving average of R over past windows. More natural: directly measures the average recent range.
EWMV of range — the exponentially weighted moving variance of R. Captures the stability of volatility rather than its level (volatility of volatility).

Both are computed via the recursive Algorithm 1 from the project spec, using η_{j-1} at step j to avoid any lookahead. The half-life m (in bars) is a user parameter.
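The spec's Algorithm 1 is not reproduced in this README, so the sketch below uses a standard half-life EWMA as a stand-in; the decay formula 0.5^(1/m) and the function name lagged_ewma are assumptions. What it is meant to show is the lag: the value available at step j is η_{j-1}, computed before x_j is seen.

```python
from typing import Optional


def lagged_ewma(values: list[float], half_life: float) -> list[Optional[float]]:
    """Signal usable at each step j, i.e. the EWMA of values strictly before j."""
    decay = 0.5 ** (1.0 / half_life)      # weight halves every `half_life` bars
    signal: list[Optional[float]] = []
    eta: Optional[float] = None
    for x in values:
        signal.append(eta)                # step j only sees eta_{j-1}
        eta = x if eta is None else decay * eta + (1.0 - decay) * x
    return signal
```

The first entry is None because no history exists yet; changing the last observation never changes any emitted signal value, which is the no-lookahead property the test suite is described as checking.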

Δx discontinuities

Since only a subset of days is retained, consecutive kept days are not necessarily contiguous in calendar time. The first bar of each kept day is therefore assigned Δx = NaN to avoid contaminating the direction signal with overnight or multi-day price jumps. These bars are excluded from the binning threshold computation and do not increment the conditional frequency tables.
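A minimal sketch of the masking rule, assuming Δx is the change in a reference price between consecutive rows (the README does not define it more precisely) and using a hypothetical helper name delta_x:

```python
import math


def delta_x(days: list[str], prices: list[float]) -> list[float]:
    """Price change vs. the previous row, NaN on the first row of each day."""
    out = []
    for i, (day, price) in enumerate(zip(days, prices)):
        if i == 0 or days[i - 1] != day:
            out.append(math.nan)          # overnight / multi-day jump: masked
        else:
            out.append(price - prices[i - 1])
    return out
```

Downstream code then has to skip NaN entries explicitly, both when computing quantile thresholds and when updating frequency tables.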

Parameters

Parameter	Description	Default
τ	Holding period in minutes	5
half_life	EWMA/EWMV half-life in bars	20
M	Number of volume regime states	3
N	Number of volatility regime states	3
K	Number of price-direction states	3
j_start	Minimum bars before ePDF estimation begins	200
fill_rate_target	Minimum fill probability when picking ℓ*	0.6

Strategy and backtest

At each τ-window decision point:

Classify the prior window into a regime cell (m, n, k) from EWMA-volume, EWMA-range, and Δx quantile states.
Look up the cell's empirical PDF of R_U (for a sell) or R_D (for a buy).
Pick ℓ* = the largest tick distance such that P(R_side ≥ ℓ*) ≥ fill_rate_target, where R_side is R_U for a sell and R_D for a buy (order_mgmt.strategy.pick_ell_star).
Place a limit order at open ± ℓ* · tick (+ for a sell, − for a buy). If the realized R_U/R_D reaches ℓ*, the order fills at the limit price; otherwise it chases at the window's close.
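The selection rule in the third step can be sketched from a table of observed range counts. This is an illustration only: the real picker is order_mgmt.strategy.pick_ell_star, whose signature is not shown in this README, so the name pick_ell_star_sketch and the dict-of-counts input are assumptions.

```python
def pick_ell_star_sketch(counts: dict[int, int],
                         fill_rate_target: float) -> int:
    """Largest l (in ticks) whose empirical P(range >= l) meets the target.

    counts maps an observed half-range in ticks to how often it occurred
    in the current regime cell.
    """
    total = sum(counts.values())
    if total == 0:
        return 0                              # empty cell: fall back to market
    for ell in range(max(counts), -1, -1):    # scan from deepest limit down
        survival = sum(c for r, c in counts.items() if r >= ell) / total
        if survival >= fill_rate_target:
            return ell
    return 0
```

Since P(range ≥ 0) = 1, the scan always terminates with a valid answer, and ℓ* = 0 corresponds to executing at the open.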

Slippage is reported in ticks vs. a TWAP baseline (market-execute at open). The VWAP baseline is also computed for context.
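To make the sign convention concrete, here is a sketch of the per-window outcome against the open, with positive values meaning a better price than market-on-open. The function name and argument layout are hypothetical.

```python
def slippage_ticks(side: str, open_: float, high: float, low: float,
                   close: float, ell_star: int,
                   tick: float) -> tuple[bool, float]:
    """Return (filled, improvement vs. open in ticks) for one window."""
    sign = 1 if side == "sell" else -1
    # Favourable excursion: R_U for a sell, R_D for a buy.
    reach = (high - open_) if side == "sell" else (open_ - low)
    filled = round(reach / tick) >= ell_star
    # Filled orders trade at the limit; unfilled ones chase at the close.
    fill_price = open_ + sign * ell_star * tick if filled else close
    return filled, sign * (fill_price - open_) / tick
```

A filled order always scores exactly +ℓ*, while an unfilled one scores whatever the close gives, which is where the negative tail comes from.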

Results (scripts/run_v1.py)

Settings: τ=5, half_life=20, M=N=K=3, j_start=200, fill_rate_target=0.6. Full contract history per market (roll-aware loader). Two backtest variants reported:

v1: uses ePDFs built from the full history (lookahead permitted). It is an upper bound: the maximum edge available under perfect knowledge of the marginal distributions.
v2: strict no-lookahead. At each decision j, ePDFs and quantile thresholds are built incrementally from data strictly before j (run_backtest_rolling).
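The difference between the two variants comes down to which observations are allowed to set the thresholds. The sketch below shows the v2 idea for a single regime variable; it is not run_backtest_rolling itself, the function name rolling_states is hypothetical, and it re-sorts the history at every step for readability rather than speed.

```python
from typing import Optional


def rolling_states(values: list[float], n_states: int,
                   j_start: int) -> list[Optional[int]]:
    """Quantile state of values[j] using thresholds from values[:j] only."""
    states: list[Optional[int]] = []
    for j, x in enumerate(values):
        if j < max(j_start, 1):
            states.append(None)               # warm-up: not enough history
            continue
        history = sorted(values[:j])          # strictly before j
        cuts = [history[(k * len(history)) // n_states]
                for k in range(1, n_states)]
        states.append(sum(x >= c for c in cuts))   # 0 .. n_states - 1
    return states
```

A v1-style version would compute the cuts once from the whole series, which is exactly the lookahead that v2 removes.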

Market	Contracts rolled	Side	Variant	n	Fill rate	Avg (ticks)	Median (ticks)
Gold	GCG24 / GCJ24 / GCM24 / GCQ24	buy	v1	35 437	66.9%	+0.06	+2
Gold	GCG24 / GCJ24 / GCM24 / GCQ24	buy	v2	35 410	66.1%	+0.04	+3
Gold	GCG24 / GCJ24 / GCM24 / GCQ24	sell	v1	35 437	66.9%	+0.19	+3
Gold	GCG24 / GCJ24 / GCM24 / GCQ24	sell	v2	35 410	66.1%	+0.18	+3
Nasdaq	NQH20 / NQM20 / NQU20	buy	v1	39 391	68.6%	−0.17	+6
Nasdaq	NQH20 / NQM20 / NQU20	buy	v2	39 364	71.9%	−0.12	+6
Nasdaq	NQH20 / NQM20 / NQU20	sell	v1	39 391	68.9%	+0.17	+6
Nasdaq	NQH20 / NQM20 / NQU20	sell	v2	39 364	71.3%	+0.12	+7

VWAP baseline averages are ±0.04 ticks on Gold and ±0.03 on Nasdaq — essentially flat against TWAP=open at the τ=5 horizon.

Interpretation. v1 and v2 are within 0.05 ticks on the mean; the lookahead bias is small in this configuration. Both variants show:

Strongly positive median: the typical fill saves +2 to +7 ticks vs. TWAP
Mean near zero: dragged down by the chase-on-unfilled tail
Fill rate 66–72%: at or above the 0.6 target

The edge is real but modest. The levers explored from here — a smarter chase policy (cost-aware ℓ*, chase-at-mid, early-chase) and especially the chase-cap that truncates the unfilled tail — are developed and quantified in the Agentic value section below, where they turn the mean positive on the liquid markets.

Known simplifications

VWAP execution assumption. The baseline assumes you can transact at the bar-typical-price (high + low + close) / 3 weighted by bar volume. Optimistic; real VWAP execution has implementation shortfall.
Chase price = window close. Unfilled orders are charged at the close of the τ-window. Real execution might allow earlier intervention or pay half-spread, both of which would tighten the slippage tails.
Tick size from heuristic by default. For known markets the spec value from order_mgmt.ticks.TICK_TABLE overrides the inferred minimum-price-difference heuristic (plot_volume.get_tick). Unknown markets fall back to the heuristic.
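For reference, the VWAP assumption in the first item amounts to the following computation over the bars of a window. The function name window_vwap and the (high, low, close, volume) tuple layout are illustrative assumptions.

```python
def window_vwap(bars: list[tuple[float, float, float, float]]) -> float:
    """Volume-weighted average of the typical price (high + low + close) / 3.

    bars: (high, low, close, volume) for each 1-minute bar in the window.
    """
    total_volume = sum(volume for *_, volume in bars)
    if total_volume == 0:
        raise ValueError("window has no traded volume")
    weighted = sum((high + low + close) / 3.0 * volume
                   for high, low, close, volume in bars)
    return weighted / total_volume
```

The optimism lies in treating the typical price of each bar as an achievable fill for that bar's full volume.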

Agentic value (AI agent execution)

Each market ships an AIAgent_*.csv — a 5-minute decision series for an "AI agent" (schema excel_serial_day, hour, minute, price, signed_position). The agent decides what and when to trade; this module decides how to fill each parent order. Agent value = how much regime-conditioned execution improves the agent's realised price versus a naïve benchmark, in ticks (src/order_mgmt/agent/).
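Aligning the agent series with the OHLC bars requires turning the first three schema columns into a timestamp. The sketch below assumes the serial day follows Excel's 1900 date system (day 0 = 1899-12-30), which the README does not state; the helper name agent_timestamp is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: Excel 1900 date system, where serial 0 maps to 1899-12-30.
EXCEL_EPOCH = datetime(1899, 12, 30)


def agent_timestamp(excel_serial_day: int, hour: int, minute: int) -> datetime:
    """Combine the excel_serial_day, hour and minute columns into a datetime."""
    return EXCEL_EPOCH + timedelta(days=excel_serial_day,
                                   hours=hour, minutes=minute)
```

If the files were exported with a different epoch or carry a timezone convention, the offset would need to be adjusted accordingly.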

Parent orders & direction. The agent's trades are the rows where its signed position changes: dpos = position.diff(), dpos > 0 → buy, < 0 → sell. Direction comes from the agent's own action, so it is causal — no lookahead.
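A pure-Python equivalent of the position.diff() rule, as a minimal sketch (the function name parent_orders and the tuple output are assumptions for illustration):

```python
def parent_orders(positions: list[float]) -> list[tuple[int, str, float]]:
    """(row index, side, size) for every row where the signed position changes."""
    orders = []
    for i in range(1, len(positions)):
        dpos = positions[i] - positions[i - 1]
        if dpos > 0:
            orders.append((i, "buy", dpos))
        elif dpos < 0:
            orders.append((i, "sell", -dpos))
        # dpos == 0: the agent held its position, so there is nothing to fill
    return orders
```

A flip from long to short shows up as a single sell of twice the unit size, since only the net change is visible in the series.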

Benchmark = the OHLC window open, not the agent's CSV price. The agent price can sit on a different (unrolled) contract than our rolled OHLC series; that basis would otherwise masquerade as slippage (it produced a spurious Gold tail until we switched the benchmark). Positive ticks therefore mean the execution layer beat market-on-decision.

No-lookahead split. The regime ePDFs/thresholds are fit only on OHLC windows before the agent's first trade (train_end); every decision is then evaluated out-of-sample. This is a single time-ordered hold-out — a full in/out-of-sample parameter leaderboard is the natural next step.

The headline result

Posting a passive regime limit ℓ* earns a strong median improvement but a mean near zero — the same chase-on-unfill tail as the main backtest. The robust win is fill rate and the median; the mean is governed by the tail.

The winner: regime-limit + chase-cap. Keep the limit's upside but stop out once the price has moved a fixed number of ticks (the cap) against the order (order_mgmt.agent.slicing.fill_capped). The asymmetry (uncapped upside, bounded downside) turns the mean positive on the liquid markets while largely keeping the median and shrinking the 5th-percentile tail.
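The capped fill can be sketched by walking the 1-minute bars of the window in order. This is not the code of order_mgmt.agent.slicing.fill_capped: the function name, the (high, low) bar layout, and the conservative choice to check the cap before the limit within a bar are all assumptions made here.

```python
def fill_capped_sketch(side: str, open_: float,
                       bars: list[tuple[float, float]], close: float,
                       ell_star: int, cap: int, tick: float) -> float:
    """Improvement vs. the open in ticks (+ is better) for one parent order.

    bars: (high, low) of each 1-minute bar in the window, in time order.
    """
    for high, low in bars:
        favourable = (high - open_) if side == "sell" else (open_ - low)
        adverse = (open_ - low) if side == "sell" else (high - open_)
        # Conservative assumption: inside one bar the stop triggers first.
        if round(adverse / tick) >= cap:
            return -float(cap)            # stopped out: loss bounded at cap
        if round(favourable / tick) >= ell_star:
            return float(ell_star)        # passive limit filled
    sign = 1 if side == "sell" else -1
    return sign * (close - open_) / tick  # neither hit: chase at the close
```

Under this rule the worst stop-out is −cap ticks, ignoring any gap through the stop level.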

All assets (genericity)

python scripts/run_agent_all_assets.py runs the eval on every market with an AIAgent_*.csv (τ=5, half_life=20, M=N=K=3, j_start=200, fill_rate_target=0.6). Shortfall in ticks vs the OHLC open; median is the honest headline.

Market	tick	n	Fill	Regime mean / median	Best cap	Capped mean / median / p5
Nasdaq	0.25	1568	89%	−1.33 / +5	4	+2.97 / +5 / −4
Gold	0.10	948	76%	+0.01 / +4	4	+0.99 / +3 / −4
JPY	0.005	621	69%	+0.08 / +2	4	+0.14 / +1 / −4
GBP	0.01	369	72%	+0.07 / +2	16	−0.05 / +1 / −7
EuroStoxx	0.50	154	78%	+0.11 / +3	16	+0.21 / +3 / −16
Bunds	0.01	52	79%	+0.35 / +1	10	+0.29 / +1 / −4
HeatingOil	0.01	13	—	low-N (excluded)	—	—

Reading it. Every market shows a positive median: the regime limit reliably beats market-on-decision on the typical fill. The chase-cap's big win is on the liquid, high-volume markets (Nasdaq +2.97t, Gold +0.99t mean). On the FX pairs and EuroStoxx the uncapped mean is already ≈ 0 and the cap changes it little (for GBP and EuroStoxx the best cap is a loose 16 ticks), so the cap is a tail-insurance lever, not a free lunch. HeatingOil is excluded: its agent series outruns the OHLC coverage (only 13 fillable decisions), so its numbers are not statistically reliable. The best cap is market-dependent; the dashboard's Agent value tab sweeps it live. (Cross-asset figure: reports/figures/agent_all_assets.png; table: reports/agent_all_assets.csv.)

Tick sizes are in the CSVs' quoting units (order_mgmt.ticks.TICK_TABLE): GBP/JPY/HeatingOil/EuroStoxx are quoted at a scaled representation, so the data-unit tick (e.g. GBP 0.01, JPY 0.005), not the raw exchange tick, is what makes ℓ a true count of spreads.

Strategy levers (Stream D)

Composable refinements on top of the picker, surfaced as options (the final parameter choice is the user's):

Cost-aware ℓ* (pick_ell_star_cost_aware) — maximise p(ℓ)·ℓ − (1−p(ℓ))·chase_cost instead of targeting a fill rate; never picks worse than market-on-open.
Chase-at-mid (chase_price(policy="mid")) — fill unfilled orders at the window mid (H+L)/2 rather than the close; strictly better mean and tail.
Early-chase (simulate_early_chase) — bail when price moves a trigger distance against the limit instead of waiting for the deadline (the intrabar cousin of the chase-cap).
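The cost-aware objective from the first lever can be written out directly. The sketch below is an illustration rather than the body of pick_ell_star_cost_aware: the function name, the dict-of-counts input, and treating chase_cost as a fixed number of ticks are assumptions.

```python
def pick_ell_star_cost_aware_sketch(counts: dict[int, int],
                                    chase_cost: float) -> tuple[int, float]:
    """Maximise p(l) * l - (1 - p(l)) * chase_cost over tick distances l.

    counts maps an observed half-range in ticks to its frequency;
    p(l) is the empirical probability that the range reaches l.
    """
    total = sum(counts.values())
    best_ell, best_value = 0, 0.0         # l = 0: market-on-open, value 0
    if total == 0:
        return best_ell, best_value
    for ell in range(1, max(counts) + 1):
        p = sum(c for r, c in counts.items() if r >= ell) / total
        value = p * ell - (1.0 - p) * chase_cost
        if value > best_value:            # strict: ties keep the safer l
            best_ell, best_value = ell, value
    return best_ell, best_value
```

Starting the search from ℓ = 0 with value 0 is what guarantees the expected outcome is never worse than market-on-open under the model.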

Genericity, not over-fitting

A regime-conditioning ablation (scripts/sweep_chase.py, notes/strategy-sweep.md) found the 27-cell edge over a single pooled ePDF is only ~0.02–0.05 ticks of mean — the chase-cap, not the conditioning, is doing the heavy lifting. The pipeline is generic across all provided markets (scripts/run_agent_eval.py accepts any --market), and a synthetic agent generator (order_mgmt.agent.synthetic) provides a zero-shot genericity check.

Full findings: notes/agent-value.md (execution value) and notes/strategy-sweep.md (Stream D sweep + ablation).

Project layout

src/
  streamlit_app.py    Plotly dashboard (port 8501) — data quality, ranges,
                      regimes, ePDFs, backtest, sweep, Agent value tab
  viz_plotly.py       Plotly figure builders for the dashboard
  app.py              Legacy web viewer (port 8000)
  ranges.py           τ-window range computation (compute_ranges, compute_all_ranges)
  epdf.py             Conditional ePDF builder (build_epdf) + raw CSV loader
  plotting.py         Range histogram figure
  plot_volume.py      Daily-volume figure + 90%-of-max liquidity filter + tick inference
  regime.py           EWMA / EWMV recursion + regime visualisation
  order_mgmt/
    loader.py         Contract-roll-aware market loader (load_market, MarketSpec)
    pipeline.py       Bridges load_market into the indexed-by-time format
    ticks.py          Per-market spec tick-size table + resolver
    strategy.py       pick_ell_star + cost-aware/random pickers, chase_price,
                      simulate_early_chase (Stream D)
    baselines.py      TWAP / VWAP baselines
    backtest.py       run_backtest (v1) + run_backtest_rolling (v2, no-lookahead)
    mc/               Monte Carlo execution layer (Stream F)
    agent/            AI-agent execution value (Stream G): loader, metrics,
                      benchmarks, slicing (fill_capped), dynamic (DP), synthetic

tests/                pytest — math primitives, strategy, ticks, pipeline, loader,
                      MC, agent
scripts/
  run_v1.py           End-to-end backtest demo: Gold + Nasdaq, v1 vs v2 vs VWAP
  run_agent_eval.py   Agent execution-value eval per market
  sweep_*.py          Fill-rate Pareto + chase-policy sweeps (Stream D)
  compare_*.py        Agent benchmark / slicing / dynamic / tail-cap comparisons
reports/figures/      Backtest + agent + sweep figures
