Add benchmark suite for engine and replay throughput #47

Merged
tradingexpert merged 1 commit into main from feature/benchmark-suite on May 19, 2026

@tradingexpert
Owner

What changed

  • add shared mixed MBO benchmark workloads
  • add direct engine throughput benchmark for Python scalar, C++ scalar, and C++ batch paths
  • add full audited replay throughput benchmark for Python reference and default replay paths
  • document how to interpret direct engine throughput separately from full replay throughput
  • clarify that ordinary audited Replay currently remains event-by-event, while direct C++ batch ingestion is the fast path for callers that own the event loop
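
The last bullet's distinction can be illustrated with a toy stand-in. This is a minimal sketch, not the project's API: `ToyEngine`, `submit`, and `submit_batch` are hypothetical names, the action labels are illustrative, and the "engine" only tallies events. It shows the two call shapes: one call per event (the shape audited Replay uses today) versus one call per batch (available to a caller that owns the event loop).

```python
from collections import Counter


class ToyEngine:
    """Hypothetical stand-in for a matching engine; it only tallies actions."""

    def __init__(self):
        self.counts = Counter()

    def submit(self, event):
        # Scalar path: one Python-level call per event.
        self.counts[event["action"]] += 1

    def submit_batch(self, events):
        # Batch path: one call for many events. A native engine would run
        # this loop without returning to Python between events.
        self.counts.update(e["action"] for e in events)


# Hypothetical mixed workload; the action names are assumptions.
events = [{"action": a} for a in ("add", "add", "cancel", "modify", "add", "trade")] * 3

scalar = ToyEngine()
for event in events:
    scalar.submit(event)

batch = ToyEngine()
batch.submit_batch(events)

# Both call shapes must leave the engine in the same state.
same_state = scalar.counts == batch.counts
```

The point of the sketch is only that the two paths are interchangeable in result and differ in how often control crosses the Python boundary.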

Why

After #46, the old scalar-only benchmark story was stale. This PR makes the performance boundary visible: the direct C++ batch path is genuinely faster, while the current audited Replay path still pays for per-event gateway and valuation behavior. It also records the next intended performance step: boundary-batched replay that lets C++ run independently until Python must observe a fill, instruction boundary, or configured inspection point.
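
The boundary-batched idea can be sketched as splitting the event stream into runs that each end at a point where Python must look at state. This is a rough illustration under stated assumptions, not the planned implementation: `boundary_batches` and `is_boundary` are hypothetical names, and the boundary is assumed to be visible on the input event. In practice a fill is only known after matching, so the engine itself would have to report where it stopped.

```python
def boundary_batches(events, is_boundary):
    """Yield lists of events, closing a batch right after each boundary event."""
    batch = []
    for event in events:
        batch.append(event)
        if is_boundary(event):
            yield batch
            batch = []
    if batch:
        # Trailing events with no boundary after them form a final batch.
        yield batch


# Hypothetical stream: "fill" marks a point where Python must observe state.
stream = ["add", "add", "fill", "cancel", "add", "add", "fill", "add"]
batches = list(boundary_batches(stream, lambda e: e == "fill"))
# batches == [["add", "add", "fill"], ["cancel", "add", "add", "fill"], ["add"]]
```

Each yielded batch could be handed to the native engine in one call, with the per-event gateway and valuation work running only once per batch boundary.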

Local benchmark results

Using --cycles 20000 --repeats 3 --warmups 1 on this machine:

Direct execution-engine throughput
MatchingEngine scalar          120,000 events    0.0657 s     1,826,506 events/s
CppMatchingEngine scalar       120,000 events    0.0627 s     1,914,635 events/s
CppMatchingEngine batch        120,000 events    0.0069 s    17,517,716 events/s
scalar C++ speedup vs Python     1.05x
batch C++ speedup vs Python      9.59x

Full audited replay throughput
Replay + Python engine         120,000 events    0.2865 s       418,829 events/s
Replay + default engine        120,000 events    0.2855 s       420,242 events/s
default engine speedup vs Python     1.00x
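
The speedup lines are ratios of the events/s figures, each against the Python baseline of its own section. A small sketch that reproduces them from the numbers above (the dictionary layout and the `speedup` helper are illustrative only, not part of the benchmark scripts):

```python
# Throughput figures (events/s) copied from the results above.
direct = {
    "MatchingEngine scalar": 1_826_506,
    "CppMatchingEngine scalar": 1_914_635,
    "CppMatchingEngine batch": 17_517_716,
}
replay = {
    "Replay + Python engine": 418_829,
    "Replay + default engine": 420_242,
}


def speedup(candidate, baseline):
    """Ratio of candidate throughput to baseline throughput."""
    return candidate / baseline


scalar_speedup = speedup(direct["CppMatchingEngine scalar"], direct["MatchingEngine scalar"])
batch_speedup = speedup(direct["CppMatchingEngine batch"], direct["MatchingEngine scalar"])
replay_speedup = speedup(replay["Replay + default engine"], replay["Replay + Python engine"])

# How much slower full audited replay is than the direct Python engine.
replay_overhead = speedup(direct["MatchingEngine scalar"], replay["Replay + Python engine"])

print(f"{scalar_speedup:.2f}x {batch_speedup:.2f}x {replay_speedup:.2f}x")
# prints "1.05x 9.59x 1.00x"
print(f"{replay_overhead:.2f}x")
# prints "4.36x"
```

The last ratio is not in the reported output; it is derived here to show that the audited Replay path runs the same Python engine at roughly a quarter of its direct throughput.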

A private research feed-path smoke comparison on the same broad mixed workload shape measured about 1.7M events/s; it is not apples-to-apples with public audited Replay, but it confirms where the public replay bridge still needs work.

Validation

  • python -m ruff check .
  • python -m pytest
  • python -m benchmarks.engine_throughput --cycles 20000 --repeats 3 --warmups 1
  • python -m benchmarks.replay_throughput --cycles 20000 --repeats 3 --warmups 1
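
For readers interpreting the flags, here is a minimal sketch of how a `--repeats` / `--warmups` timing loop is commonly structured. It is an assumption about the general technique, not a copy of the benchmark scripts: `measure` and `workload` are hypothetical names, reporting the best run is one possible choice (the real scripts may report a mean or median), and the six-events-per-cycle figure is inferred from 120,000 events at `--cycles 20000`.

```python
import time

EVENTS_PER_CYCLE = 6  # inferred: 120,000 events / 20,000 cycles; an assumption


def measure(run, n_events, repeats=3, warmups=1):
    """Return the best events/s over `repeats` timed runs, after `warmups` untimed runs."""
    for _ in range(warmups):
        run()  # warm caches and lazy imports; result discarded
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on a very coarse clock.
    return n_events / max(best, 1e-12)


cycles = 20_000
n_events = cycles * EVENTS_PER_CYCLE
calls = []


def workload():
    # Placeholder for feeding n_events events to an engine.
    calls.append(sum(range(n_events)))


rate = measure(workload, n_events)
```

With the defaults above the workload runs four times in total: one warmup plus three timed repeats.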

@tradingexpert merged commit ed52887 into main on May 19, 2026
4 checks passed
@tradingexpert deleted the feature/benchmark-suite branch on May 19, 2026 12:27