Found while integrating Surbeivol/PythonMatchingEngine into an open matching-engine benchmark, the Matching Engine Performance Challenge — it cross-checks engines against the byte-identical consensus of other open-source engines. The engine matched consensus on every conformance case I ran except one: a single order sweeping a deep queue resting at one price level crashes Orderbook.send() outright, with an uncaught ValueError from inside the trade-log bookkeeping. The matching logic itself (send/_sweep_best_price/_new_pricelevel) is fully iterative — plain while loops, no recursion anywhere in the crossing path — so this isn't a stack-depth issue; it's an off-by-arithmetic bug in how Orderbook.trades grows to hold a deep sweep's fills.
Pinned at current master (f94150294a85d7b415ca4518590b5a661d6f9958).
Environment
- Commit:
f94150294a85d7b415ca4518590b5a661d6f9958
- Python 3.12.3, numpy 2.3.5, pandas 2.3.3 (besides PyYAML, the only third-party packages the engine imports)
- Driven through the engine's own public API only:
marketsimulator.orderbook.Orderbook.send()
Mechanism
Orderbook.__init__ sizes the trade log off a per-ticker "average transactions" constant, not off anything related to how many fills one incoming order can produce:
self.init_size = int(AVG_TRANSACTS[band])
self.inc = int(max(0.1 * AVG_TRANSACTS[band], 10))
(marketsimulator/orderbook.py:83-84). For the highest-liquidity band (band6, e.g. ticker san), config/trades_bands.yml sets AVG_TRANSACTS['band6'] = 15000, so self.inc = 1500. create_stats_dict allocates self.trades at exactly that size:
self.trades = {
key: np.zeros(self.inc)
for key in STATS
}
(:120-123).
Every fill from a single incoming order accumulates into one local counter, n_newtrd, inside _sweep_best_price's while order.leavesqty > 0 loop (:508-572) — a deep resting queue at one price can hand a single aggressor thousands of fills in this one loop, all before the method ever returns. Only once the loop finishes does it flush the whole batch in one call:
self.update_trades(pos=n_newtrd)
(:574). update_trades is where the growth bug lives:
def update_trades(self, pos):
end = self.ntrds + pos
for stat in STATS:
end = self.ntrds + pos
if end < len(self.trades[stat]):
self.trades[stat][self.ntrds:end
] = self.last_trades[stat][:pos]
else:
self.trades = self.inc_dict_size(self.trades,
inc=self.inc)
self.trades[stat][self.ntrds:end
] = self.last_trades[stat][:pos]
self.ntrds += pos
(:446-461), backed by:
def inc_dict_size(self, stat_dict, inc):
def_arr = np.zeros(inc)
def_day = np.full(inc, self.def_day)
new_dict = {(key): (np.hstack([stat_dict[key], def_arr])
if key != 'timestamp' else np.hstack([stat_dict[key], def_day]))
for key in stat_dict.keys()
}
return new_dict
(:151-160). The else branch tops the array up by exactly self.inc — one fixed increment — then unconditionally slice-assigns, with no re-check that the new length is actually >= end. That's fine as long as end = self.ntrds + pos never exceeds len(self.trades[stat]) + self.inc; it's wrong the moment a single sweep produces more fills than that. Concretely, on a fresh book (self.ntrds == 0, len(self.trades[stat]) == self.inc == 1500), the top-up grows the array to 3000; if pos > 3000 the destination slice self.trades[stat][0:pos] is still only 3000 long (out-of-range numpy slicing silently truncates rather than extending), so the assignment throws.
update_my_trades (:463-477, the is_mine-only mirror of the same bookkeeping) has the identical bug. It isn't independently reproducible below because update_trades always runs first (_sweep_best_price:574, before the update_my_trades call at :576) and crashes at the same threshold — but the fix below covers it too.
Minimal reproduction
Public API only, against a fresh Orderbook (band6, self.inc == 1500, so the smallest possible overflow is 3,001 fills from one order):
from marketsimulator.orderbook import Orderbook
ob = Orderbook(ticker="san") # band6 -> self.inc == 1500
for i in range(1, 3502):
ob.send(is_buy=False, qty=1, price=100.0, uid=i) # 3,501 resting asks @100
ob.send(is_buy=True, qty=3501, price=100.0, uid=999999) # one sweep, 3,501 fills
Observed:
Traceback (most recent call last):
File "repro.py", line 7, in <module>
ob.send(is_buy=True, qty=3501, price=100.0, uid=999999)
File "marketsimulator/orderbook.py", line 313, in send
self._sweep_best_price(neword)
File "marketsimulator/orderbook.py", line 574, in _sweep_best_price
self.update_trades(pos=n_newtrd)
File "marketsimulator/orderbook.py", line 458, in update_trades
self.trades[stat][self.ntrds:end
ValueError: could not broadcast input array from shape (3501,) into shape (3000,)
The crash is right at the edge: bisecting n from 2,999 through 3,001, sweeping 3,000 resting orders in one call is fine, and 3,001 is exactly one too many for the single fixed top-up (could not broadcast input array from shape (3001,) into shape (3000,)). Nothing about the input is unusual — an ordinary thin/one-sided book meeting one large marketable order; any simulation feeding this engine real (bursty or thin) order flow for long enough, or one big enough child order from an execution algorithm, can hit it.
Suggested fix
Grow by the actual shortfall, never less than the usual self.inc step so the ordinary/shallow case is unaffected:
if end < len(self.trades[stat]):
self.trades[stat][self.ntrds:end
] = self.last_trades[stat][:pos]
else:
+ # A single sweep can produce more fills (pos) than one
+ # self.inc-sized top-up covers (e.g. thousands of resting
+ # orders taken out by one deep aggressor); grow by at least
+ # the actual shortfall so the slice-assign below never sees a
+ # destination shorter than the source.
+ deficit = end - len(self.trades[stat])
self.trades = self.inc_dict_size(self.trades,
- inc=self.inc)
+ inc=max(self.inc, deficit))
self.trades[stat][self.ntrds:end
] = self.last_trades[stat][:pos]
and the identical change in update_my_trades:
if end < len(self.my_trades[my_stat]):
self.my_trades[my_stat][self.my_ntrds:end
] = self.my_last_trades[my_stat][:pos]
else:
+ # Same fixed-increment-vs-large-batch shortfall as
+ # update_trades above; grow to cover the real deficit.
+ deficit = end - len(self.my_trades[my_stat])
self.my_trades = self.inc_dict_size(self.my_trades,
- inc=self.inc)
+ inc=max(self.inc, deficit))
self.my_trades[my_stat][self.my_ntrds:end
] = self.my_last_trades[my_stat][:pos]
Applied both to a scratch copy and reran the repro above: it completes with no exception, ob.ntrds lands exactly on the fill count, and ob.trades['pas_ord'][:ob.ntrds] comes back as the makers in strict arrival order (1, 2, 3, ..., 3501) with no gaps or duplicates. I also reran at a deeper 5,000-resting/5,000-lot scale with a negative (is_mine) aggressor id to exercise update_my_trades: ob.ntrds and ob.my_ntrds both land on 5,000 with no exception, ob.trades['pas_ord'] is again the makers 1..5000 in order, and ob.my_trades['my_uid']/['vol'] correctly record the aggressor's own id against each of the 5,000 fills. Finally, I ran the repo's own tests/ (test_bids.py, test_asks.py, test_orderbook.py, test_pricelevel.py — 45 tests, all on small inputs that never reach this path): 45 passed both before and after the patch, so the fix doesn't touch the normal case.
Respectfully submitted.
Found while integrating
Surbeivol/PythonMatchingEngineinto an open matching-engine benchmark, the Matching Engine Performance Challenge — it cross-checks engines against the byte-identical consensus of other open-source engines. The engine matched consensus on every conformance case I ran except one: a single order sweeping a deep queue resting at one price level crashesOrderbook.send()outright, with an uncaughtValueErrorfrom inside the trade-log bookkeeping. The matching logic itself (send/_sweep_best_price/_new_pricelevel) is fully iterative — plainwhileloops, no recursion anywhere in the crossing path — so this isn't a stack-depth issue; it's an off-by-arithmetic bug in howOrderbook.tradesgrows to hold a deep sweep's fills.Pinned at current
master(f94150294a85d7b415ca4518590b5a661d6f9958).Environment
f94150294a85d7b415ca4518590b5a661d6f9958marketsimulator.orderbook.Orderbook.send()Mechanism
Orderbook.__init__sizes the trade log off a per-ticker "average transactions" constant, not off anything related to how many fills one incoming order can produce:(
marketsimulator/orderbook.py:83-84). For the highest-liquidity band (band6, e.g. tickersan),config/trades_bands.ymlsetsAVG_TRANSACTS['band6'] = 15000, soself.inc = 1500.create_stats_dictallocatesself.tradesat exactly that size:(
:120-123).Every fill from a single incoming order accumulates into one local counter,
n_newtrd, inside_sweep_best_price'swhile order.leavesqty > 0loop (:508-572) — a deep resting queue at one price can hand a single aggressor thousands of fills in this one loop, all before the method ever returns. Only once the loop finishes does it flush the whole batch in one call:(
:574).update_tradesis where the growth bug lives:(
:446-461), backed by:(
:151-160). Theelsebranch tops the array up by exactlyself.inc— one fixed increment — then unconditionally slice-assigns, with no re-check that the new length is actually>= end. That's fine as long asend = self.ntrds + posnever exceedslen(self.trades[stat]) + self.inc; it's wrong the moment a single sweep produces more fills than that. Concretely, on a fresh book (self.ntrds == 0,len(self.trades[stat]) == self.inc == 1500), the top-up grows the array to3000; ifpos > 3000the destination sliceself.trades[stat][0:pos]is still only3000long (out-of-range numpy slicing silently truncates rather than extending), so the assignment throws.update_my_trades(:463-477, theis_mine-only mirror of the same bookkeeping) has the identical bug. It isn't independently reproducible below becauseupdate_tradesalways runs first (_sweep_best_price:574, before theupdate_my_tradescall at:576) and crashes at the same threshold — but the fix below covers it too.Minimal reproduction
Public API only, against a fresh
Orderbook(band6,self.inc == 1500, so the smallest possible overflow is 3,001 fills from one order):Observed:
The crash is right at the edge: bisecting
nfrom 2,999 through 3,001, sweeping 3,000 resting orders in one call is fine, and 3,001 is exactly one too many for the single fixed top-up (could not broadcast input array from shape (3001,) into shape (3000,)). Nothing about the input is unusual — an ordinary thin/one-sided book meeting one large marketable order; any simulation feeding this engine real (bursty or thin) order flow for long enough, or one big enough child order from an execution algorithm, can hit it.Suggested fix
Grow by the actual shortfall, never less than the usual
self.incstep so the ordinary/shallow case is unaffected:if end < len(self.trades[stat]): self.trades[stat][self.ntrds:end ] = self.last_trades[stat][:pos] else: + # A single sweep can produce more fills (pos) than one + # self.inc-sized top-up covers (e.g. thousands of resting + # orders taken out by one deep aggressor); grow by at least + # the actual shortfall so the slice-assign below never sees a + # destination shorter than the source. + deficit = end - len(self.trades[stat]) self.trades = self.inc_dict_size(self.trades, - inc=self.inc) + inc=max(self.inc, deficit)) self.trades[stat][self.ntrds:end ] = self.last_trades[stat][:pos]and the identical change in
update_my_trades:if end < len(self.my_trades[my_stat]): self.my_trades[my_stat][self.my_ntrds:end ] = self.my_last_trades[my_stat][:pos] else: + # Same fixed-increment-vs-large-batch shortfall as + # update_trades above; grow to cover the real deficit. + deficit = end - len(self.my_trades[my_stat]) self.my_trades = self.inc_dict_size(self.my_trades, - inc=self.inc) + inc=max(self.inc, deficit)) self.my_trades[my_stat][self.my_ntrds:end ] = self.my_last_trades[my_stat][:pos]Applied both to a scratch copy and reran the repro above: it completes with no exception,
ob.ntrdslands exactly on the fill count, andob.trades['pas_ord'][:ob.ntrds]comes back as the makers in strict arrival order (1, 2, 3, ..., 3501) with no gaps or duplicates. I also reran at a deeper 5,000-resting/5,000-lot scale with a negative (is_mine) aggressor id to exerciseupdate_my_trades:ob.ntrdsandob.my_ntrdsboth land on 5,000 with no exception,ob.trades['pas_ord']is again the makers 1..5000 in order, andob.my_trades['my_uid']/['vol']correctly record the aggressor's own id against each of the 5,000 fills. Finally, I ran the repo's owntests/(test_bids.py,test_asks.py,test_orderbook.py,test_pricelevel.py— 45 tests, all on small inputs that never reach this path): 45 passed both before and after the patch, so the fix doesn't touch the normal case.Respectfully submitted.