[Bug Report] A deep sweep at one price level crashes `Orderbook.send()` — the trade-log's growth check tops up by a fixed increment instead of the real shortfall


Found while integrating `Surbeivol/PythonMatchingEngine` into an open matching-engine benchmark, the [Matching Engine Performance Challenge](https://github.com/flash1-dev/matching-engine-benchmark) — it cross-checks engines against the byte-identical consensus of other open-source engines. The engine matched consensus on every conformance case I ran except one: a single order sweeping a deep queue resting at one price level crashes `Orderbook.send()` outright, with an uncaught `ValueError` from inside the trade-log bookkeeping. The matching logic itself (`send`/`_sweep_best_price`/`_new_pricelevel`) is fully iterative — plain `while` loops, no recursion anywhere in the crossing path — so this isn't a stack-depth issue; it's an off-by-arithmetic bug in how `Orderbook.trades` grows to hold a deep sweep's fills.

Pinned at current `master` (`f94150294a85d7b415ca4518590b5a661d6f9958`).

## Environment
- Commit: `f94150294a85d7b415ca4518590b5a661d6f9958`
- Python 3.12.3, numpy 2.3.5, pandas 2.3.3 (besides PyYAML, the only third-party packages the engine imports)
- Driven through the engine's own public API only: `marketsimulator.orderbook.Orderbook.send()`

## Mechanism

`Orderbook.__init__` sizes the trade log off a per-ticker "average transactions" constant, not off anything related to how many fills one incoming order can produce:

```python
self.init_size = int(AVG_TRANSACTS[band])
self.inc = int(max(0.1 * AVG_TRANSACTS[band], 10))
```
(`marketsimulator/orderbook.py:83-84`). For the highest-liquidity band (`band6`, e.g. ticker `san`), `config/trades_bands.yml` sets `AVG_TRANSACTS['band6'] = 15000`, so `self.inc = 1500`. `create_stats_dict` allocates `self.trades` at exactly that size:

```python
self.trades = {
    key: np.zeros(self.inc)
    for key in STATS
}
```
(`:120-123`).

Every fill from a single incoming order accumulates into one local counter, `n_newtrd`, inside `_sweep_best_price`'s `while order.leavesqty > 0` loop (`:508-572`) — a deep resting queue at one price can hand a single aggressor thousands of fills in this one loop, all before the method ever returns. Only once the loop finishes does it flush the whole batch in one call:

```python
self.update_trades(pos=n_newtrd)
```
(`:574`). `update_trades` is where the growth bug lives:

```python
def update_trades(self, pos):

    end = self.ntrds + pos
    for stat in STATS:

        end = self.ntrds + pos
        if end < len(self.trades[stat]):
            self.trades[stat][self.ntrds:end
            ] = self.last_trades[stat][:pos]
        else:
            self.trades = self.inc_dict_size(self.trades,
                                             inc=self.inc)
            self.trades[stat][self.ntrds:end
            ] = self.last_trades[stat][:pos]

    self.ntrds += pos
```
(`:446-461`), backed by:
```python
def inc_dict_size(self, stat_dict, inc):
    def_arr = np.zeros(inc)
    def_day = np.full(inc, self.def_day)

    new_dict = {(key): (np.hstack([stat_dict[key], def_arr])
                        if key != 'timestamp' else np.hstack([stat_dict[key], def_day]))
                for key in stat_dict.keys()
                }

    return new_dict
```
(`:151-160`). The `else` branch tops the array up by exactly `self.inc` — one fixed increment — then unconditionally slice-assigns, with no re-check that the new length is actually `>= end`. That's fine as long as `end = self.ntrds + pos` never exceeds `len(self.trades[stat]) + self.inc`; it's wrong the moment a single sweep produces more fills than that. Concretely, on a fresh book (`self.ntrds == 0`, `len(self.trades[stat]) == self.inc == 1500`), the top-up grows the array to `3000`; if `pos > 3000` the destination slice `self.trades[stat][0:pos]` is still only `3000` long (out-of-range numpy slicing silently truncates rather than extending), so the assignment throws.

`update_my_trades` (`:463-477`, the `is_mine`-only mirror of the same bookkeeping) has the identical bug. It isn't independently reproducible below because `update_trades` always runs first (`_sweep_best_price:574`, before the `update_my_trades` call at `:576`) and crashes at the same threshold — but the fix below covers it too.

## Minimal reproduction
Public API only, against a fresh `Orderbook` (band6, `self.inc == 1500`, so the smallest possible overflow is 3,001 fills from one order):

```python
from marketsimulator.orderbook import Orderbook

ob = Orderbook(ticker="san")          # band6 -> self.inc == 1500
for i in range(1, 3502):
    ob.send(is_buy=False, qty=1, price=100.0, uid=i)   # 3,501 resting asks @100

ob.send(is_buy=True, qty=3501, price=100.0, uid=999999)  # one sweep, 3,501 fills
```

Observed:
```
Traceback (most recent call last):
  File "repro.py", line 7, in <module>
    ob.send(is_buy=True, qty=3501, price=100.0, uid=999999)
  File "marketsimulator/orderbook.py", line 313, in send
    self._sweep_best_price(neword)
  File "marketsimulator/orderbook.py", line 574, in _sweep_best_price
    self.update_trades(pos=n_newtrd)
  File "marketsimulator/orderbook.py", line 458, in update_trades
    self.trades[stat][self.ntrds:end
ValueError: could not broadcast input array from shape (3501,) into shape (3000,)
```

The crash is right at the edge: bisecting `n` from 2,999 through 3,001, sweeping 3,000 resting orders in one call is fine, and 3,001 is exactly one too many for the single fixed top-up (`could not broadcast input array from shape (3001,) into shape (3000,)`). Nothing about the input is unusual — an ordinary thin/one-sided book meeting one large marketable order; any simulation feeding this engine real (bursty or thin) order flow for long enough, or one big enough child order from an execution algorithm, can hit it.

## Suggested fix
Grow by the actual shortfall, never less than the usual `self.inc` step so the ordinary/shallow case is unaffected:

```diff
             if end < len(self.trades[stat]):
                 self.trades[stat][self.ntrds:end
                 ] = self.last_trades[stat][:pos]
             else:
+                # A single sweep can produce more fills (pos) than one
+                # self.inc-sized top-up covers (e.g. thousands of resting
+                # orders taken out by one deep aggressor); grow by at least
+                # the actual shortfall so the slice-assign below never sees a
+                # destination shorter than the source.
+                deficit = end - len(self.trades[stat])
                 self.trades = self.inc_dict_size(self.trades,
-                                                 inc=self.inc)
+                                                 inc=max(self.inc, deficit))
                 self.trades[stat][self.ntrds:end
                 ] = self.last_trades[stat][:pos]
```
and the identical change in `update_my_trades`:
```diff
             if end < len(self.my_trades[my_stat]):
                 self.my_trades[my_stat][self.my_ntrds:end
                 ] = self.my_last_trades[my_stat][:pos]
             else:
+                # Same fixed-increment-vs-large-batch shortfall as
+                # update_trades above; grow to cover the real deficit.
+                deficit = end - len(self.my_trades[my_stat])
                 self.my_trades = self.inc_dict_size(self.my_trades,
-                                                    inc=self.inc)
+                                                    inc=max(self.inc, deficit))
                 self.my_trades[my_stat][self.my_ntrds:end
                 ] = self.my_last_trades[my_stat][:pos]
```

Applied both to a scratch copy and reran the repro above: it completes with no exception, `ob.ntrds` lands exactly on the fill count, and `ob.trades['pas_ord'][:ob.ntrds]` comes back as the makers in strict arrival order (1, 2, 3, ..., 3501) with no gaps or duplicates. I also reran at a deeper 5,000-resting/5,000-lot scale with a negative (`is_mine`) aggressor id to exercise `update_my_trades`: `ob.ntrds` and `ob.my_ntrds` both land on 5,000 with no exception, `ob.trades['pas_ord']` is again the makers 1..5000 in order, and `ob.my_trades['my_uid']`/`['vol']` correctly record the aggressor's own id against each of the 5,000 fills. Finally, I ran the repo's own `tests/` (`test_bids.py`, `test_asks.py`, `test_orderbook.py`, `test_pricelevel.py` — 45 tests, all on small inputs that never reach this path): 45 passed both before and after the patch, so the fix doesn't touch the normal case.

Respectfully submitted.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug Report] A deep sweep at one price level crashes `Orderbook.send()` — the trade-log's growth check tops up by a fixed increment instead of the real shortfall #48

Environment

Mechanism

Minimal reproduction

Suggested fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Bug Report] A deep sweep at one price level crashes Orderbook.send() — the trade-log's growth check tops up by a fixed increment instead of the real shortfall #48

Description

Environment

Mechanism

Minimal reproduction

Suggested fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

[Bug Report] A deep sweep at one price level crashes `Orderbook.send()` — the trade-log's growth check tops up by a fixed increment instead of the real shortfall #48