Set default embed batch size to 32 #1897
Greptile Summary

This PR lowers the default `embed_batch_size` from 256 to 32 in both `HarnessConfig` and `BatchTuningParams`.
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/harness/config.py | Changes embed_batch_size default from 256 to 32 in HarnessConfig; straightforward one-line change aligned with PR intent. |
| nemo_retriever/src/nemo_retriever/params/models.py | Changes embed_batch_size default from 256 to 32 in BatchTuningParams; straightforward one-line change consistent with the harness config change. |
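The two one-line changes can be sketched as follows. This is a minimal sketch using plain dataclasses; the actual base classes and surrounding fields of `HarnessConfig` and `BatchTuningParams` are not shown in this PR, so only the `embed_batch_size` field and its 256 → 32 default come from the diff:

```python
from dataclasses import dataclass


@dataclass
class HarnessConfig:
    # nemo_retriever/harness/config.py
    # Previously: embed_batch_size: int = 256
    embed_batch_size: int = 32


@dataclass
class BatchTuningParams:
    # nemo_retriever/params/models.py
    # Previously: embed_batch_size: int = 256
    embed_batch_size: int = 32
```

Both defaults move together, keeping the two surfaces coupled as described in the summary below.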
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User / Harness Run] --> B{embed_batch_size source?}
    B -->|Named preset in test_configs.yaml| C["Explicit preset value\n(still 256 in 8 presets)"]
    B -->|No preset override| D["HarnessConfig default\n256 → 32 ✅"]
    B -->|BatchTuningParams| E["BatchTuningParams default\n256 → 32 ✅"]
    C --> F[Embed Workers process batch at 256]
    D --> G[Embed Workers process batch at 32]
    E --> G
```
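The precedence the flowchart describes can be sketched as a tiny resolver. The helper name and signature are hypothetical; only the precedence rule (explicit preset wins, otherwise the lowered default applies) comes from the flowchart:

```python
from typing import Optional

DEFAULT_EMBED_BATCH_SIZE = 32  # new default (was 256)


def resolve_embed_batch_size(preset_value: Optional[int]) -> int:
    """Hypothetical helper mirroring the flowchart: an explicit
    preset value overrides the default; otherwise the new 32 applies."""
    return preset_value if preset_value is not None else DEFAULT_EMBED_BATCH_SIZE


# A preset still pinning 256 keeps the old batch size:
resolve_embed_batch_size(256)   # -> 256
# Runs without a preset override pick up the new default:
resolve_embed_batch_size(None)  # -> 32
```

This is why the review comment below matters: presets that explicitly pin 256 never see the new default.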
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/harness/config.py
Line: 109
Comment:
**Preset configs in `test_configs.yaml` still set `embed_batch_size: 256`**
`nemo_retriever/harness/test_configs.yaml` contains 8 preset entries (e.g. `PE_GE_OCR_TE_DENSE`, `PE_GE_OCR_TE_HYBRID`, …) each explicitly setting `embed_batch_size: 256`. These explicit preset values override the default and would still exercise the old batch size for anyone running harness benchmarks against these named presets. If the goal is to move the whole product surface away from 256, those preset entries likely need to be updated (or the explicit overrides removed so the new default takes effect).
How can I resolve this? If you propose a fix, please make it concise.
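One concise way to audit the stale overrides the comment flags is a quick scan over the preset data. The two preset names below are taken from the comment; the in-memory dict stands in for the real `test_configs.yaml` loader, which is not shown here:

```python
# Illustrative stand-in for the parsed contents of test_configs.yaml;
# only two of the 8 presets named in the review comment are shown.
presets = {
    "PE_GE_OCR_TE_DENSE": {"embed_batch_size": 256},
    "PE_GE_OCR_TE_HYBRID": {"embed_batch_size": 256},
}

# Presets still pinning the old default override the new 32 default.
stale = [
    name
    for name, cfg in presets.items()
    if cfg.get("embed_batch_size") == 256
]
print(stale)  # both illustrative presets still pin 256
```

The concise fix the comment asks for would be to delete the explicit `embed_batch_size: 256` keys from those preset entries so the new default takes effect, or to update them to 32 if the pinning is intentional.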
Summary
This replaces the larger embed batching/decoupling direction with a minimal coupled-default change.
Changes:
- `BatchTuningParams.embed_batch_size` default from 256 to 32
- `HarnessConfig.embed_batch_size` default from 256 to 32

This keeps the product surface coupled while moving the default away from the larger embed batch regime.
Why
Recent bo767 simple-run benchmarking did not show the more complex decoupling work to be a throughput win.
Key learnings:
- `64/64` was the clearly bad regime and produced embed OOMs
- `64/32` did not outperform `32/32` in the simple runner
- At the `32/32` baseline, PR 1823 was near parity with `upstream/main`, not a clear speedup

Given that, the smallest defensible change is to keep coupling and lower the default.
Benchmark Notes
bo767 simple
`graph_pipeline <dataset> --embed-batch-size 32`, 3 runs each:

- `upstream_main_32_32` median ingest PPS: 126.775
- `pr1823_32_32` median ingest PPS: 125.584

This supported a simpler default-tuning PR instead of retaining the extra decoupling control surface.
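The comparison above is a median over 3 runs per configuration. The sketch below reproduces that aggregation; the individual per-run PPS values are hypothetical, and only the two reported medians (126.775 vs 125.584) come from the benchmark notes:

```python
from statistics import median

# Hypothetical per-run ingest PPS values; for an odd-length list,
# statistics.median returns the middle element after sorting.
runs_upstream_main_32_32 = [125.9, 126.775, 127.3]
runs_pr1823_32_32 = [124.9, 125.584, 126.1]

print(median(runs_upstream_main_32_32))  # 126.775 (reported median)
print(median(runs_pr1823_32_32))         # 125.584 (reported median)
```

With a ~1 PPS gap at medians near 126, the two configurations are effectively at parity, which is the basis for preferring the simpler default-only change.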
Validation
Focused harness tests on this branch:
- `test_harness_config.py`
- `test_harness_run.py`

Result:

- `test_harness_run.py` failures are already present on clean `upstream/main` and are unrelated to this change