Set default embed batch size to 32 #1897

Merged
jdye64 merged 2 commits into NVIDIA:main from jioffe502:codex/default-embed-batch-32
Apr 22, 2026

Conversation

@jioffe502
Collaborator

Summary

This replaces the larger embed batching/decoupling direction with a minimal coupled-default change.

Changes:

  • set BatchTuningParams.embed_batch_size default from 256 to 32
  • set harness embed_batch_size default from 256 to 32

This keeps the product surface coupled while moving the default away from the larger embed batch regime.
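The two one-line changes can be sketched as follows. The real definitions live in `nemo_retriever/src/nemo_retriever/harness/config.py` and `nemo_retriever/src/nemo_retriever/params/models.py`; plain dataclasses here stand in for whatever model classes the repo actually uses, so treat everything except the field name and values as an assumption:

```python
from dataclasses import dataclass

# Hedged sketch of the two default changes in this PR; only the
# embed_batch_size field and its old/new values come from the PR.

@dataclass
class BatchTuningParams:
    embed_batch_size: int = 32  # was 256 before this PR

@dataclass
class HarnessConfig:
    embed_batch_size: int = 32  # was 256; kept coupled to BatchTuningParams
```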

Why

Recent bo767 simple-run benchmarking did not show the more complex decoupling work to be a clear throughput win.

Key learnings:

  • 64/64 was the clearly bad regime and produced embed OOMs
  • 64/32 did not outperform 32/32 in the simple runner
  • on the simplest apples-to-apples 32/32 baseline, PR 1823 was near parity with upstream/main, not a clear speedup

Given that, the smallest defensible change is to keep coupling and lower the default.

Benchmark Notes

bo767 simple graph_pipeline <dataset> --embed-batch-size 32, 3 runs each:

  • upstream_main_32_32 median ingest PPS: 126.775
  • pr1823_32_32 median ingest PPS: 125.584

This supported a simpler default-tuning PR instead of retaining the extra decoupling control surface.
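For reference, the gap between the two medians quoted above works out to under one percent, which is the near-parity claim; the arithmetic can be checked directly:

```python
# Percent gap between the two median ingest PPS figures above.
upstream_main_32_32 = 126.775
pr1823_32_32 = 125.584

gap_pct = (upstream_main_32_32 - pr1823_32_32) / upstream_main_32_32 * 100
print(f"PR 1823 trails upstream/main by {gap_pct:.2f}%")  # about 0.94%
```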

Validation

Focused harness tests on this branch:

  • test_harness_config.py
  • test_harness_run.py

Result:

  • the branch-specific default change itself behaved as expected
  • two test_harness_run.py failures are already present on clean upstream/main and are unrelated to this change

@jioffe502 jioffe502 requested review from a team as code owners April 21, 2026 20:24
@jioffe502 jioffe502 requested a review from jperez999 April 21, 2026 20:24
@copy-pr-bot

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR lowers the default embed_batch_size from 256 to 32 in both HarnessConfig and BatchTuningParams, backed by benchmark data showing 32/32 is near parity with 64/32 while avoiding embed OOMs at higher batch sizes.

Confidence Score: 5/5

Safe to merge — minimal two-line default change with no logic or API surface impact.

Both changes are single-line default value updates in well-understood config structs. No logic, no new code paths, no API surface changes. The only finding is a P2 noting that preset entries in test_configs.yaml still hardcode 256, which may be intentional for reproducibility but is worth a look.

nemo_retriever/harness/test_configs.yaml — 8 preset entries still explicitly set embed_batch_size: 256, which may be inconsistent with the intent of this PR.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/harness/config.py Changes embed_batch_size default from 256 to 32 in HarnessConfig; straightforward one-line change aligned with PR intent.
nemo_retriever/src/nemo_retriever/params/models.py Changes embed_batch_size default from 256 to 32 in BatchTuningParams; straightforward one-line change consistent with the harness config change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User / Harness Run] --> B{embed_batch_size source?}
    B -->|Named preset in test_configs.yaml| C["Explicit preset value\n(still 256 in 8 presets)"]
    B -->|No preset override| D["HarnessConfig default\n256 → 32 ✅"]
    B -->|BatchTuningParams| E["BatchTuningParams default\n256 → 32 ✅"]
    C --> F[Embed Workers process batch at 256]
    D --> G[Embed Workers process batch at 32]
    E --> G
Review comment:
Path: nemo_retriever/src/nemo_retriever/harness/config.py
Line: 109

Comment:
**Preset configs in `test_configs.yaml` still set `embed_batch_size: 256`**

`nemo_retriever/harness/test_configs.yaml` contains 8 preset entries (e.g. `PE_GE_OCR_TE_DENSE`, `PE_GE_OCR_TE_HYBRID`, …) each explicitly setting `embed_batch_size: 256`. These explicit preset values override the default and would still exercise the old batch size for anyone running harness benchmarks against these named presets. If the goal is to move the whole product surface away from 256, those preset entries likely need to be updated (or the explicit overrides removed so the new default takes effect).

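The override behavior the comment describes boils down to explicit preset keys shadowing the class default. This hypothetical sketch illustrates the precedence; the resolver function and preset dicts are illustrative, not the harness's actual loading code:

```python
NEW_DEFAULT_EMBED_BATCH_SIZE = 32  # default introduced by this PR

def resolve_embed_batch_size(preset: dict) -> int:
    # An explicit embed_batch_size key in a preset shadows the default,
    # which is why the 8 presets pinning 256 still run at 256.
    return preset.get("embed_batch_size", NEW_DEFAULT_EMBED_BATCH_SIZE)

pinned_preset = {"name": "PE_GE_OCR_TE_DENSE", "embed_batch_size": 256}
bare_preset = {"name": "hypothetical_new_preset"}

print(resolve_embed_batch_size(pinned_preset))  # 256
print(resolve_embed_batch_size(bare_preset))    # 32
```

Under this reading, deleting the explicit `embed_batch_size: 256` lines from those preset entries (or lowering them to 32) would let the presets follow the new default.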


Collaborator

@jdye64 jdye64 left a comment

yep, 1000% needed

@jdye64 jdye64 merged commit 3a2894c into NVIDIA:main Apr 22, 2026
5 checks passed
