Set default embed batch size to 32 #1897
Greptile Summary

This PR lowers the default `embed_batch_size` from 256 to 32 in both `HarnessConfig` and `BatchTuningParams`.
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/harness/config.py | Changes embed_batch_size default from 256 to 32 in HarnessConfig; straightforward one-line change aligned with PR intent. |
| nemo_retriever/src/nemo_retriever/params/models.py | Changes embed_batch_size default from 256 to 32 in BatchTuningParams; straightforward one-line change consistent with the harness config change. |
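The two one-line changes can be sketched as follows. This is a minimal sketch using plain dataclasses; the actual base classes and surrounding fields of `HarnessConfig` and `BatchTuningParams` are not shown in this PR, so only the `embed_batch_size` field and its 256 → 32 default come from the diff:

```python
from dataclasses import dataclass


@dataclass
class HarnessConfig:
    # nemo_retriever/harness/config.py
    # Previously: embed_batch_size: int = 256
    embed_batch_size: int = 32


@dataclass
class BatchTuningParams:
    # nemo_retriever/params/models.py
    # Previously: embed_batch_size: int = 256
    embed_batch_size: int = 32
```

Both defaults move together, keeping the two surfaces coupled as described in the summary below.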
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User / Harness Run] --> B{embed_batch_size source?}
    B -->|Named preset in test_configs.yaml| C["Explicit preset value\n(still 256 in 8 presets)"]
    B -->|No preset override| D["HarnessConfig default\n256 → 32 ✅"]
    B -->|BatchTuningParams| E["BatchTuningParams default\n256 → 32 ✅"]
    C --> F[Embed Workers process batch at 256]
    D --> G[Embed Workers process batch at 32]
    E --> G
```
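The precedence the flowchart describes can be sketched as a tiny resolver. The helper name and signature are hypothetical; only the precedence rule (explicit preset wins, otherwise the lowered default applies) comes from the flowchart:

```python
from typing import Optional

DEFAULT_EMBED_BATCH_SIZE = 32  # new default (was 256)


def resolve_embed_batch_size(preset_value: Optional[int]) -> int:
    """Hypothetical helper mirroring the flowchart: an explicit
    preset value overrides the default; otherwise the new 32 applies."""
    return preset_value if preset_value is not None else DEFAULT_EMBED_BATCH_SIZE


# A preset still pinning 256 keeps the old batch size:
resolve_embed_batch_size(256)   # -> 256
# Runs without a preset override pick up the new default:
resolve_embed_batch_size(None)  # -> 32
```

This is why the review comment below matters: presets that explicitly pin 256 never see the new default.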
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/harness/config.py
Line: 109
Comment:
**Preset configs in `test_configs.yaml` still set `embed_batch_size: 256`**
`nemo_retriever/harness/test_configs.yaml` contains 8 preset entries (e.g. `PE_GE_OCR_TE_DENSE`, `PE_GE_OCR_TE_HYBRID`, …) each explicitly setting `embed_batch_size: 256`. These explicit preset values override the default and would still exercise the old batch size for anyone running harness benchmarks against these named presets. If the goal is to move the whole product surface away from 256, those preset entries likely need to be updated (or the explicit overrides removed so the new default takes effect).
How can I resolve this? If you propose a fix, please make it concise.
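One concise way to audit the stale overrides the comment flags is a quick scan over the preset data. The two preset names below are taken from the comment; the in-memory dict stands in for the real `test_configs.yaml` loader, which is not shown here:

```python
# Illustrative stand-in for the parsed contents of test_configs.yaml;
# only two of the 8 presets named in the review comment are shown.
presets = {
    "PE_GE_OCR_TE_DENSE": {"embed_batch_size": 256},
    "PE_GE_OCR_TE_HYBRID": {"embed_batch_size": 256},
}

# Presets still pinning the old default override the new 32 default.
stale = [
    name
    for name, cfg in presets.items()
    if cfg.get("embed_batch_size") == 256
]
print(stale)  # both illustrative presets still pin 256
```

The concise fix the comment asks for would be to delete the explicit `embed_batch_size: 256` keys from those preset entries so the new default takes effect, or to update them to 32 if the pinning is intentional.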
Summary
This replaces the larger embed batching/decoupling direction with a minimal coupled-default change.
Changes:
- `BatchTuningParams.embed_batch_size` default from 256 to 32
- `HarnessConfig.embed_batch_size` default from 256 to 32

This keeps the product surface coupled while moving the default away from the larger embed batch regime.
Why
Recent bo767 simple-run benchmarking did not show the more complex decoupling work to be a throughput win.
Key learnings:
- `64/64` was the clearly bad regime and produced embed OOMs
- `64/32` did not outperform `32/32` in the simple runner
- At the `32/32` baseline, PR 1823 was near parity with `upstream/main`, not a clear speedup

Given that, the smallest defensible change is to keep coupling and lower the default.
Benchmark Notes
bo767 simple
`graph_pipeline <dataset> --embed-batch-size 32`, 3 runs each:

- `upstream_main_32_32` median ingest PPS: 126.775
- `pr1823_32_32` median ingest PPS: 125.584

This supported a simpler default-tuning PR instead of retaining the extra decoupling control surface.
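The comparison above is a median over 3 runs per configuration. The sketch below reproduces that aggregation; the individual per-run PPS values are hypothetical, and only the two reported medians (126.775 vs 125.584) come from the benchmark notes:

```python
from statistics import median

# Hypothetical per-run ingest PPS values; for an odd-length list,
# statistics.median returns the middle element after sorting.
runs_upstream_main_32_32 = [125.9, 126.775, 127.3]
runs_pr1823_32_32 = [124.9, 125.584, 126.1]

print(median(runs_upstream_main_32_32))  # 126.775 (reported median)
print(median(runs_pr1823_32_32))         # 125.584 (reported median)
```

With a ~1 PPS gap at medians near 126, the two configurations are effectively at parity, which is the basis for preferring the simpler default-only change.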
Validation
Focused harness tests on this branch:
- `test_harness_config.py`
- `test_harness_run.py`

Result:

- `test_harness_run.py` failures are already present on clean `upstream/main` and are unrelated to this change