fix(dynamo): multimodal chat processor, hybrid KV cache compat, subprocess env passthrough by praateekmahajan · Pull Request #2053 · NVIDIA-NeMo/Curator

praateekmahajan · 2026-06-05T22:14:40Z

Summary

Multimodal frontend processor: Dynamo's native-Rust chat processor serializes OpenAI multimodal `content` arrays instead of flattening them, corrupting prompts for models with pass-through chat templates (e.g. Nemotron-Parse's `{{ message['content'] }}`). `DynamoBackend._frontend_router_kwargs` is now a pure passthrough — callers must set `dyn_chat_processor="vllm"` explicitly in `router_kwargs` when needed (see Nemotron-Parse example below).
Hybrid KV cache manager compat (prefill/decode + disagg paths): vLLM rejects `--kv-events-config` when `--no-disable-hybrid-kv-cache-manager` is also passed, even if `enable_kv_cache_events=False`. Both `_launch_vllm_worker` and `_launch_disagg_role` now skip `--kv-events-config` when the caller has explicitly opted in to the hybrid KV cache manager via `disable_hybrid_kv_cache_manager=False` in `engine_kwargs`.
Subprocess env passthrough: `backend_cfg.subprocess_env` is now merged into the base env dict passed to frontend and worker processes, so user-supplied vars like `DYN_TCP_REQUEST_TIMEOUT` actually reach the subprocesses.

Nemotron-Parse setup

All three fixes above are required to serve Nemotron-Parse via `InferenceServer`:

from nemo_curator.core.serve import InferenceServer
from nemo_curator.core.serve.dynamo import DynamoServerConfig, DynamoRouterConfig
from nemo_curator.core.serve.dynamo.vllm import DynamoVLLMModelConfig

server = InferenceServer(
    models=[DynamoVLLMModelConfig(
        model_identifier="/path/to/NVIDIA-Nemotron-Parse-v1.2",
        engine_kwargs={
            "trust_remote_code": True,
            "dtype": "bfloat16",
            "limit_mm_per_prompt": {"image": 1},
            "enable_prefix_caching": False,   # encoder-decoder models crash with APC
            "disable_hybrid_kv_cache_manager": False,  # fix: skip --kv-events-config
        },
        dynamo_kwargs={"enable_multimodal": True},
    )],
    backend=DynamoServerConfig(
        request_plane="tcp",
        router=DynamoRouterConfig(router_kwargs={
            "dyn_chat_processor": "vllm",     # fix: native-Rust corrupts multimodal arrays
            "trust_remote_code": True,
            "chat_template_content_format": "string",
        }),
        subprocess_env={"DYN_TCP_REQUEST_TIMEOUT": "180"},  # fix: env now reaches subprocess
    ),
)

Test plan

tests/core/serve/dynamo/test_backend.py — passthrough (no auto-inject), explicit processor forwarded, text-only unchanged, TCP env passthrough
tests/core/serve/dynamo/test_vllm.py — hybrid KV cache manager skip path for both worker and disagg roles
tests/core/serve/dynamo/test_infra.py

🤖 Generated with Claude Code

copy-pr-bot · 2026-06-05T22:14:43Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-05T22:19:37Z

Greptile Summary

This PR applies three targeted fixes to the Dynamo inference backend: a multimodal chat-processor passthrough (removing implicit dyn_chat_processor injection so callers must opt in), a guard in the aggregated (_launch_vllm_worker) path to skip --kv-events-config when hybrid KV cache manager is explicitly enabled, and a subprocess_env merge into base_env so user-supplied env vars reach frontend and worker subprocesses.

Multimodal chat processor: _frontend_router_kwargs is now a pure copy of router_kwargs; callers must set dyn_chat_processor=\"vllm\" explicitly for models whose pass-through chat templates are corrupted by the native-Rust processor.
Hybrid KV cache manager compat: explicit_hybrid_kv_cache_manager_enabled is introduced and used in _launch_vllm_worker to conditionally omit --kv-events-config; the disagg path (_launch_disagg_role) retains its unconditional --kv-events-config emission (no guard added there).
Subprocess env passthrough: backend_cfg.subprocess_env is spread first into base_env (before ETCD_ENDPOINTS/NATS_SERVER), so system-computed values always win over user-supplied ones.

Confidence Score: 4/5

Safe to merge with awareness that the disagg path retains unconditional --kv-events-config emission, which conflicts with any model that sets disable_hybrid_kv_cache_manager=False in disaggregated mode.

The aggregated worker path is correctly guarded and well-tested. The disagg path, however, still unconditionally passes --kv-events-config regardless of whether the caller has opted into the hybrid KV cache manager — the same conflict vLLM rejects in the aggregated path remains live in _launch_disagg_role. The PR description claims both paths were fixed, but only one was.

nemo_curator/core/serve/dynamo/vllm.py — specifically _launch_disagg_role, which lacks the hybrid-KV guard that _launch_vllm_worker received in this PR.

Important Files Changed

Filename	Overview
nemo_curator/core/serve/dynamo/backend.py	Adds `_frontend_router_kwargs` (pure passthrough) and merges `backend_cfg.subprocess_env` into `base_env`; both changes are correct and straightforward.
nemo_curator/core/serve/dynamo/vllm.py	Aggregated path (`_launch_vllm_worker`) gains the hybrid-KV guard correctly; disagg path (`_launch_disagg_role`) still passes `--kv-events-config` unconditionally, conflicting with the PR description's claim that both paths were fixed.
nemo_curator/core/serve/dynamo/infra.py	`engine_kwargs_to_cli_flags` now emits `--no-<flag>` for every `bool` `False` value, not just `BooleanOptionalAction` ones; this can break vLLM/Dynamo flags that use `store_true` and have no `--no-*` form.
nemo_curator/core/serve/dynamo/config.py	Adds `subprocess_env: dict[str, str]` field with a default empty dict; clean, minimal, correct change.
tests/core/serve/dynamo/test_backend.py	New multimodal and chat-processor tests are correct; the env-passthrough test bypasses the actual subprocess_env merge point in `_deploy_and_healthcheck`, leaving the core fix without direct coverage.
tests/core/serve/dynamo/test_vllm.py	New `test_explicit_hma_omits_disabled_kv_events_config` correctly exercises the aggregated hybrid-KV guard; disagg counterpart is absent but mirrors the missing production fix.
tests/core/serve/dynamo/test_infra.py	Parametrized test updated to reflect the new `--no-enforce-eager` / `--no-disable-hybrid-kv-cache-manager` behavior; accurately captures the changed semantics.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DynamoServerConfig\nsubprocess_env + request_plane + router] --> B[_deploy_and_healthcheck]

    B --> C["base_env = {**subprocess_env, ETCD_ENDPOINTS, NATS_SERVER}"]

    C --> D[launch_replicas\naggregated path]
    C --> E[launch_disagg_replicas\ndisagg path]
    C --> F[_launch_frontend]

    D --> G[_launch_vllm_worker per node rank]
    G --> H{explicit_hybrid_kv\n_cache_manager_enabled?}
    H -- No --> I[build_worker_kv_events_config\nemit --kv-events-config]
    H -- Yes & kv_events disabled --> J[skip --kv-events-config]

    E --> K[_launch_disagg_role]
    K --> L[build_worker_kv_events_config\nalways emit --kv-events-config\nno hybrid-KV guard]

    F --> M[engine_kwargs_to_cli_flags\n_frontend_router_kwargs passthrough]
    M --> N[python -m dynamo.frontend\nsubprocess_env = base_env]
    I --> O[python -m dynamo.vllm\nsubprocess_env = base_env + PYTHONHASHSEED]
    L --> P[python -m dynamo.vllm disagg\nsubprocess_env = base_env + nixl port]

_{Reviews (4): Last reviewed commit: "refactor(dynamo): remove auto-inject of ..." | Re-trigger Greptile}

greptile-apps · 2026-06-05T22:19:42Z

        if isinstance(value, bool):
            if value:
                flags.append(flag)
+            else:
+                flags.append("--no-" + key.replace("_", "-"))


--no-<flag> emitted for all False booleans, not only BooleanOptionalAction ones

The docstring now states this is for vLLM BooleanOptionalAction arguments, but the code applies the rule to every bool value that is False in any kwargs dict — including dynamo_kwargs, router_kwargs, and future users of this helper. vLLM (and Dynamo) have many flags that use store_true and have no --no-<flag> form; passing an unrecognised --no-* flag will cause the subprocess to exit with an argument-parsing error. There is no guard, validator, or documented list of which flags are safe to set to False.

praateekmahajan · 2026-06-05T22:30:48Z

@claude review

…ubprocess env passthrough - Default Dynamo frontend to vLLM chat processor for multimodal models: the native-Rust processor serializes OpenAI multimodal content arrays instead of flattening them, corrupting prompts with pass-through chat templates - Skip --kv-events-config when user explicitly enables hybrid KV cache manager (disable_hybrid_kv_cache_manager=False); vLLM rejects both flags together even when kv_cache_events=False - Merge subprocess_env from backend config into the base env dict so user-supplied env vars (e.g. DYN_TCP_REQUEST_TIMEOUT) reach frontend and worker processes - Add tests covering all three fixes Signed-off-by: Praateek <praateekm@gmail.com>

…odal models dyn_chat_processor=vllm is only required for Nemotron-Parse (pass-through chat template). Auto-detecting multimodal and injecting it as a default would cause unnecessary latency for other multimodal models. Callers must now set it explicitly in router_kwargs when needed. Signed-off-by: Praateek <praateekm@gmail.com>

oyilmaz-nvidia

Do we have a documentation or example that users can follow? If yes, those need updates. If no, we need that documentation and/or examples.

oyilmaz-nvidia · 2026-06-10T15:39:53Z

+    kv_events_config = None
+    # vLLM treats any non-None kv_events_config as incompatible with explicitly
+    # enabled hybrid KV cache manager, even when enable_kv_cache_events=False.
+    if kv_events_enabled or not explicit_hybrid_kv_cache_manager_enabled(model_config.engine_kwargs):


Do we need this same check in the _launch_disagg_role() as well?

praateekmahajan requested a review from a team as a code owner June 5, 2026 22:14

praateekmahajan requested review from oyilmaz-nvidia and removed request for a team June 5, 2026 22:14

greptile-apps Bot reviewed Jun 5, 2026

View reviewed changes

praateekmahajan force-pushed the praateek/dynamo-server-fixes branch from f300f60 to d40d27f Compare June 5, 2026 22:38

copy-pr-bot Bot temporarily deployed to public June 5, 2026 22:38 Inactive

copy-pr-bot Bot temporarily deployed to test June 5, 2026 22:38 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 5, 2026 22:38 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 22:42 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 22:43 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 5, 2026 23:11 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 23:15 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 23:16 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 5, 2026 23:20 Inactive

praateekmahajan added 2 commits June 8, 2026 17:39

praateekmahajan force-pushed the praateek/dynamo-server-fixes branch from b181bc5 to 3849c1b Compare June 8, 2026 17:40

copy-pr-bot Bot temporarily deployed to public June 8, 2026 17:40 Inactive

copy-pr-bot Bot temporarily deployed to test June 8, 2026 17:40 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 8, 2026 17:40 Inactive

oyilmaz-nvidia reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dynamo): multimodal chat processor, hybrid KV cache compat, subprocess env passthrough#2053

fix(dynamo): multimodal chat processor, hybrid KV cache compat, subprocess env passthrough#2053
praateekmahajan wants to merge 2 commits into
NVIDIA-NeMo:mainfrom
praateekmahajan:praateek/dynamo-server-fixes

praateekmahajan commented Jun 5, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 5, 2026

Uh oh!

praateekmahajan commented Jun 5, 2026

Uh oh!

oyilmaz-nvidia left a comment

Uh oh!

oyilmaz-nvidia Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

praateekmahajan commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Nemotron-Parse setup

Test plan

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

praateekmahajan commented Jun 5, 2026

Uh oh!

oyilmaz-nvidia left a comment

Choose a reason for hiding this comment

Uh oh!

oyilmaz-nvidia Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

praateekmahajan commented Jun 5, 2026 •

edited

Loading

greptile-apps Bot commented Jun 5, 2026 •

edited

Loading