Skip to content

Dynamo Server Fixes + Nemotron Parsing PDF Benchmark changes#2050

Open
praateekmahajan wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
praateekmahajan:praateek/pdf-parsing-dynamo
Open

Dynamo Server Fixes + Nemotron Parsing PDF Benchmark changes#2050
praateekmahajan wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
praateekmahajan:praateek/pdf-parsing-dynamo

Conversation

@praateekmahajan

Copy link
Copy Markdown
Contributor

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

@praateekmahajan praateekmahajan requested a review from a team as a code owner June 5, 2026 02:13
@praateekmahajan praateekmahajan requested review from ayushdg and removed request for a team June 5, 2026 02:13
@copy-pr-bot

copy-pr-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@praateekmahajan praateekmahajan changed the title Support Dynamo PDF parsing server mode Dynamo Server Fixes + Nemotron Parsing PDF Benchmark changes Jun 5, 2026
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends Dynamo PDF parsing to support a "server mode" by wiring three related changes: a subprocess_env passthrough on DynamoServerConfig, --no-<flag> emission for boolean-False engine kwargs, and conditional omission of --kv-events-config when the hybrid KV cache manager is explicitly enabled. It also enriches per-task metrics in the Nemotron-Parse inference stage and updates the benchmark to aggregate token/page throughput from those task stats.

  • Dynamo backend: adds subprocess_env: dict[str, str] to DynamoServerConfig and merges it into base_env before worker launch, enabling callers to inject arbitrary env vars (e.g. DYN_TCP_REQUEST_TIMEOUT) across all Dynamo subprocesses.
  • CLI flag generation: engine_kwargs_to_cli_flags now emits --no-<flag> for boolean False values (previously silently dropped), enabling explicit opt-out for vLLM BooleanOptionalAction arguments like --no-disable-hybrid-kv-cache-manager.
  • KV events / HMA compatibility: _launch_vllm_worker skips --kv-events-config when disable_hybrid_kv_cache_manager: False is explicitly set, avoiding a vLLM incompatibility.

Confidence Score: 4/5

Safe to merge; changes are well-tested and localized to metrics collection, CLI flag generation, and a new env-passthrough knob.

The core logic changes are each covered by new unit tests. The main quality concerns are that engine_kwargs_to_cli_flags emits --no-<flag> for any boolean False kwarg unconditionally, and the benchmark metric prefix is hardcoded to the stage default name, making both brittle in edge cases.

nemo_curator/core/serve/dynamo/infra.py (unconditional --no-<flag> behavior) and benchmarking/scripts/nemotron_parse_pdf_benchmark.py (hardcoded metric prefix).

Important Files Changed

Filename Overview
nemo_curator/core/serve/dynamo/infra.py Adds --no-<flag> emission for boolean False engine kwargs; applies unconditionally to all booleans rather than restricting to known BooleanOptionalAction flags.
nemo_curator/core/serve/dynamo/backend.py Merges subprocess_env into base_env; reserved keys ETCD_ENDPOINTS and NATS_SERVER are silently overridden if present in user-supplied env.
nemo_curator/core/serve/dynamo/vllm.py Skips --kv-events-config when disable_hybrid_kv_cache_manager: False is set explicitly; adds explicit_hybrid_kv_cache_manager_enabled helper with correct is False identity check.
nemo_curator/stages/interleaved/pdf/nemotron_parse/inference.py Adds per-task vLLM metrics (tokens, pages, retries, timing) via _vllm_metrics_from_outputs; refactors _infer_vllm to return raw outputs and retry count alongside decoded texts.
benchmarking/scripts/nemotron_parse_pdf_benchmark.py Replaces Parquet-based metrics with token/page throughput aggregated from task metrics; hardcodes a metric prefix that couples tightly to the stage default name field.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant DynamoBackend
    participant vllm_worker as _launch_vllm_worker
    participant dynamo_vllm as dynamo.vllm subprocess

    Caller->>DynamoBackend: "serve(backend_cfg) subprocess_env={DYN_TCP_REQUEST_TIMEOUT:180}"
    DynamoBackend->>DynamoBackend: merge subprocess_env + ETCD_ENDPOINTS + NATS_SERVER
    DynamoBackend->>vllm_worker: launch(model_config, base_env)
    vllm_worker->>vllm_worker: explicit_hybrid_kv_cache_manager_enabled?
    alt disable_hybrid_kv_cache_manager is False
        vllm_worker-->>dynamo_vllm: skip kv-events-config, emit no-disable-hybrid-kv-cache-manager
    else normal path
        vllm_worker-->>dynamo_vllm: kv-events-config enabled or disabled
    end
    dynamo_vllm-->>DynamoBackend: worker ready
Loading

Comments Outside Diff (1)

  1. nemo_curator/core/serve/dynamo/infra.py, line 107-127 (link)

    P2 --no-<flag> emitted for all False booleans, not only BooleanOptionalAction args

    The docstring says this only applies to vLLM BooleanOptionalAction arguments, but the implementation applies the --no-<flag> pattern to every boolean False value unconditionally. If a caller passes a kwarg like {"disable_log_stats": False} where the underlying vLLM flag only accepts the positive form (no --no-disable-log-stats variant), the subprocess will receive an unrecognized argument and fail to start. This is a correctness contract that the implementation currently cannot enforce — callers must remember which kwargs are safe to set to False.

Reviews (1): Last reviewed commit: "Support Dynamo PDF parsing server mode" | Re-trigger Greptile

Comment on lines +211 to +215
base_env = {
**backend_cfg.subprocess_env,
"ETCD_ENDPOINTS": etcd_endpoint,
"NATS_SERVER": nats_url,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Reserved keys in subprocess_env are silently overridden

ETCD_ENDPOINTS and NATS_SERVER are placed after the spread of backend_cfg.subprocess_env, so any user-supplied values for those two keys are silently discarded. Since subprocess_env is a pass-through for custom env vars (e.g., DYN_TCP_REQUEST_TIMEOUT), a user who inadvertently includes one of these reserved keys will observe it having no effect and may spend time debugging. A guard that raises (or at minimum warns) when a reserved key is present in subprocess_env would make the failure mode explicit.

Comment on lines +51 to +54
metric_prefix = "task_nemotron_parse_inference_custom"

num_valid_pages = task_metrics.get(f"{metric_prefix}.num_valid_pages_sum", 0.0)
total_output_tokens = task_metrics.get(f"{metric_prefix}.total_output_tokens_sum", 0.0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded metric prefix assumes the stage's name field is unchanged

metric_prefix = "task_nemotron_parse_inference_custom" encodes the stage name ("nemotron_parse_inference") directly. If a user ever instantiates NemotronParseInferenceStage with a custom name=, or if the class's default is later renamed, task_metrics.get(...) will return 0.0 for every key and the benchmark will silently report zero throughput without any error. Deriving the prefix from the stage object (or a shared constant) would be more robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant