Dynamo Server Fixes + Nemotron Parsing PDF Benchmark changes#2050
Dynamo Server Fixes + Nemotron Parsing PDF Benchmark changes#2050praateekmahajan wants to merge 1 commit into
Conversation
Greptile SummaryThis PR extends Dynamo PDF parsing to support a "server mode" by wiring three related changes: a
Confidence Score: 4/5Safe to merge; changes are well-tested and localized to metrics collection, CLI flag generation, and a new env-passthrough knob. The core logic changes are each covered by new unit tests. The main quality concerns are that
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant DynamoBackend
participant vllm_worker as _launch_vllm_worker
participant dynamo_vllm as dynamo.vllm subprocess
Caller->>DynamoBackend: "serve(backend_cfg) subprocess_env={DYN_TCP_REQUEST_TIMEOUT:180}"
DynamoBackend->>DynamoBackend: merge subprocess_env + ETCD_ENDPOINTS + NATS_SERVER
DynamoBackend->>vllm_worker: launch(model_config, base_env)
vllm_worker->>vllm_worker: explicit_hybrid_kv_cache_manager_enabled?
alt disable_hybrid_kv_cache_manager is False
vllm_worker-->>dynamo_vllm: skip kv-events-config, emit no-disable-hybrid-kv-cache-manager
else normal path
vllm_worker-->>dynamo_vllm: kv-events-config enabled or disabled
end
dynamo_vllm-->>DynamoBackend: worker ready
|
| base_env = { | ||
| **backend_cfg.subprocess_env, | ||
| "ETCD_ENDPOINTS": etcd_endpoint, | ||
| "NATS_SERVER": nats_url, | ||
| } |
There was a problem hiding this comment.
Reserved keys in
subprocess_env are silently overridden
ETCD_ENDPOINTS and NATS_SERVER are placed after the spread of backend_cfg.subprocess_env, so any user-supplied values for those two keys are silently discarded. Since subprocess_env is a pass-through for custom env vars (e.g., DYN_TCP_REQUEST_TIMEOUT), a user who inadvertently includes one of these reserved keys will observe it having no effect and may spend time debugging. A guard that raises (or at minimum warns) when a reserved key is present in subprocess_env would make the failure mode explicit.
| metric_prefix = "task_nemotron_parse_inference_custom" | ||
|
|
||
| num_valid_pages = task_metrics.get(f"{metric_prefix}.num_valid_pages_sum", 0.0) | ||
| total_output_tokens = task_metrics.get(f"{metric_prefix}.total_output_tokens_sum", 0.0) |
There was a problem hiding this comment.
Hardcoded metric prefix assumes the stage's
name field is unchanged
metric_prefix = "task_nemotron_parse_inference_custom" encodes the stage name ("nemotron_parse_inference") directly. If a user ever instantiates NemotronParseInferenceStage with a custom name=, or if the class's default is later renamed, task_metrics.get(...) will return 0.0 for every key and the benchmark will silently report zero throughput without any error. Deriving the prefix from the stage object (or a shared constant) would be more robust.
Description
Usage
# Add snippet demonstrating usageChecklist