fix: expose inference server kwargs by nightcityblade · Pull Request #2028 · NVIDIA-NeMo/Curator

nightcityblade · 2026-05-25T03:14:42Z

Description

Expose a generic server_kwargs parameter on InferenceServer and forward it to Ray Serve's build_openai_app, allowing users to pass options such as ingress_deployment_config for tuning ingress replicas.

Closes #1621

Usage

from nemo_curator.core.serve import InferenceServer, RayServeModelConfig

server = InferenceServer(
    models=[RayServeModelConfig(model_identifier="my-model")],
    server_kwargs={
        "ingress_deployment_config": {
            "autoscaling_config": {"min_replicas": 4, "max_replicas": 4},
        },
    },
)

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

Testing

uv run ruff check nemo_curator/core/serve/server.py nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py
uv run python -m py_compile nemo_curator/core/serve/server.py nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py
uv run pytest tests/core/serve/ray_serve/test_backend.py tests/core/serve/test_server.py (blocked locally: package intentionally raises on Darwin; NeMo-Curator currently only supports Linux)

Signed-off-by: nightcityblade <nightcityblade@gmail.com>

copy-pr-bot · 2026-05-25T03:14:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-25T03:17:14Z

Greptile Summary

This PR adds a server_kwargs pass-through on InferenceServer so callers can supply arbitrary keyword arguments (e.g. ingress_deployment_config) directly to Ray Serve's build_openai_app factory without needing internal changes for each option.

server.py: adds server_kwargs: dict[str, Any] = field(default_factory=dict) to InferenceServer.
backend.py: extracts _build_openai_app_args(), which deep-copies server_kwargs, guards against the reserved llm_configs key, and correctly merges the quiet runtime_env into any user-supplied ingress_deployment_config.
test_backend.py: two new tests verify the forwarding/merge behavior and the llm_configs guard.

Confidence Score: 5/5

Safe to merge; changes are additive and well-tested.

The change is small and additive — a new optional field with a default-empty dict and a clean extraction of build-args logic into a helper. The guard against the reserved llm_configs key prevents silent data loss, the deep-copy isolates caller state, and the runtime-env merge correctly handles user-supplied ingress_deployment_config. Two targeted tests cover both the happy path and the error path.

No files require special attention beyond the noted server_kwargs being Ray Serve-specific despite living on the generic InferenceServer dataclass.

Important Files Changed

Filename	Overview
nemo_curator/core/serve/server.py	Adds `server_kwargs: dict[str, Any]` field to `InferenceServer`; only consumed by `RayServeBackend`
nemo_curator/core/serve/ray_serve/backend.py	Extracts `_build_openai_app_args` to deep-copy `server_kwargs`, guard against reserved `llm_configs` key, and properly merge `quiet_runtime_env` into any user-supplied `ingress_deployment_config`
tests/core/serve/ray_serve/test_backend.py	Adds two tests: forwarding of `server_kwargs` into `build_openai_app` args (including runtime-env merge) and rejection of reserved `llm_configs` key

Sequence Diagram

sequenceDiagram
    participant User
    participant InferenceServer
    participant RayServeBackend
    participant build_openai_app

    User->>InferenceServer: "InferenceServer(models, server_kwargs={...})"
    User->>InferenceServer: start()
    InferenceServer->>RayServeBackend: _deploy()
    RayServeBackend->>RayServeBackend: _build_openai_app_args(llm_configs, quiet_env)
    Note over RayServeBackend: deepcopy(server_kwargs)<br/>guard llm_configs key<br/>merge quiet_runtime_env
    RayServeBackend->>build_openai_app: build_args (server_kwargs + llm_configs)
    build_openai_app-->>RayServeBackend: app
    RayServeBackend->>RayServeBackend: serve.run(app)

_{Reviews (3): Last reviewed commit: "Remove redundant Ray Serve deepcopy call..." | Re-trigger Greptile}

greptile-apps · 2026-05-25T03:17:17Z

+        build_args = deepcopy(self._server.server_kwargs)
+        build_args["llm_configs"] = llm_configs


llm_configs key silently overwritten from server_kwargs — if a caller includes "llm_configs" in server_kwargs (e.g. trying to pre-seed the list), line 105 clobbers it without any warning. Adding an explicit guard makes the contract clear and surfaces the mistake instead of silently dropping user data.

Suggested change

build_args = deepcopy(self._server.server_kwargs)

build_args["llm_configs"] = llm_configs

build_args = deepcopy(self._server.server_kwargs)

if "llm_configs" in build_args:

msg = "'llm_configs' is a reserved key and must not be set in server_kwargs; use InferenceServer.models instead."

raise ValueError(msg)

build_args["llm_configs"] = llm_configs

greptile-apps · 2026-05-25T03:17:18Z

+            ingress_config = deepcopy(build_args.get("ingress_deployment_config", {}))
+            actor_options = deepcopy(ingress_config.get("ray_actor_options", {}))


The deepcopy calls on lines 109–110 are redundant. build_args is already a deep copy of server_kwargs (line 104), so any nested dict extracted from it is already an independent object. The extra copies add allocation overhead without protecting anything.

Suggested change

ingress_config = deepcopy(build_args.get("ingress_deployment_config", {}))

actor_options = deepcopy(ingress_config.get("ray_actor_options", {}))

ingress_config = build_args.get("ingress_deployment_config", {})

actor_options = ingress_config.get("ray_actor_options", {})

Thanks, addressed in 0e954ce by reusing the nested dictionaries from the already-deep-copied build_args instead of deepcopying them again. Verification: python3 -m ruff check nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py passed; python3 -m compileall -q nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py passed. Targeted pytest is blocked locally because ray is not installed in this environment.

nightcityblade · 2026-05-25T15:14:20Z

Addressed the Greptile feedback in the latest push:

_build_openai_app_args now rejects server_kwargs containing llm_configs instead of silently overwriting it.
Added a regression test for that validation path.

Validation run locally:

uvx ruff check nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py ✅
.venv/bin/python -m py_compile nemo_curator/core/serve/ray_serve/backend.py tests/core/serve/ray_serve/test_backend.py ✅
.venv/bin/python -m pytest tests/core/serve/ray_serve/test_backend.py could not complete on this macOS ARM host because NeMo-Curator raises its Linux-only platform guard during test collection; installing the full inference_server extra also hit NVIDIA/vLLM wheel availability limits on macOS ARM.

fix: expose inference server kwargs

3ac5e89

Signed-off-by: nightcityblade <nightcityblade@gmail.com>

nightcityblade requested a review from a team as a code owner May 25, 2026 03:14

nightcityblade requested review from ayushdg and removed request for a team May 25, 2026 03:14

github-actions Bot added the community-request label May 25, 2026

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

fix: reject generated server kwargs

5236791

Remove redundant Ray Serve deepcopy calls

0e954ce

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 28, 2026

sarahyurick requested a review from praateekmahajan June 2, 2026 21:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: expose inference server kwargs#2028

fix: expose inference server kwargs#2028
nightcityblade wants to merge 3 commits into
NVIDIA-NeMo:mainfrom
nightcityblade:fix/issue-1621

nightcityblade commented May 25, 2026

Uh oh!

copy-pr-bot Bot commented May 25, 2026

Uh oh!

greptile-apps Bot commented May 25, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 25, 2026

Uh oh!

greptile-apps Bot May 25, 2026

Uh oh!

nightcityblade May 26, 2026

Uh oh!

nightcityblade commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		build_args = deepcopy(self._server.server_kwargs)
		build_args["llm_configs"] = llm_configs

		ingress_config = deepcopy(build_args.get("ingress_deployment_config", {}))
		actor_options = deepcopy(ingress_config.get("ray_actor_options", {}))

Conversation

nightcityblade commented May 25, 2026

Description

Usage

Checklist

Testing

Uh oh!

copy-pr-bot Bot commented May 25, 2026

Uh oh!

greptile-apps Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

nightcityblade May 26, 2026

Choose a reason for hiding this comment

Uh oh!

nightcityblade commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 25, 2026 •

edited

Loading