[Runtime] Add NVIDIA-Nemotron-3-Super-120B-A12B-FP8 runtime #610

Open
TJ5 wants to merge 4 commits into ome-projects:main from TJ5:nvidia-nemotron-super-fp8-runtime

TJ5 commented May 12, 2026

What this PR does

Adds OME configuration for serving NVIDIA Nemotron 3 Super 120B A12B FP8 with a 1M context window:

  • Adds the nvidia-nemotron-3-super-120b-a12b-fp8 ClusterBaseModel pointing to hf://nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8.

  • Adds the vllm-nvidia-nemotron-3-super-120b-a12b-fp8 ClusterServingRuntime, which combines the SMG router with vLLM settings for NemotronHForCausalLM: FP8 KV cache, 4-way tensor parallelism, H100 scheduling, chunked prefill, Nemotron v3 reasoning parsing, and Qwen3 Coder tool-call parsing.

  • Registers the model and runtime in the kustomizations.

  • Adds a sample InferenceService for the NVIDIA Nemotron namespace.
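
For readers unfamiliar with the resource types listed above, the following is a minimal, hypothetical sketch of what a ClusterBaseModel for this model might look like. It is not the manifest from this PR. Only the resource name and the hf:// URI come from the description; the apiVersion, the vendor field, and the overall field layout are assumptions about the OME schema and should be checked against the files in the diff.

```yaml
# Hypothetical sketch only; the actual manifest in this PR may differ.
apiVersion: ome.io/v1beta1        # assumed API group/version
kind: ClusterBaseModel
metadata:
  # Name taken from the PR description.
  name: nvidia-nemotron-3-super-120b-a12b-fp8
spec:
  vendor: nvidia                  # assumed field and value
  storage:
    # Source URI taken from the PR description.
    storageUri: hf://nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8
```

The runtime and the sample InferenceService would then refer to this model by its metadata name, assuming the usual OME pattern of name-based references.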

Why we need it

Enables OME to serve NVIDIA-Nemotron-3-Super-120B-A12B-FP8.

Fixes #

How to test

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

github-actions bot added labels on May 12, 2026: runtime (Runtime configuration changes), models (Model configuration changes), config (Configuration changes)
TJ5 marked this pull request as ready for review May 12, 2026 20:15
Comment thread config/runtimes/vllm/nvidia/nvidia-nemotron-3-super-120b-a12b-fp8-rt.yaml Outdated
Comment thread config/runtimes/vllm/nvidia/nvidia-nemotron-3-super-120b-a12b-fp8-rt.yaml Outdated
Comment thread config/runtimes/vllm/nvidia/nvidia-nemotron-3-super-120b-a12b-fp8-rt.yaml Outdated

2 participants