Add Gemma 4 vLLM runtimes by ankrovv · Pull Request #599 · ome-projects/ome

ankrovv · 2026-05-05T19:04:58Z

Summary

Adds Gemma 4 vLLM ClusterServingRuntime manifests:

- vllm-gemma-4-tp1: E2B, E4B, and 26B-A4B
- vllm-gemma-4-tp2: 31B

Both runtimes match Gemma4ForConditionalGeneration and use per-accelerator tensor parallel overrides for A100-80G, H100, H200, and B200.

Config

- Engine image: vLLM nightly v0.19.1.dev6+g6d4a8e6d2 with transformers 5.5.0.dev0
- Router image: docker.io/lightseekorg/smg:1.4.1
- Gemma 4 parser flags: --reasoning-parser=gemma4, --tool-call-parser=gemma4, --enable-auto-tool-choice
- Long-context flags: --max-model-len=-1, --no-scheduler-reserve-full-isl
- Multimodal caps:
  - tp1: image=10,audio=1,video=1
  - tp2: image=10,audio=0,video=1

Validation

- Quality and feature validation passed for all four IT variants.
- Context length verified up to 250K tokens on 26B-A4B and 31B.
- Runtime smoke validation passed for the declared accelerator classes.
- modelSizeRange keeps tp1 and tp2 auto-selection non-overlapping.
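
The non-overlap property claimed in the last validation item can be checked mechanically. The sketch below is a minimal illustration, not OME's selection logic: it parses size strings such as the 4.6B and 27.7B values quoted later in the review thread and reports any pair of closed ranges that intersect. The `SizeRange` type, the helper names, and the second range's bounds (27.8B to 40B) are hypothetical; the PR text does not state the tp2 range, nor which runtime the quoted hunk belongs to.

```python
from typing import List, NamedTuple, Tuple

# Multipliers for the size suffixes this sketch understands (assumption).
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_size(text: str) -> float:
    """Convert a string such as '4.6B' into a parameter count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


class SizeRange(NamedTuple):
    name: str
    min: float
    max: float


def overlaps(a: SizeRange, b: SizeRange) -> bool:
    """Closed intervals intersect when each starts before the other ends."""
    return a.min <= b.max and b.min <= a.max


def find_overlaps(ranges: List[SizeRange]) -> List[Tuple[str, str]]:
    """Return every pair of range names whose intervals intersect."""
    ordered = sorted(ranges, key=lambda r: r.min)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if overlaps(a, b):
                pairs.append((a.name, b.name))
    return pairs


ranges = [
    # Bounds taken from the diff hunk quoted in the review thread.
    SizeRange("range-from-diff", parse_size("4.6B"), parse_size("27.7B")),
    # Hypothetical bounds for a second runtime; not stated in the PR.
    SizeRange("hypothetical-second", parse_size("27.8B"), parse_size("40B")),
]
print(find_overlaps(ranges))  # prints []
```

An empty result means a model of any given size falls into at most one range, so size alone cannot make two runtimes eligible for it.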

Test plan

kubectl apply --dry-run=server -f config/runtimes/vllm/gemma-4-tp1-rt.yaml
kubectl apply --dry-run=server -f config/runtimes/vllm/gemma-4-tp2-rt.yaml
kubectl kustomize config/runtimes

YouNeedCryDear

Could you also add the ClusterBaseModel and a sample inference service? You can refer to what I did for DeepSeek V4 in #598. @ankrovv

XinyueZhang369 · 2026-05-19T19:29:20Z

+      version: "1.0.0"
+  modelSizeRange:
+    min: 4.6B
+    max: 27.7B


This change doesn't follow the old pattern of creating one runtime per model. Why are we switching to one runtime for multiple models?

ankrovv · May 19, 2026

All the Gemma 4 models share the same underlying model architecture, so creating one runtime per model would be redundant. Consolidating by TP size also leaves room for future expansion if Google releases new Gemma 4 models on the same architecture. cc @YouNeedCryDear for more context.

YouNeedCryDear · May 20, 2026

Correct. We are not going in the direction of one runtime per model, since that approach does not scale. If multiple models in the same family share the same architecture, the same format, and essentially the same engine config, we combine them into a single runtime. Parallelism and engine-arg overrides are controlled at the Accelerator Class level. Please let me know if you have any concerns, @XinyueZhang369.
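
To make the "one shared runtime, overrides per Accelerator Class" idea concrete, here is a minimal sketch of how a shared base argument list could be combined with a per-accelerator override. It is an illustration only and does not reflect how OME merges arguments. The parser flags come from the PR description and `--tensor-parallel-size` is a standard vLLM flag, but the override table, its values, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_flag(arg: str) -> Tuple[str, Optional[str]]:
    """Split '--name=value' into (name, value); bare flags get value None."""
    name, sep, value = arg.partition("=")
    return name, (value if sep else None)


def apply_overrides(base_args: List[str], override_args: List[str]) -> List[str]:
    """Return base_args with same-named flags replaced by the overrides.

    Flags that appear only in override_args are appended at the end.
    """
    merged = dict(split_flag(a) for a in base_args)  # insertion order is kept
    for arg in override_args:
        name, value = split_flag(arg)
        merged[name] = value
    return [n if v is None else f"{n}={v}" for n, v in merged.items()]


# Shared engine arguments for the whole model family.
BASE_ARGS = [
    "--reasoning-parser=gemma4",
    "--tool-call-parser=gemma4",
    "--enable-auto-tool-choice",
    "--tensor-parallel-size=1",
]

# Hypothetical per-accelerator overrides; the values are not from the PR.
ACCELERATOR_OVERRIDES: Dict[str, List[str]] = {
    "A100-80G": ["--tensor-parallel-size=2"],
    "H100": [],
}


def args_for(accelerator: str) -> List[str]:
    """Resolve the final argument list for one accelerator class."""
    return apply_overrides(BASE_ARGS, ACCELERATOR_OVERRIDES.get(accelerator, []))


print(args_for("A100-80G"))
```

With this shape, supporting a new model of the same architecture needs no new runtime; only the override table changes when an accelerator needs different parallelism.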

ankrovv requested review from CatherineSue, XinyueZhang369 and slin1237 as code owners May 5, 2026 19:04

github-actions Bot added the runtime (Runtime configuration changes) and config (Configuration changes) labels May 5, 2026

ankrovv changed the title from "Add Gemma 4 vLLM runtimes (TP1 and TP2)" to "Add Gemma 4 vLLM runtimes" May 5, 2026

YouNeedCryDear reviewed May 5, 2026

github-actions Bot added the models (Model configuration changes) label May 5, 2026

ankrovv force-pushed the feat/gemma4 branch from edaf710 to d34ec00 May 5, 2026 19:55

YouNeedCryDear reviewed May 5, 2026

Comment thread config/runtimes/vllm/gemma-4-tp1-rt.yaml

Comment thread config/runtimes/vllm/gemma-4-tp1-rt.yaml Outdated

YouNeedCryDear approved these changes May 6, 2026

XinyueZhang369 reviewed May 19, 2026

Comment thread config/runtimes/kustomization.yaml Outdated

XinyueZhang369 reviewed May 19, 2026

ankrovv force-pushed the feat/gemma4 branch from 30c1546 to 7f2d6ee May 19, 2026 19:39

Aniruddh Krovvidi added 3 commits May 19, 2026 13:16
- add-gemma-4-runtimes (1a4d117)
- add-cbms-and-isvcs (c8a6a5e)
- router-startup-health (e15f548)

ankrovv force-pushed the feat/gemma4 branch from cc42bce to e15f548 May 19, 2026 20:17

Aniruddh Krovvidi added 2 commits June 3, 2026 22:10
- Add Gemma 4 accelerator configs (f760b83)
- Set Gemma 4 TP2 priority (efeb065)

ankrovv wants to merge 5 commits into ome-projects:main from ankrovv:feat/gemma4
