[Core] Add data parallelism override support #623

Closed
smfirmin wants to merge 1 commit into ome-projects:main from smfirmin:add-data-parallelism-override

@smfirmin

What this PR does

Adds explicit data parallelism override support for accelerator-specific runtime configuration.

When tensorParallelismOverride.dataParallelSize is set for a selected accelerator class, OME now updates matching runtime arguments using the supported SGLang and vLLM aliases:

  • --dp-size
  • --dp
  • --data-parallel-size

This mirrors the existing tensor and pipeline parallelism override behavior.

Why we need it

The API already exposes dataParallelSize, but the reconciler only applied tensor parallel and pipeline parallel sizing. As a result, accelerator-specific data parallel sizing could be configured but was not reflected in the generated runtime command or args.

Data parallel sizing is needed for MoE deployments where effective parallelism depends on TP x DP.

Fixes #

How to test

go test ./pkg/controller/v1beta1/inferenceservice/components
git diff --check

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions github-actions Bot added the inferenceservice (InferenceService controller changes), controller (Controller changes), and tests (Test changes) labels May 26, 2026
@smfirmin smfirmin changed the title from "Add data parallelism override support" to "[Core] Add data parallelism override support" May 26, 2026
@YouNeedCryDear
Collaborator

duplicate of #611

@smfirmin smfirmin closed this May 27, 2026