Expand SGLang downstream coverage with MI35X model E2E tests by bingxche · Pull Request #2884 · ROCm/aiter

bingxche · 2026-04-23T15:45:54Z

Summary

replace the current SGLang downstream 1-GPU MI300X smoke coverage with MI35X 8-GPU model end-to-end coverage
reuse a single downstream setup flow, then run the MI35X DeepSeek-R1-MXFP4, Qwen3-235B-MXFP4, and DeepSeek-V3.2 test steps in sequence
align the downstream runner, GPU settings, dependency install, and per-step timeouts with the MI35X nightly model test paths

Test plan

Run Sglang Model E2E Test (8 GPU) in GitHub Actions
Verify DeepSeek-R1-MXFP4 accuracy and perf steps complete on linux-aiter-mi35x-8
Verify Qwen3-235B-MXFP4 combined suite completes on linux-aiter-mi35x-8
Verify DeepSeek-V3.2 accuracy and perf steps complete on linux-aiter-mi35x-8

Made with Cursor

github-actions · 2026-04-23T15:46:43Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:sglang` | SGLang integration tests |
| `ci:atom` | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| `ci:vllm` | vLLM benchmark |
| `ci:all` | All of the above |

Add labels via the sidebar or `gh pr edit 2884 --add-label <label>`

gyohuangxin · 2026-04-24T05:09:04Z

Related to #2751

Copilot

Pull request overview

This PR updates the SGLang downstream GitHub Actions workflow to shift from MI300X 1-GPU smoke coverage to MI35X 8-GPU end-to-end model test coverage, running multiple MI35X model suites sequentially after a single downstream setup.

Changes:

Switch downstream runner target from MI300X (1 GPU) to MI35X (8 GPU) and update related GPU/hostname settings.
Remove dynamic SGLang base-image resolution and run MI35X nightly-style accuracy/performance suites in sequence (with a final aggregation step to fail the job if any suite fails).
Add extra Python dependencies needed by the MI35X E2E suites.
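
The final aggregation step mentioned in the list above could look roughly like the sketch below. This is an illustration, not the workflow's actual step: the helper name `aggregate_outcomes` and the `step_id=outcome` argument format are assumptions made for this example. It relies on the fact that a step with `continue-on-error: true` still reports its real result in `steps.<id>.outcome`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the PR): accepts "step_id=outcome" pairs
# and returns non-zero if any outcome is "failure".
aggregate_outcomes() {
  local failed=0 pair id outcome
  for pair in "$@"; do
    id="${pair%%=*}"        # text before the first "="
    outcome="${pair#*=}"    # text after the first "="
    if [ "$outcome" = "failure" ]; then
      echo "FAILED: $id"
      failed=1
    fi
  done
  return "$failed"
}
```

Inside a workflow step it might be called as `aggregate_outcomes "qwen35_accuracy=${{ steps.qwen35_accuracy.outcome }}" "qwen35_perf=${{ steps.qwen35_perf.outcome }}"`, so that every suite runs to completion before the job is marked failed.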

Copilot · 2026-04-24T05:14:55Z

+      SGL_BRANCH: main
+      GPU_ARCH: gfx950
+      SGLANG_CI_HOSTNAME_OVERRIDE: linux-mi35x-gpu-8


SGL_BRANCH is set to main, and the container start no longer pins a specific SGLang base image. This makes the downstream CI non-reproducible and can introduce unrelated breakages when SGLang (or its base image) changes. Consider pinning to a specific tag/commit (and/or explicitly selecting an image) or documenting why tracking main is required for these E2E suites.
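
One way to act on the pinning suggestion is a guard that rejects moving refs such as `main`. The sketch below is an assumption for illustration only: the function name `is_pinned_ref` is hypothetical, and it treats only a full 40-character lowercase hex commit SHA as pinned (tags and short SHAs are rejected).

```shell
# Hypothetical check (not part of the PR): succeed only when the argument
# looks like a full 40-character lowercase hex commit SHA.
is_pinned_ref() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]           # must be exactly 40 characters long
}
```

A workflow could call `is_pinned_ref "$SGL_BRANCH" || echo "warning: SGL_BRANCH is not pinned"` before checkout, which surfaces the reproducibility risk without blocking jobs that intentionally track `main`.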

Copilot · 2026-04-24T05:14:56Z

+      - name: Accuracy Test MI35x (8-GPU Qwen 3.5)
+        id: qwen35_accuracy
+        timeout-minutes: 70
+        continue-on-error: true
+        run: |
+          set -ex
+          cd "${SGLANG_WORKSPACE}"
+          > github_summary.md
+          bash scripts/ci/amd/amd_ci_exec.sh -w "/sglang-checkout/test" \
+            -e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
+            python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-mi35x-qwen35 --nightly --timeout-per-file 3600 || TEST_EXIT_CODE=$?
+          echo "$(<github_summary.md )" >> "$GITHUB_STEP_SUMMARY" || true
+          exit ${TEST_EXIT_CODE:-0}
+
+      - name: Performance Test MI35x (8-GPU Qwen 3.5 FP8)
+        id: qwen35_perf
+        timeout-minutes: 100
+        continue-on-error: true
+        run: |
+          set -ex
+          cd "${SGLANG_WORKSPACE}"
+          > github_summary.md
+          bash scripts/ci/amd/amd_ci_exec.sh -w "/sglang-checkout/test" \
+            -e SGLANG_USE_AITER=1 \
+            -e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
+            python3 run_suite.py --hw amd --suite nightly-perf-8-gpu-mi35x-qwen35-fp8 --nightly --timeout-per-file 5400 || TEST_EXIT_CODE=$?
+          echo "$(<github_summary.md )" >> "$GITHUB_STEP_SUMMARY" || true
+          exit ${TEST_EXIT_CODE:-0}


PR description lists MI35X DeepSeek-R1-MXFP4, Qwen3-235B-MXFP4, and DeepSeek-V3.2 E2E coverage, but this workflow also adds a separate Qwen 3.5 accuracy + perf sequence. Either update the PR description/test plan to include Qwen 3.5, or remove these extra steps if they’re not intended as part of this PR’s scope.
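
Both quoted steps share one shell idiom: `|| TEST_EXIT_CODE=$?` keeps `set -e` from aborting the script when the suite fails, the summary is copied afterwards, and `exit ${TEST_EXIT_CODE:-0}` restores the suite's status. A minimal standalone sketch of that idiom follows; the function name `run_then_report` and its message are hypothetical and do not appear in the workflow.

```shell
# Hypothetical wrapper illustrating the capture-then-exit idiom: run a command,
# remember its exit status, do some post-processing, then return that status.
run_then_report() {
  local code=0
  "$@" || code=$?    # the "||" prevents an early abort under "set -e"
  echo "post-processing ran (exit code ${code})"
  return "$code"
}
```

For example, `run_then_report sh -c 'exit 3'` still prints the post-processing line and then returns 3, which is the behavior the workflow steps depend on to publish `github_summary.md` even when a suite fails.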

add sglang tests

20496e6

bingxche requested a review from a team April 23, 2026 15:45

bingxche marked this pull request as draft April 23, 2026 15:47

bingxche added 2 commits April 23, 2026 11:01

CI: use SGLang default image resolution

799539b

Drop the downstream-specific image override so the MI35X model E2E job follows SGLang's own container selection logic. Made-with: Cursor

add qwen35 tests

c838c3f

gyohuangxin self-assigned this Apr 24, 2026

gyohuangxin marked this pull request as ready for review April 24, 2026 05:10

Copilot AI review requested due to automatic review settings April 24, 2026 05:10

gyohuangxin added the ci:sglang label Apr 24, 2026

Copilot started reviewing on behalf of gyohuangxin April 24, 2026 05:11

Copilot AI reviewed Apr 24, 2026

bingxche wants to merge 3 commits into main from bingxche/add-sglang-test
