
aiter test workflow enhance #2905

Draft

kiran-thumma wants to merge 29 commits into main from kithumma/aiter-test-workflow-enhance

Conversation

@kiran-thumma
Collaborator

Motivation

  • Add a hard wheel smoke gate so downstream GPU suites fail fast when the published AITER wheel is broken.
  • Normalize GPU labels and expose skip toggles so each suite only consumes the GPUs it needs.

Technical Details

  • .github/workflows/nightly.yaml: introduced skip_wheel_smoke, skip_sglang, skip_vllm, skip_atom, routed the smoke gate through test-whl.yaml, and short-circuited matrices when their suite is disabled while keeping the dependency chain intact.
  • .github/workflows/test-whl.yaml: added published-wheel fallback download, MI300X/MI35X × Python 3.10/3.12 coverage, and a no-op path when callers skip smoke.
  • .github/configs/vllm_models.json, .github/configs/vllm_tests.json, .github/scripts/run_vllm.sh, .github/scripts/run_vllm_test.sh, .github/configs/vllm_pins.json: normalized runner labels and enforced vLLM pin usage.
  • index.html: refreshed the dashboard to reflect the wheel gate, job counts, and new skip knobs.
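The skip toggles and gated dependency chain described above might be wired roughly as follows. This is a sketch, not the PR's actual YAML: only the four skip_* input names and the reuse of test-whl.yaml come from the description; the job names, runner label, and step body are illustrative.

```yaml
# Hypothetical sketch of the nightly.yaml wiring; job ids and the
# sglang step body are illustrative, not copied from the PR.
on:
  workflow_dispatch:
    inputs:
      skip_wheel_smoke:
        type: boolean
        default: false
      skip_sglang:
        type: boolean
        default: false
      skip_vllm:
        type: boolean
        default: false
      skip_atom:
        type: boolean
        default: false

jobs:
  wheel-smoke:
    if: ${{ inputs.skip_wheel_smoke != true }}
    uses: ./.github/workflows/test-whl.yaml
  sglang:
    # "needs" keeps the dependency chain intact; the !cancelled() guard
    # lets this job still run when wheel-smoke was skipped upstream.
    needs: wheel-smoke
    if: ${{ !cancelled() && needs.wheel-smoke.result != 'failure' && inputs.skip_sglang != true }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "run SGLang suite here"
```

The `!cancelled()` guard matters because a plain `needs:` edge would also skip the downstream job whenever the upstream gate is skipped, which is what "short-circuited matrices while keeping the dependency chain intact" has to work around.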

Test Plan

  • Static workflow inspection.

Test Result

  • Not run (workflow-only change).

Submission Checklist

  - Add run_sglang, run_vllm, run_atom workflow_dispatch toggles
  - Create modular scripts: run_sglang.sh, run_vllm.sh, run_atom.sh
  - Wheels go to devreleases from non-schedule triggers
  - Promote only when all selected integration tests pass
  - Create modular scripts for Docker setup and test execution
  - Model configs in JSON files for easy maintenance
  - ATOM: all 15 accuracy models loaded from atom_models.json
  - vLLM: 7 latency benchmarks loaded from vllm_models.json
  - SGLang: dispatches full scout to sgl-project/sglang
  - Add skip_build toggle to bypass build for faster testing
  - Add wheel_url input to use pre-built wheel directly
  - Add JSON model configs (atom_models.json, vllm_models.json)
  - Fix integration tests not depending on build when skipped
  - Add sglang_job_filter dropdown
  - SGLang: run on aiter-1gpu-runner instead of dispatching to external repo
  - ATOM: fix accuracy results path, log file path, workspace mount
  - Fix artifact names with illegal characters
  - Add cleanup traps to all scripts
  - Add atom_models.json and vllm_models.json configs
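The "cleanup traps" item above can be sketched as a small bash pattern; the `cleanup`/`work_dir` names are hypothetical, not taken from the PR's scripts. Here the pattern runs inside a subshell so the EXIT trap fires when the subshell finishes and the effect is observable:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the cleanup-trap pattern added to the CI
# scripts; function and variable names are hypothetical.
set -u

tmp_record="$(
  work_dir="$(mktemp -d)"
  cleanup() {
    # Remove scratch state even if a test step fails mid-run.
    rm -rf "$work_dir"
  }
  trap cleanup EXIT
  # ... test commands would use "$work_dir" here ...
  echo "$work_dir"
)"

# The subshell has exited, so its EXIT trap has already run.
if [ -d "$tmp_record" ]; then
  echo "cleanup failed"
else
  echo "cleanup ok"
fi
```

In the real scripts the trap would guard things like Docker containers and mounted workspaces rather than a temp directory, but the shape is the same: register the trap immediately after acquiring the resource.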
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; the main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2905 --add-label <label>

@gyohuangxin
Member

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests?
cc @valarLip

@kiran-thumma kiran-thumma reopened this Apr 24, 2026

@kiran-thumma
Collaborator Author

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests? cc @valarLip

I'm reusing the existing test-workflow.yml and nightly.yml workflows and adding more tests; this isn't ready for review yet.
