
dsv4-b300-sglang: update points #1179

Open

yhyang201 wants to merge 14 commits into main from dsv4-b300-sglang-conc2048-mega-moe

Conversation

@yhyang201
Collaborator

yhyang201 commented Apr 26, 2026

Summary

Test plan

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Often, failures are just flakes, and simply re-running the failed jobs fixes them; if you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +80 to +94
export SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE=1
export SGLANG_OPT_FIX_HASH_MEGA_MOE=1
export SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK=288
PARALLEL_ARGS=(
--dp-size "$TP"
--enable-dp-attention
--moe-a2a-backend deepep
--cuda-graph-max-bs 288
--deepep-config "$DEEPEP_CONFIG"
--chunked-prefill-size 65536
--tokenizer-worker-num 4
--enable-prefill-delayer
)
MAX_RUNNING_REQUESTS=2560
MEM_FRACTION_STATIC=0.87
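For readers unfamiliar with the pattern in the quoted hunk: collecting server flags in a bash array and expanding it with `"${PARALLEL_ARGS[@]}"` passes each flag and its value as separate, intact arguments. A minimal self-contained sketch (the `launch` stub is illustrative, not from this PR; flag values are copied from the hunk):

```shell
#!/usr/bin/env bash
# Illustrative only: 'launch' is a stub standing in for the real server
# command. Quoted "${ARR[@]}" expansion yields one argument per element.
PARALLEL_ARGS=(
  --dp-size 8
  --moe-a2a-backend deepep
  --cuda-graph-max-bs 288
)
launch() { printf '%s\n' "$@"; }   # echoes one received argument per line
launch "${PARALLEL_ARGS[@]}"
```

An unquoted `${PARALLEL_ARGS[*]}` would instead re-split on whitespace, which is why the quoted-array form is the idiomatic way to build up long server command lines.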
Contributor

🟡 Two pre-existing comments immediately above the DP_ATTENTION block became inaccurate after this PR added the CONC=2048 branch. The block comment at lines 63-66 still describes the recipe as "flashinfer_mxfp4 runner + halved prefill chunks + prefill-delayer", but the new CONC=2048 path uses --moe-a2a-backend deepep and --chunked-prefill-size 65536 (4x the non-DP value of 8192, not halved). Line 69 says the DP-attn branch "overrides to 0.94", but it now overrides to either 0.94 or 0.87 depending on CONC — worth refreshing the comments alongside this change so future maintainers don't trust stale assumptions.

Extended reasoning...

What the stale comments say

Lines 63-66 contain the rationale comment for the DP_ATTENTION dispatch block:

Pick the parallelism + MoE backend based on DP_ATTENTION (mirrors the vllm script's pattern). DP-attention runs the empirically-tuned high-concurrency recipe (flashinfer_mxfp4 runner + halved prefill chunks + prefill-delayer); single-instance uses flashinfer_mxfp4 with the cookbook defaults.

Line 69 contains:

# Default; the DP-attn branch below overrides to 0.94.

Both were accurate before this PR — the DP-attn branch was a single recipe that always used flashinfer_mxfp4, set --chunked-prefill-size 16384 (half the previous 32768 cookbook value, hence "halved"), and always set MEM_FRACTION_STATIC=0.94.

Why this PR makes them inaccurate

The new if [ "$CONC" = "2048" ]; then ... else ... split inside the DP-attn branch breaks both invariants:

  1. The CONC=2048 path uses --moe-a2a-backend deepep (not flashinfer_mxfp4), SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE=1 (the mega_moe deepep recipe — described in the PR description and changelog as a different recipe family entirely), and --chunked-prefill-size 65536. The block comment now describes only half of the DP-attn cases.

  2. The wording "halved prefill chunks" is now actively misleading: 65536 is 8x the non-DP path's --chunked-prefill-size 8192, i.e. multiplied, not halved. A reader looking at line 65 next to lines 78-94 will see a direct contradiction.

  3. MEM_FRACTION_STATIC is now overridden to 0.94 (CONC<2048) or 0.87 (CONC=2048), so line 69's single-value claim is no longer correct.

Step-by-step proof

  • Before this PR: DP_ATTENTION=true → always --moe-runner-backend flashinfer_mxfp4, --chunked-prefill-size 16384, MEM_FRACTION_STATIC=0.94. Comments are correct.
  • After this PR with DP_ATTENTION=true CONC=2048: --moe-a2a-backend deepep (not flashinfer_mxfp4) ✗, --chunked-prefill-size 65536 (not halved relative to non-DP 8192 — it's 8x) ✗, MEM_FRACTION_STATIC=0.87 (not 0.94) ✗. All three claims fail.
  • After this PR with DP_ATTENTION=true CONC=1024: comments still happen to be correct, but a maintainer reading them as describing "the DP-attn recipe" will be wrong about the other branch.

Severity / impact

This is a documentation accuracy issue, not a behavioral bug — runtime behavior is unaffected. But the file's comments are explicitly there to give future maintainers the empirical rationale ("empirically-tuned", "cookbook defaults"), and silently letting them drift turns future debugging into a trap. Easiest fix is to update lines 63-66 to mention both recipes (flashinfer_mxfp4 + halved chunks for CONC<2048; mega_moe deepep + larger chunks for CONC=2048) and reword line 69 to say the DP-attn branch overrides to 0.94 or 0.87 depending on CONC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24961231373
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 6a02d2d
Approval: not required (trusted collaborator).

yhyang201 and others added 2 commits April 27, 2026 00:18
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24962186268
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 0ea8e62
Approval: not required (trusted collaborator).

@cquil11
Collaborator

cquil11 commented Apr 26, 2026

@yhyang201 Hi, please hold off on sweeps until we get some CI unblocked.

@cquil11
Collaborator

cquil11 commented Apr 26, 2026

Qiaolin-Yu self-assigned this Apr 26, 2026
…nc=2048

- YAML: conc=2048 and conc=4096 (both 1k1k and 8k1k) had tp=4, should be tp=8
- Script: conc=2048 was missing explicit SWA_FULL_TOKENS_RATIO=0.1, causing
  1k1k to incorrectly use 0.5 from the ISL-based default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
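The second bullet in the commit above describes a classic shell-defaulting pitfall: when the per-concurrency override is unset, a fallback derived from ISL silently applies. A hedged sketch of the pattern (the helper name and thresholds are assumed for illustration, not taken from the actual script):

```shell
#!/usr/bin/env bash
# Assumed helper: picks a ratio from the input sequence length (ISL) when
# no explicit override is set. Thresholds here are illustrative.
default_swa_ratio() {
  local isl=$1
  if [ "$isl" -le 1024 ]; then echo 0.5; else echo 0.1; fi
}

unset SWA_FULL_TOKENS_RATIO
# Without an explicit value, a 1k1k workload (ISL=1024) falls back to 0.5:
ratio="${SWA_FULL_TOKENS_RATIO:-$(default_swa_ratio 1024)}"

# The fix described in the commit is to set the override explicitly:
SWA_FULL_TOKENS_RATIO=0.1
fixed_ratio="${SWA_FULL_TOKENS_RATIO:-$(default_swa_ratio 1024)}"
```

The `${var:-fallback}` expansion only notices whether the variable is set, so an omitted export is indistinguishable from an intentional "use the default", which is how the 1k1k case picked up 0.5 unnoticed.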
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24978717689
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: e8685d9
Approval: not required (trusted collaborator).

Disable NVSHMEM IB transport in the two code paths that explicitly use
--moe-a2a-backend deepep (EP_SIZE=8 and CONC=2048/4096).
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24991420778
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 4575ce6
Approval: not required (trusted collaborator).

Pin dsv4-fp4-b300-sglang to lmsysorg/sglang:deepseek-v4-b300@sha256:2fec8d7958bb0d53b50d7bf04d6ae6a7de8a35503775826e0550a45dd8c3ee15.
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24993602429
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 0fb4d3c
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24994940494
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: c0f9334
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24997173342
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 5352757
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24997928458
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: a6e7ea0
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24998946908
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 758012f
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24999947919
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 8e2d2ff
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25000588798
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: d6c8873
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25002151282
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: f809d8f
Approval: not required (trusted collaborator).

@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25002547656
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: d34b32c
Approval: not required (trusted collaborator).

yhyang201 force-pushed the dsv4-b300-sglang-conc2048-mega-moe branch from 85d3b27 to 7cc1c12 on April 27, 2026 16:37
@Qiaolin-Yu
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25017583560
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 5c98596
Approval: not required (trusted collaborator).

Qiaolin-Yu changed the title from "dsv4-b300-sglang: conc=2048 mega_moe deepep recipe" to "dsv4-b300-sglang: update points" on Apr 27, 2026
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25030503512
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 5c98596
Approval: not required (trusted collaborator).

Both high-conc (CONC=2048/4096) and medium-conc recipes use ep=8 in
the YAML, so EP_SIZE is always "8" for both. The previous if/elif
order meant EP_SIZE=8 matched first, shadowing the CONC=2048/4096
branch entirely. Swap the order so the more specific high-conc check
runs first.
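The ordering bug described above is easy to reproduce in isolation: in the buggy order the broad EP_SIZE=8 test matches first, so the high-concurrency branch is dead code; putting the more specific check first fixes it. A minimal sketch (function names and recipe labels are illustrative, not from the script):

```shell
#!/usr/bin/env bash
# Buggy order: since both recipes run with ep=8, the EP_SIZE test always
# matches first and shadows the CONC=2048/4096 branch entirely.
pick_recipe_buggy() {
  local ep_size=$1 conc=$2
  if [ "$ep_size" = "8" ]; then echo medium-conc
  elif [ "$conc" = "2048" ] || [ "$conc" = "4096" ]; then echo high-conc
  fi
}

# Fixed order: the more specific high-concurrency check runs first, and
# EP_SIZE=8 remains the fallback for the medium-concurrency recipe.
pick_recipe_fixed() {
  local ep_size=$1 conc=$2
  if [ "$conc" = "2048" ] || [ "$conc" = "4096" ]; then echo high-conc
  elif [ "$ep_size" = "8" ]; then echo medium-conc
  fi
}
```

The general rule: in an if/elif chain whose conditions are not mutually exclusive, order branches from most specific to least specific, or the broader condition silently wins.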
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25031820374
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 4f995ac
Approval: not required (trusted collaborator).

- max-running-requests: 4608 → 4352
- swa-full-tokens-ratio: 0.06 → 0.075
- MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: 544 → 8320
- add --decode-log-interval 5
- move SGLANG_LOG_FORWARD_ITERS to conc-2048 only
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25032591138
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: f596249
Approval: not required (trusted collaborator).

3 participants