
dsv4 B200 MTP SGLang launch#1139

Closed
yhyang201 wants to merge 14 commits into SemiAnalysisAI:main from yhyang201:chore/dsv4-sgl-b200


@yhyang201
Collaborator

Summary

  • Switch dsv4 B200 SGLang launch from python3 -m sglang.launch_server to sglang serve
  • Add EAGLE speculative decoding (--speculative-num-steps 3, --speculative-eagle-topk 1, --speculative-num-draft-tokens 4)
  • Add --chunked-prefill-size 4096 and --disable-flashinfer-autotune
  • Add env flags: SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1, SGLANG_OPT_USE_TOPK_V2=1
  • Keep --moe-runner-backend flashinfer_mxfp4, port 8888, benchmark backend vllm unchanged
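As a reading aid, the launch described in these bullets can be sketched as a bash function that assembles the command and prints it instead of running it. This is a hedged sketch, not the script from the PR: the model ID `deepseek-ai/DeepSeek-V4-Pro` (taken from the commit history below), `--tp 8`, `--model-path`, and `--speculative-algorithm EAGLE` are assumptions, and any other flags the real script passes are omitted. The env prefixes and the remaining flag values come from this PR's description and commits.

```shell
#!/usr/bin/env bash
# Sketch only: assembles the dsv4 B200 MTP launch command and prints it.
# MODEL and TP defaults are assumptions; the other values mirror the PR summary.
build_dsv4_mtp_cmd() {
  local model="${MODEL:-deepseek-ai/DeepSeek-V4-Pro}"  # assumed model ID
  local tp="${TP:-8}"                                   # assumed TP size
  local -a cmd
  cmd=(
    env
    PYTHONNOUSERSITE=1
    SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
    SGLANG_OPT_USE_TOPK_V2=1
    sglang serve
    --model-path "$model"
    --tp "$tp"
    --moe-runner-backend flashinfer_mxfp4
    --speculative-algorithm EAGLE   # assumed flag; the summary only says "EAGLE"
    --speculative-num-steps 3
    --speculative-eagle-topk 1
    --speculative-num-draft-tokens 4
    --chunked-prefill-size 4096
    --disable-flashinfer-autotune
    --port 8888
  )
  # Print rather than execute so the command can be reviewed without GPUs.
  printf '%s\n' "${cmd[*]}"
}

build_dsv4_mtp_cmd
```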

Based on #1131

Test plan

  • Sweep run produces results for 1k/1k and 8k/1k ISL/OSL

🤖 Generated with Claude Code

cquil11 and others added 11 commits April 24, 2026 01:10
Adds the DeepSeek-V4-Flash B200 SGLang recipe from
https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4.
Prefix caching and speculative decoding are disabled for baseline numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Uses deepseek-ai/DeepSeek-V4-Pro with tp=8, ep=8, dp-attention enabled
and sweep concurrency ranges aligned with dsv4-fp4-b200-vllm (4-1024 at
1k/1k, 4-512 at 8k/1k). Script now passes --enable-dp-attention when
DP_ATTENTION=true and sets --mem-fraction-static per the Pro recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
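The DP-attention toggle in this commit amounts to appending a flag only when the variable is set. A minimal bash sketch, assuming the script collects server flags in an array; `build_server_args` and the `TP` variable are hypothetical names, and only `DP_ATTENTION` and `--enable-dp-attention` come from the commit message:

```shell
#!/usr/bin/env bash
# Sketch: add --enable-dp-attention only when DP_ATTENTION=true.
# build_server_args and TP are hypothetical; DP_ATTENTION is from the commit.
build_server_args() {
  local -a args
  args=(--tp "${TP:-8}")
  if [ "${DP_ATTENTION:-false}" = "true" ]; then
    args+=(--enable-dp-attention)
  fi
  printf '%s\n' "${args[*]}"
}

DP_ATTENTION=true build_server_args
DP_ATTENTION=false build_server_args
```

Keeping the flag behind a string comparison means any value other than `true` (including unset) leaves DP attention off.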
Server launch now mirrors the DeepSeek-V4-Pro command from
https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4:
--tp N, --moe-runner-backend flashinfer_mxfp4, --mem-fraction-static
0.82, SGLANG_JIT_DEEPGEMM_PRECOMPILE=0. Speculative decoding omitted
and --disable-radix-cache added per the no-spec / no-prefix-cache
baseline. The YAML search-space drops ep and dp-attn, leaving tp=8, ep=1.

Also syncs runners/launch_b200-dgxc-slurm.sh with the HF cache mount
path from origin/claude/add-dsv4-fp4-b200-vllm so both PRs stay in
agreement on runner layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The deepseek-v4-blackwell image doesn't expose sglang via system
python3, so the module import fails:

  /usr/bin/python3: Error while finding module specification for
  'sglang.launch_server' (ModuleNotFoundError: No module named 'sglang')

Switch to the `sglang serve` entrypoint that the cookbook uses; the
CLI resolves the correct interpreter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
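One way to see this failure mode is to probe whether a given interpreter can import `sglang` before choosing an entrypoint. This is an illustrative sketch rather than what the commit does (the commit simply switches to `sglang serve`); `pick_sglang_launcher` and the `PYTHON` override are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: use the module entrypoint only if the interpreter can import sglang,
# otherwise fall back to the `sglang serve` CLI used by the cookbook.
# PYTHON is a hypothetical override that defaults to python3.
pick_sglang_launcher() {
  local py="${PYTHON:-python3}"
  if "$py" -c 'import sglang' >/dev/null 2>&1; then
    echo "$py -m sglang.launch_server"
  else
    echo "sglang serve"
  fi
}

pick_sglang_launcher
```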
The lmsysorg/sglang:deepseek-v4-blackwell image installs sglang editable
at /workspace/sglang/python — unlike every prior sglang tag which uses
/sgl-workspace/sglang. Our $GITHUB_WORKSPACE:/workspace/ bind-mount
masks that directory, breaking `import sglang`.

Conditionally mount at /ix for this image only and make the dsv4
benchmark script use $PWD for server/metrics/result paths so it works
regardless of the mount target. All other configs still mount at
/workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
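The conditional mount can be sketched as a `case` on the image name. Assumptions: the function name `mount_target_for_image` and the `IMAGE` variable are hypothetical, and the printed flag assumes the runner hands mounts to Slurm through pyxis's `--container-mounts=SRC:DST` option (the `--no-container-entrypoint` flag mentioned later in this PR is also a pyxis option, but the runner script itself is not shown here):

```shell
#!/usr/bin/env bash
# Sketch: pick the bind-mount target for $GITHUB_WORKSPACE based on the image.
# deepseek-v4-blackwell keeps an editable sglang under /workspace/sglang/python,
# so mounting over /workspace would hide it; every other image keeps /workspace.
mount_target_for_image() {
  case "$1" in
    *sglang:deepseek-v4-blackwell*) echo "/ix" ;;
    *) echo "/workspace" ;;
  esac
}

IMAGE="${IMAGE:-lmsysorg/sglang:deepseek-v4-blackwell}"  # hypothetical variable
printf '%s\n' "--container-mounts=${GITHUB_WORKSPACE:-$PWD}:$(mount_target_for_image "$IMAGE")"
```

Because the script itself then resolves paths from `$PWD`, it does not need to know which target was chosen.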
The lmsysorg/sglang:deepseek-v4-blackwell image installs sglang editable at
/workspace/sglang/python, which our $GITHUB_WORKSPACE:/workspace/ bind-mount
masks. Temporary one-line workaround: pip install --no-deps sglang in the
benchmark script to restore a non-editable copy in site-packages. Runner
reverted to the standard /workspace mount. Marked with a TODO(Cam) for
the proper fix once lmsys publishes an image that doesn't editable-install
under /workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'pip install --no-deps sglang' is a no-op when sglang is already
registered in site-packages -- even if the underlying editable path
is missing -- so the prior workaround never actually swapped in a
working install. Uninstall the broken egg-link first, then reinstall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
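The uninstall-then-reinstall sequence can be wrapped in a guard so it only runs when the install is actually broken. A sketch under assumptions: `repair_sglang_install` and the `PYTHON` override are hypothetical, and detecting the stale record with `pip show` is an addition the commit does not mention. It is deliberately not invoked at the bottom because it modifies the environment:

```shell
#!/usr/bin/env bash
# Sketch: if sglang cannot be imported but pip still lists it (for example a
# dangling editable install whose source directory is hidden by a bind mount),
# remove the stale record first; a bare `pip install --no-deps sglang` would
# treat that record as already satisfied and do nothing.
# PYTHON is a hypothetical override that defaults to python3.
repair_sglang_install() {
  local py="${PYTHON:-python3}"
  if "$py" -c 'import sglang' >/dev/null 2>&1; then
    echo "sglang already importable; nothing to do"
    return 0
  fi
  if "$py" -m pip show sglang >/dev/null 2>&1; then
    "$py" -m pip uninstall -y sglang
  fi
  "$py" -m pip install --no-deps sglang
}
```

Call `repair_sglang_install` near the top of the benchmark script, before the server is launched.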
Back to the proper mount fix so we use the same
'PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...' invocation as
every other sglang single_node script. Conditional mount target keeps
the blast radius to this one config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The image ENV pins CUDA_VISIBLE_DEVICES=4,5,6,7 (leftover from lmsys's
internal testing). With --no-container-entrypoint it isn't cleared, so
the container only sees 4 GPUs and TP=8 fails with
  torch.AcceleratorError: CUDA error: invalid device ordinal

Unset it at the top of the script so Slurm's 8-GPU allocation is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
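The arithmetic behind the error is simple: a pinned `CUDA_VISIBLE_DEVICES=4,5,6,7` exposes four devices, fewer than TP=8 needs. A bash sketch of an early check; `visible_gpu_count`, `check_tp_fits`, and the `ALLOCATED_GPUS` stand-in for the Slurm allocation are hypothetical, and only the final `unset` mirrors the commit:

```shell
#!/usr/bin/env bash
# Sketch: count the GPUs a CUDA process would see and fail early if TP exceeds it.
# ALLOCATED_GPUS is a hypothetical stand-in for the Slurm GPU allocation.
visible_gpu_count() {
  if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
    # Unset: every GPU in the allocation is visible.
    echo "${ALLOCATED_GPUS:-8}"
  else
    # Set: only the listed ordinals are visible (4,5,6,7 -> 4 devices).
    local -a ids
    IFS=',' read -r -a ids <<< "$CUDA_VISIBLE_DEVICES"
    echo "${#ids[@]}"
  fi
}

check_tp_fits() {
  local tp="$1" n
  n="$(visible_gpu_count)"
  if [ "$n" -lt "$tp" ]; then
    echo "only $n GPU(s) visible but TP=$tp" >&2
    return 1
  fi
}

# The fix from this commit: drop the image's leftover pin, then check.
unset CUDA_VISIBLE_DEVICES
check_tp_fits "${TP:-8}"
```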
Only patched launch_b200-dgxc-slurm.sh last time; the b200-nb runner
still had the default $GITHUB_WORKSPACE:/workspace/ mount, which
masks the deepseek-v4-blackwell image's /workspace/sglang editable
install. Most B200 jobs in this repo run on b200-nb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

…ding

Only replace the sglang launch command, keep all surrounding logic intact.
Add PYTHONNOUSERSITE=1, SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1,
SGLANG_OPT_USE_TOPK_V2=1 env prefixes. Switch to sglang serve with
EAGLE speculative decoding (3 steps, topk=1, 4 draft tokens),
chunked prefill 4096, and disable-flashinfer-autotune.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Oseltamivir
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

@github-actions
Contributor

@Oseltamivir Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24891856324
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: c8b48b5
Approval: not required (trusted collaborator).

@yhyang201 yhyang201 changed the title from "Update dsv4 B200 SGLang launch: sglang serve + EAGLE speculative decoding" to "update dsv4 B200 SGLang launch [low latency]" Apr 24, 2026
@yhyang201 yhyang201 changed the title from "update dsv4 B200 SGLang launch [low latency]" to "update dsv4 B200 SGLang launch" Apr 24, 2026
@functionstackx functionstackx changed the title from "update dsv4 B200 SGLang launch" to "dsv4 B200 MTP SGLang launch" Apr 24, 2026
seq-len-configs:
- isl: 1024
  osl: 1024
  search-space:
Contributor

add spec-decoding = mtp here

Collaborator Author

Done

@yhyang201 yhyang201 force-pushed the chore/dsv4-sgl-b200 branch from 7bea5e4 to c8b48b5 April 24, 2026 14:10
yhyang201 and others added 2 commits April 24, 2026 22:14
EAGLE speculative decoding is enabled in the benchmark script, so
the YAML search-space entries need spec-decoding: "mtp" to ensure
correct classification in config generation and eval selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy of dsv4_fp4_b200.sh with --use-chat-template added to
run_benchmark_serving, as required by AGENTS.md for MTP scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cquil11
Collaborator

cquil11 commented Apr 24, 2026

Superseded by #1145 (same changes, internal branch).

@cquil11 cquil11 closed this Apr 24, 2026
cquil11 added a commit that referenced this pull request Apr 24, 2026
…env vars

Both flags were on the sglang serve invocation in the original PR (#1139)
and got dropped when the script was restructured to mirror the baseline
3-recipe layout. Re-add as exports so they apply across all recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
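The difference between the original per-command prefixes and the exports in this commit can be seen with plain child shells. A minimal sketch; `sh -c` stands in for the per-recipe launch commands, which are not shown here:

```shell
#!/usr/bin/env bash
# Sketch: a one-off env prefix versus an export.
unset SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 SGLANG_OPT_USE_TOPK_V2

# A prefix applies only to the single command it precedes.
SGLANG_OPT_USE_TOPK_V2=1 sh -c 'echo "prefixed command: ${SGLANG_OPT_USE_TOPK_V2:-unset}"'
sh -c 'echo "next command:     ${SGLANG_OPT_USE_TOPK_V2:-unset}"'

# An export reaches every child process started afterwards, so each recipe's
# launch command inherits the flags without repeating them.
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_TOPK_V2=1
sh -c 'echo "after export:     ${SGLANG_OPT_USE_TOPK_V2:-unset}"'
```

Running it prints `1`, then `unset`, then `1`, which is why a restructure that dropped the single prefixed invocation also dropped the flags.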
