[trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU#6881
Conversation
fix fully async bug
|
|
There was a problem hiding this comment.
Code Review
This pull request introduces compatibility updates for vLLM version 0.19.0 and above, including conditional handling of StatelessProcessGroup and TCPStore creation in verl/utils/distributed.py, and applying the NPU rotary embedding patch for newer vLLM versions in verl/utils/vllm/npu_vllm_patch.py. Additionally, it sets the ASCEND_RT_VISIBLE_DEVICES environment variable when launching servers in vllm_async_server.py. Feedback on the changes points out a bug in create_tcp_store where calling socket.close(listen_fd) will raise an AttributeError, suggesting the use of os.close(listen_fd) instead to prevent file descriptor leaks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| except Exception: | ||
| socket.close(listen_fd) | ||
| raise |
There was a problem hiding this comment.
Using 'socket.close(listen_fd)' will raise an 'AttributeError' because the 'socket' module does not have a 'close' function. Since 'listen_fd' is a raw integer file descriptor obtained via 'listen_socket.detach()', you should use 'os.close(listen_fd)' to properly close it and avoid leaking the file descriptor.
| except Exception: | |
| socket.close(listen_fd) | |
| raise | |
| except Exception: | |
| os.close(listen_fd) | |
| raise |
|
Submit PR to main first then cherry-pick to release branch. |
[trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU
What does this PR do?
This PR mainly addresses compatibility issues with vLLM 0.19.0 and later in NPU scenarios.
The main changes include:
For vLLM >= 0.19.0, continue disabling flash attention in RotaryEmbedding to avoid compatibility issues on NPU.
Wrap FusedMoE.weight_loader to keep the MoE weight loading logic compatible with newer vLLM versions.
Add a new create_tcp_store helper to create TCPStore in a unified way, with support for passing an existing listen_socket.
When listen_socket is provided, use detach() to transfer the socket file descriptor to TCPStore, and correctly close the fd on exceptions to avoid resource leaks.
Update the StatelessProcessGroup creation logic based on the vLLM version:
For vLLM >= 0.19.0: create the store through the new create_tcp_store helper.
For vLLM < 0.19.0: keep the original logic and continue passing the socket.
Add the VLLM_ASCEND_AUTO_DETECT_QUANTIZATION=0 environment variable to disable vLLM Ascend automatic quantization detection, avoiding potential misdetection or compatibility issues.
Overall, this PR improves the startup, communication, and weight-loading compatibility of verl with vLLM 0.19.0+ in Ascend NPU scenarios.
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,veomni,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,fully_async,one_step_off,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)recipesubmodule, please also update the reference to the submodule commit viagit submodule update --remoteorcd recipe && git pull origin main.