Skip to content

[trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU#6881

Open
fh188 wants to merge 1 commit into
verl-project:release/v0.8.0from
fh188:release/v0.8.0
Open

[trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU#6881
fh188 wants to merge 1 commit into
verl-project:release/v0.8.0from
fh188:release/v0.8.0

Conversation

@fh188

@fh188 fh188 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

[trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU

What does this PR do?

This PR mainly addresses compatibility issues with vLLM 0.19.0 and later in NPU scenarios.

The main changes include:

For vLLM >= 0.19.0, continue disabling flash attention in RotaryEmbedding to avoid compatibility issues on NPU.
Wrap FusedMoE.weight_loader to keep the MoE weight loading logic compatible with newer vLLM versions.
Add a new create_tcp_store helper to create TCPStore in a unified way, with support for passing an existing listen_socket.
When listen_socket is provided, use detach() to transfer the socket file descriptor to TCPStore, and correctly close the fd on exceptions to avoid resource leaks.
Update the StatelessProcessGroup creation logic based on the vLLM version:
For vLLM >= 0.19.0: create the store through the new create_tcp_store helper.
For vLLM < 0.19.0: keep the original logic and continue passing the socket.
Add the VLLM_ASCEND_AUTO_DETECT_QUANTIZATION=0 environment variable to disable vLLM Ascend automatic quantization detection, avoiding potential misdetection or compatibility issues.

Overall, this PR improves the startup, communication, and weight-loading compatibility of verl with vLLM 0.19.0+ in Ascend NPU scenarios.

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

fix fully async bug
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces compatibility updates for vLLM version 0.19.0 and above, including conditional handling of StatelessProcessGroup and TCPStore creation in verl/utils/distributed.py, and applying the NPU rotary embedding patch for newer vLLM versions in verl/utils/vllm/npu_vllm_patch.py. Additionally, it sets the ASCEND_RT_VISIBLE_DEVICES environment variable when launching servers in vllm_async_server.py. Feedback on the changes points out a bug in create_tcp_store where calling socket.close(listen_fd) will raise an AttributeError, suggesting the use of os.close(listen_fd) instead to prevent file descriptor leaks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread verl/utils/distributed.py
Comment on lines +117 to +119
except Exception:
socket.close(listen_fd)
raise

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using 'socket.close(listen_fd)' will raise an 'AttributeError' because the 'socket' module does not have a 'close' function. Since 'listen_fd' is a raw integer file descriptor obtained via 'listen_socket.detach()', you should use 'os.close(listen_fd)' to properly close it and avoid leaking the file descriptor.

Suggested change
except Exception:
socket.close(listen_fd)
raise
except Exception:
os.close(listen_fd)
raise

@fh188 fh188 changed the title fix fully async bug [trainer, fully_async] Adapt vLLM 0.19+ for Ascend NPU Jun 29, 2026
@wuxibin89

Copy link
Copy Markdown
Collaborator

Submit PR to main first then cherry-pick to release branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants