
chore: bump vllm to 0.15.1#7867

Open
hjh0119 wants to merge 5 commits into modelscope:main from hjh0119:vllm014

Conversation

@hjh0119 (Collaborator) commented Jan 22, 2026

No description provided.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @hjh0119, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the swift project to ensure compatibility with vLLM version 0.14. The primary focus is on enhancing the WeightSyncWorkerExtension to manage distributed weight synchronization more effectively. This involves a significant refactoring of the class, introducing explicit support for various hardware accelerators like torch.xpu and vllm_ascend, and improving overall vLLM version compatibility for critical utilities. These changes aim to bolster the stability and flexibility of distributed inference pipelines within the system.

Highlights

  • WeightSyncWorkerExtension Refactor: The WeightSyncWorkerExtension class has been completely refactored to remove its dependency on trl.scripts.vllm_serve.WeightSyncWorkerExtension and introduce a more robust, self-contained implementation for weight synchronization.
  • Expanded Device and vLLM Version Compatibility: Explicit support for torch.xpu (Intel XPU) and vllm_ascend (Huawei Ascend NPU) devices has been added within the communicator initialization. The get_open_port utility now includes a version check to ensure compatibility with vLLM versions 0.11.0 and above.
  • Improved Communicator Management: The new WeightSyncWorkerExtension includes dedicated init_communicator and close_communicator methods, providing clearer control over the lifecycle of the stateless process group used for weight updates and ensuring proper resource cleanup (see the sketch after this list).
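
To make that lifecycle concrete, below is a minimal, illustrative sketch of what an init_communicator/close_communicator pair can look like. It is not the code from this pull request: it assumes the StatelessProcessGroup and PyNcclCommunicator classes that vLLM exposes for stateless weight broadcasts (the same pair used by trl's vllm_serve), and the class name WeightSyncSketch as well as the host, port, rank, world_size, and device arguments are placeholders.

```python
# Hedged sketch of the communicator lifecycle described above; not the PR's implementation.
from vllm.distributed.device_communicators.pynccl import PyNcclCommunicator
from vllm.distributed.utils import StatelessProcessGroup


class WeightSyncSketch:
    communicator = None  # set by init_communicator, cleared by close_communicator

    def init_communicator(self, host: str, port: int, rank: int, world_size: int, device: str) -> None:
        # A stateless group keeps weight broadcasts independent of vLLM's own
        # tensor-parallel process groups.
        pg = StatelessProcessGroup.create(host=host, port=port, rank=rank, world_size=world_size)
        self.communicator = PyNcclCommunicator(pg, device=device)

    def close_communicator(self) -> None:
        # Release the communicator so repeated weight syncs do not leak resources.
        if self.communicator is not None:
            del self.communicator
            self.communicator = None
```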


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
---------------------|---------------------|-------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.
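
For example, commenting /gemini review on the pull request, as the author does later in this conversation, asks the bot to re-run its code review against the current state of the branch.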

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to add compatibility with vLLM 0.14 by updating swift/pipelines/infer/rollout.py. The main change is a local re-implementation of WeightSyncWorkerExtension to handle API changes in vLLM's distributed communication, including support for CUDA, XPU, and Ascend devices.

My review has identified several critical issues related to the new XPU implementation. The current code will fail on XPU systems due to incorrect device handling for tensor allocation and improper use of broadcast and barrier calls. I have provided specific code suggestions to fix these by adding the necessary conditional logic for XPU devices. Additionally, I've included a medium-severity suggestion to refactor a redundant check for improved code clarity.

Comment on lines +172 to +176

        weight = torch.empty(shape, dtype=dtype, device=self.communicator.device)

        # Use NCCL to broadcast the updated weights from the client (src) to all workers.
        self._comm.broadcast(weight, src=self.client_rank)
        self._comm.group.barrier()
        self.communicator.broadcast(weight, src=self.client_rank)
        self.communicator.group.barrier()

critical

The current implementation for updating named parameters will fail on XPU devices. The self.communicator for XPU is a torch.distributed.ProcessGroup, which does not have a .device attribute, causing an AttributeError. Additionally, the broadcast and barrier calls are incorrect for this process group type.

You should add conditional logic to handle XPU devices correctly, similar to how it's done in recent versions of trl.

Suggested change

-        weight = torch.empty(shape, dtype=dtype, device=self.communicator.device)
-        # Use NCCL to broadcast the updated weights from the client (src) to all workers.
-        self._comm.broadcast(weight, src=self.client_rank)
-        self._comm.group.barrier()
-        self.communicator.broadcast(weight, src=self.client_rank)
-        self.communicator.group.barrier()
+        device = self.device if is_torch_xpu_available() else self.communicator.device
+        weight = torch.empty(shape, dtype=dtype, device=device)
+        # Use NCCL to broadcast the updated weights from the client (src) to all workers.
+        if is_torch_xpu_available():
+            torch.distributed.broadcast(tensor=weight, src=self.client_rank, group=self.communicator)
+            self.communicator.barrier().wait()
+        else:
+            self.communicator.broadcast(weight, src=self.client_rank)
+            self.communicator.group.barrier()
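
One note on the suggested changes in this review: they rely on an is_torch_xpu_available() helper whose import is not shown in the excerpts. A plausible import, stated here purely as an assumption since the PR's import block is not visible in this conversation, would be:

```python
# Assumed import for the XPU availability check used in the suggested changes;
# transformers ships this helper in its utils module.
from transformers.utils import is_torch_xpu_available
```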

Comment on lines +193 to +195

        flatten_tensor = torch.empty(flatten_tensor_length, dtype=dtype, device=self.communicator.device)
        self.communicator.broadcast(flatten_tensor, src=self.client_rank)
        self.communicator.group.barrier()

critical

Similar to update_named_param, this method has incorrect logic for XPU devices. self.communicator.device will raise an AttributeError, and the broadcast/barrier calls are incorrect for ProcessGroup. This will cause a runtime failure on XPU systems.

        device = self.device if is_torch_xpu_available() else self.communicator.device
        flatten_tensor = torch.empty(flatten_tensor_length, dtype=dtype, device=device)
        if is_torch_xpu_available():
            torch.distributed.broadcast(tensor=flatten_tensor, src=self.client_rank, group=self.communicator)
            self.communicator.barrier().wait()
        else:
            self.communicator.broadcast(flatten_tensor, src=self.client_rank)
            self.communicator.group.barrier()

Comment on lines +225 to +226

            tensor = torch.empty(shape, dtype=dtype, device=self.communicator.device)
            self.communicator.broadcast(tensor, src=self.client_rank)

critical

This part of the code has incorrect logic for XPU devices. The device for tensor allocation is wrong, and the broadcast call needs to be conditional to support the ProcessGroup used for XPU.

            tensor = torch.empty(shape, dtype=dtype, device=self.device if is_torch_xpu_available() else self.communicator.device)
            if is_torch_xpu_available():
                torch.distributed.broadcast(tensor=tensor, src=self.client_rank, group=self.communicator)
            else:
                self.communicator.broadcast(tensor, src=self.client_rank)

            named_params[name] = tensor

        self._comm.group.barrier()
        self.communicator.group.barrier()

critical

The barrier call is incorrect for XPU devices. For a ProcessGroup, you should call self.communicator.barrier().wait() instead of self.communicator.group.barrier().

        if is_torch_xpu_available():
            self.communicator.barrier().wait()
        else:
            self.communicator.group.barrier()

Comment on lines +252 to +255

        flatten_tensor = torch.empty(flatten_tensor_length, dtype=dtype, device=self.communicator.device)

        self._comm.broadcast(flatten_tensor, src=self.client_rank)
        self._comm.group.barrier()
        self.communicator.broadcast(flatten_tensor, src=self.client_rank)
        self.communicator.group.barrier()

critical

This method has the same critical issue with XPU device handling as other update methods. self.communicator.device will fail, and the broadcast and barrier calls are incorrect for the ProcessGroup used in the XPU case.

        device = self.device if is_torch_xpu_available() else self.communicator.device
        flatten_tensor = torch.empty(flatten_tensor_length, dtype=dtype, device=device)

        if is_torch_xpu_available():
            torch.distributed.broadcast(tensor=flatten_tensor, src=self.client_rank, group=self.communicator)
            self.communicator.barrier().wait()
        else:
            self.communicator.broadcast(flatten_tensor, src=self.client_rank)
            self.communicator.group.barrier()

Comment on lines 115 to 117

        if torch.cuda.is_available() or (is_torch_xpu_available()
                                         and hasattr(torch.xpu.get_device_properties(self.device), 'uuid')):
            accelerator_module = torch.xpu if is_torch_xpu_available() else torch.cuda

medium

The check is_torch_xpu_available() is performed twice in this conditional block. You can simplify this logic by using a variable to store the result of the check, which improves readability and avoids redundancy.

Suggested change

-        if torch.cuda.is_available() or (is_torch_xpu_available()
-                                         and hasattr(torch.xpu.get_device_properties(self.device), 'uuid')):
-            accelerator_module = torch.xpu if is_torch_xpu_available() else torch.cuda
+        use_xpu_with_uuid = is_torch_xpu_available() and hasattr(torch.xpu.get_device_properties(self.device), 'uuid')
+        if torch.cuda.is_available() or use_xpu_with_uuid:
+            accelerator_module = torch.xpu if use_xpu_with_uuid else torch.cuda

@hjh0119 (Collaborator, Author) commented Jan 27, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request updates the codebase to be compatible with vLLM 0.14, primarily by re-implementing the WeightSyncWorkerExtension class and adjusting related communication protocols. This involves removing the _check_trl_version method and the client_device_uuid field from communication requests, and updating the weight synchronization logic to use the new communicator attribute. The changes are well-aligned with the stated goal of vLLM 0.14 compatibility. However, there is a critical issue in the re-implemented WeightSyncWorkerExtension where self.device is accessed before being defined, which needs to be addressed.

            import torch_npu
            torch_npu.npu.set_device(device)
        else:
            device = self.device

critical

The device attribute is accessed via self.device but is not defined within the WeightSyncWorkerExtension class. This will lead to an AttributeError. It should likely be accessed from self.model_runner.device, as model_runner is an attribute typically available in worker extensions and holds the device information.

Suggested change

-            device = self.device
+            device = self.model_runner.device

@hjh0119 changed the title from "compatible with vllm 0.14" to "chore: bump vllm to 0.15.1" on Feb 17, 2026