From 7d97c840cc9e7c3b3f564e448cddd3680dcb512b Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Mon, 30 Mar 2026 10:10:10 +0800
Subject: [PATCH 1/3] update performance test README
Signed-off-by: 0oshowero0
---
scripts/performance_test/README_PERFTEST.md | 68 +++++++++++++++++----
1 file changed, 57 insertions(+), 11 deletions(-)
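The payload sizes in the new "Test Scenarios" section can be sanity-checked with a little arithmetic. The helper below is a hypothetical sketch and is not part of `perftest.py`. It assumes float32 fields (4 bytes per element). For the complex case, it also assumes that nested row lengths form an evenly spaced arithmetic progression from 1 to `seq_length` across the batch, so their sum is `batch_size × (1 + seq_length) / 2`.

```python
# Hypothetical sizing helper (not part of perftest.py). Assumes float32
# elements and, for nested fields, row lengths evenly spaced from 1 to
# seq_length across the batch.

BYTES_PER_FLOAT32 = 4


def regular_field_bytes(batch_size: int, seq_length: int) -> int:
    """Bytes in one regular (batch_size, seq_length) float32 field."""
    if batch_size < 1 or seq_length < 1:
        raise ValueError("batch_size and seq_length must be positive")
    return batch_size * seq_length * BYTES_PER_FLOAT32


def nested_field_bytes(batch_size: int, seq_length: int) -> int:
    """Approximate bytes in one nested (ragged) float32 field.

    An arithmetic progression with batch_size terms running from 1 to
    seq_length sums to batch_size * (1 + seq_length) / 2; integer division
    drops any leftover half element.
    """
    if batch_size < 1 or seq_length < 1:
        raise ValueError("batch_size and seq_length must be positive")
    total_elements = batch_size * (1 + seq_length) // 2
    return total_elements * BYTES_PER_FLOAT32


if __name__ == "__main__":
    batch_size, seq_length = 1024, 8192
    regular = regular_field_bytes(batch_size, seq_length)
    nested = nested_field_bytes(batch_size, seq_length)
    print(f"regular field: {regular / 2**20:.2f} MiB")
    print(f"nested field:  {nested / 2**20:.2f} MiB ({nested / regular:.3f}x regular)")
```

With `batch_size=1024` and `seq_length=8192`, this gives a 32 MiB regular field and a nested field just over half that size. That result matches the README's "roughly half the size" statement.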
diff --git a/scripts/performance_test/README_PERFTEST.md b/scripts/performance_test/README_PERFTEST.md
index 1b5ddc6a..8e150986 100644
--- a/scripts/performance_test/README_PERFTEST.md
+++ b/scripts/performance_test/README_PERFTEST.md
@@ -42,19 +42,63 @@ python perftest.py \
| `--head_node_ip` | Head node IP address | - | Yes |
| `--worker_node_ip` | Worker node IP address (required for Yuanrong) | None | No |
| `--output_csv` | Path to output CSV file | None | No |
+| `--use_complex_case` | Use the complex test case with nested tensors and `NonTensorStack` fields | False | No |
## Backend Configuration
The script reads the backend configuration directly from the provided `--backend_config` YAML file. The backend type is determined by `backend.storage_backend` in the config file. When `--backend` is specified, it overrides the value in the config.
-For device support of each backend:
-- `SimpleStorage`: `cpu`
-- `Yuanrong`: `cpu`, `npu`
-- `MooncakeStore`: `cpu`, `gpu`
+### SimpleStorage Configuration
-## Test Data Format
+```yaml
+backend:
+  storage_backend: SimpleStorage
+  SimpleStorage:
+    total_storage_size: 100000
+    num_data_storage_units: 16
+```
+
+### Yuanrong Configuration
+
+```yaml
+backend:
+  storage_backend: Yuanrong
+  Yuanrong:
+    port: 31501
+    enable_yr_npu_transport: true
+```
+
+For the Yuanrong backend, the writer runs on the head node and the reader runs on the worker node, so `--worker_node_ip` is required.
+
+### MooncakeStore Configuration
+
+```yaml
+backend:
+  storage_backend: MooncakeStore
+  MooncakeStore:
+    auto_init: true
+    metadata_server: localhost:50050
+    master_server_address: localhost:50051
+    local_hostname: ""
+    protocol: rdma
+    global_segment_size: 86294967296
+    local_buffer_size: 86294967296
+    device_name: ""
+```
+
+## Test Scenarios
+
+### Simple Test Case (Default)
+
+When `--use_complex_case` is **not** specified (default), the test creates a `TensorDict` with only regular tensors:
-The test case creates a `TensorDict` with three types of fields to simulate real training batches:
+- **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
+
+Each regular tensor field occupies `batch_size × seq_length × 4` bytes (4 bytes per float32 element).
+
+### Complex Test Case
+
+When `--use_complex_case` is specified, the test creates a `TensorDict` with three types of fields to simulate real training batches:
1. **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
2. **Nested tensors** (non-NPU devices): Variable-length ragged sequences with lengths forming an arithmetic progression from 1 to `seq_length`. Average length ≈ `seq_length / 2`, so each nested field is roughly half the size of a regular field.
@@ -73,10 +117,6 @@ Each iteration performs a PUT → LIST → GET → DELETE cycle via TransferQueu
The test runs `--num_test_iterations` iterations. Data creation only happens in the first iteration; subsequent iterations reuse the same TensorDict to isolate transfer overhead.
-## Yuanrong Backend
-
-For Yuanrong backend, writer runs on the head node and reader runs on the worker node. `--worker_node_ip` is required.
-
## Running Full Test Suite
The `run_perf_test.sh` script automates the full test suite across all backends and data sizes, then generates a comparison chart:
@@ -130,12 +170,18 @@ After running the tests, `draw_figure.py` reads all CSV files from `results/` an
## Examples
-### SimpleStorage backend
+### SimpleStorage backend (simple case)
```bash
python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
--head_node_ip=192.168.0.1
```
+### SimpleStorage backend (complex case)
+```bash
+python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
+ --head_node_ip=192.168.0.1 --use_complex_case
+```
+
### Yuanrong backend (inter-node)
```bash
python perftest.py --backend_config=perftest_config.yaml --backend=Yuanrong \
From d9750451f5bc3a45fba910dac381a5332f494ebf Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Mon, 30 Mar 2026 10:23:28 +0800
Subject: [PATCH 2/3] update README and bump version to 0.1.6
Signed-off-by: 0oshowero0
---
README.md | 23 ++++++++++-------------
transfer_queue/version/version | 2 +-
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/README.md b/README.md
index 5b88a73a..952b34f1 100644
--- a/README.md
+++ b/README.md
@@ -76,7 +76,7 @@ Currently, we support the following storage backends:
- SimpleStorage: A basic CPU memory storage with minimal data format constraints and easy usability.
- [Yuanrong](https://gitee.com/openeuler/yuanrong-datasystem) (beta, [#PR107](https://github.com/TransferQueue/TransferQueue/pull/107), [#PR96](https://github.com/TransferQueue/TransferQueue/pull/96)): An Ascend native data system that provides hierarchical storage interfaces including HBM/DRAM/SSD.
-- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (alpha, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
+- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (beta, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
- [RayRDT](https://docs.ray.io/en/master/ray-core/direct-transport.html) (alpha, [#PR167](https://github.com/TransferQueue/TransferQueue/pull/167)): Ray's new feature that allows Ray to store and pass objects directly between Ray actors.
Among them, `SimpleStorageUnit` serves as our default storage backend, coordinated by the `AsyncSimpleStorageManager` class. Each storage unit can be deployed on a separate node, allowing for distributed data management.
@@ -121,6 +121,8 @@ To simplify the usage of TransferQueue, we have provided a Redis-style high-leve
- **Metadata Tags**: Lightweight metadata for status tracking
- **Pluggable Backends**: Supports multiple backends
+Refer to [tutorial/basic.ipynb](https://github.com/Ascend/TransferQueue/blob/main/tutorial/basic.ipynb) and [tutorial/02_kv_interface.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/02_kv_interface.py) for detailed usage examples.
+
#### StreamingDataLoader API
Designed as a drop-in replacement for the standard PyTorch `DataLoader`, this API allows each rank to automatically consume data without single-controller intervention.
@@ -147,17 +149,12 @@ Developers can leverage `TransferQueueClient` directly to implement advanced fea
#### verl
The primary motivation for integrating TransferQueue to verl now is to **alleviate the data transfer bottleneck of the single controller `RayPPOTrainer`**. Currently, all `DataProto` objects must be routed through `RayPPOTrainer`, resulting in a single point bottleneck of the whole post-training system.
-
-
-Leveraging TransferQueue, we separate experience data transfer from metadata dispatch by
-
-- Replacing `DataProto` with `BatchMeta` (metadata) and `TensorDict` (actual data) structures
-- Preserving verl's original Dispatch/Collect logic via BatchMeta (maintaining single-controller debuggability)
-- Accelerating data transfer by TransferQueue's distributed storage units
+
+
+
-
+Official integration with verl is available at [verl/pulls/5401](https://github.com/verl-project/verl/pull/5401), with the design doc at [[RFC] PPOTrainer with TransferQueue Integration](https://github.com/verl-project/verl/issues/5400). You may also refer to our [recipe](https://github.com/Ascend/TransferQueue/blob/main/recipe/simple_use_case/single_controller_demo.py), which mimics verl's usage at a high level.
-You may refer to the [recipe](https://github.com/Ascend/TransferQueue/tree/dev/recipe/simple_use_case), where we mimic the verl usage in both async & sync scenarios. Official integration to verl is also available now at [verl/pulls/3649](https://github.com/volcengine/verl/pull/3649) (with subsequent PRs to further optimize the integration).
### Disaggregated Example
@@ -216,11 +213,11 @@ pip install TransferQueue
-> Note: The above benchmark for TransferQueue is based on our naive `SimpleStorage` backend. By introducing high-performance storage backends and optimizing serialization/deserialization, we expect to achieve even better performance. Warmly welcome contributions from the community!
+> Note: Optimizations for MooncakeStore and other backends are still in progress. Contributions from the community are warmly welcome!
-For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/hlx5g0/tml8ke0zkgn6roey?singleDoc#).
+For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/lhp4el/tml8ke0zkgn6roey?singleDoc#).
-We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/hlx5g0/ydbwgo5k2umaag78?singleDoc#) that demonstrates **768 concurrent clients writing 1.4 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss, achieving 80% bandwidth.
+We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/lhp4el/mt0vedqy7c337pgg?singleDoc#) that demonstrates more than **8192 concurrent clients writing 2 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss.
🛠️ Customize TransferQueue
diff --git a/transfer_queue/version/version b/transfer_queue/version/version
index fb7ee7f0..c946ee61 100644
--- a/transfer_queue/version/version
+++ b/transfer_queue/version/version
@@ -1 +1 @@
-0.1.6.dev0
+0.1.6
From 33bdcf88bf404a4710d5d9f9280ec994df391eac Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Wed, 1 Apr 2026 16:09:11 +0800
Subject: [PATCH 3/3] update mooncake version
Signed-off-by: 0oshowero0
---
pyproject.toml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
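The `mooncake` extra now pins `mooncake-transfer-engine==0.3.10.post1`, so an existing environment may have a different version installed. The sketch below is a hypothetical helper and is not shipped with TransferQueue. It uses the standard-library `importlib.metadata` to compare installed distribution versions against the pin. The comparison is plain string equality, with no PEP 440 normalization.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pin taken from the `mooncake` extra in pyproject.toml after this patch.
PINNED_VERSIONS: Dict[str, str] = {
    "mooncake-transfer-engine": "0.3.10.post1",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each distribution to (installed version or None, matches pin).

    Uses exact string equality; no PEP 440 normalization is applied.
    """
    report: Dict[str, Tuple[Optional[str], bool]] = {}
    for dist, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        report[dist] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for dist, (installed, ok) in check_pins(PINNED_VERSIONS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{dist}: installed={installed}, pinned={PINNED_VERSIONS[dist]} [{status}]")
```

A mismatch usually means the environment predates the pin. Reinstalling the `mooncake` extra should bring it in line.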
diff --git a/pyproject.toml b/pyproject.toml
index 1fba227f..3a067a18 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -118,7 +118,7 @@ yuanrong = [
"openyuanrong-datasystem"
]
mooncake = [
- "mooncake-transfer-engine"
+ "mooncake-transfer-engine==0.3.10.post1"
]
# If you need to mimic `package_dir={'': '.'}`: