From 7d97c840cc9e7c3b3f564e448cddd3680dcb512b Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Mon, 30 Mar 2026 10:10:10 +0800
Subject: [PATCH 1/3] update readme

Signed-off-by: 0oshowero0
---
 scripts/performance_test/README_PERFTEST.md | 68 +++++++++++++++++----
 1 file changed, 57 insertions(+), 11 deletions(-)

diff --git a/scripts/performance_test/README_PERFTEST.md b/scripts/performance_test/README_PERFTEST.md
index 1b5ddc6a..8e150986 100644
--- a/scripts/performance_test/README_PERFTEST.md
+++ b/scripts/performance_test/README_PERFTEST.md
@@ -42,19 +42,63 @@ python perftest.py \
 | `--head_node_ip` | Head node IP address | - | Yes |
 | `--worker_node_ip` | Worker node IP address (required for Yuanrong) | None | No |
 | `--output_csv` | Path to output CSV file | None | No |
+| `--use_complex_case` | Use complex test case with nested tensors and NonTensorStack fields | False | No |

 ## Backend Configuration

 The script reads the backend configuration directly from the provided `--backend_config` YAML file. The backend type is determined by `backend.storage_backend` in the config file. When `--backend` is specified, it overrides the value in the config.

-For device support of each backend:
-- `SimpleStorage`: `cpu`
-- `Yuanrong`: `cpu`, `npu`
-- `MooncakeStore`: `cpu`, `gpu`
+### SimpleStorage Configuration

-## Test Data Format
+```yaml
+backend:
+  storage_backend: SimpleStorage
+  SimpleStorage:
+    total_storage_size: 100000
+    num_data_storage_units: 16
+```
+
+### Yuanrong Configuration
+
+```yaml
+backend:
+  storage_backend: Yuanrong
+  Yuanrong:
+    port: 31501
+    enable_yr_npu_transport: true
+```
+
+For the Yuanrong backend, the writer runs on the head node and the reader runs on the worker node. `--worker_node_ip` is required.
+
+### MooncakeStore Configuration
+
+```yaml
+backend:
+  storage_backend: MooncakeStore
+  MooncakeStore:
+    auto_init: true
+    metadata_server: localhost:50050
+    master_server_address: localhost:50051
+    local_hostname: ""
+    protocol: rdma
+    global_segment_size: 86294967296
+    local_buffer_size: 86294967296
+    device_name: ""
+```
+
+## Test Scenarios
+
+### Simple Test Case (Default)
+
+When `--use_complex_case` is **not** specified (default), the test creates a `TensorDict` with only regular tensors:

-The test case creates a `TensorDict` with three types of fields to simulate real training batches:
+- **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
+
+Each regular tensor field size = `batch_size × seq_length × 4` bytes.
+
+### Complex Test Case
+
+When `--use_complex_case` is specified, the test creates a `TensorDict` with three types of fields to simulate real training batches:

 1. **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
 2. **Nested tensors** (non-NPU devices): Variable-length ragged sequences with lengths forming an arithmetic progression from 1 to `seq_length`. Average length ≈ `seq_length / 2`, so each nested field is roughly half the size of a regular field.
@@ -73,10 +117,6 @@ Each iteration performs a PUT → LIST → GET → DELETE cycle via TransferQueu

 The test runs `--num_test_iterations` iterations. Data creation only happens in the first iteration; subsequent iterations reuse the same TensorDict to isolate transfer overhead.

-## Yuanrong Backend
-
-For Yuanrong backend, writer runs on the head node and reader runs on the worker node. `--worker_node_ip` is required.
-
 ## Running Full Test Suite

 The `run_perf_test.sh` script automates the full test suite across all backends and data sizes, then generates a comparison chart:
@@ -130,12 +170,18 @@ After running the tests, `draw_figure.py` reads all CSV files from `results/` an

 ## Examples

-### SimpleStorage backend
+### SimpleStorage backend (simple case)
 ```bash
 python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
   --head_node_ip=192.168.0.1
 ```

+### SimpleStorage backend (complex case)
+```bash
+python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
+  --head_node_ip=192.168.0.1 --use_complex_case
+```
+
 ### Yuanrong backend (inter-node)
 ```bash
 python perftest.py --backend_config=perftest_config.yaml --backend=Yuanrong \

From d9750451f5bc3a45fba910dac381a5332f494ebf Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Mon, 30 Mar 2026 10:23:28 +0800
Subject: [PATCH 2/3] update

Signed-off-by: 0oshowero0
---
 README.md                      | 23 ++++++++++-------------
 transfer_queue/version/version |  2 +-
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 5b88a73a..952b34f1 100644
--- a/README.md
+++ b/README.md
@@ -76,7 +76,7 @@ Currently, we support the following storage backends:

 - SimpleStorage: A basic CPU memory storage with minimal data format constraints and easy usability.
 - [Yuanrong](https://gitee.com/openeuler/yuanrong-datasystem) (beta, [#PR107](https://github.com/TransferQueue/TransferQueue/pull/107), [#PR96](https://github.com/TransferQueue/TransferQueue/pull/96)): An Ascend native data system that provides hierarchical storage interfaces including HBM/DRAM/SSD.
-- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (alpha, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
+- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (beta, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
 - [RayRDT](https://docs.ray.io/en/master/ray-core/direct-transport.html) (alpha, [#PR167](https://github.com/TransferQueue/TransferQueue/pull/167)): Ray's new feature that allows Ray to store and pass objects directly between Ray actors.

 Among them, `SimpleStorageUnit` serves as our default storage backend, coordinated by the `AsyncSimpleStorageManager` class. Each storage unit can be deployed on a separate node, allowing for distributed data management.
@@ -121,6 +121,8 @@ To simplify the usage of TransferQueue, we have provided a Redis-style high-leve
 - **Metadata Tags**: Lightweight metadata for status tracking
 - **Pluggable Backends**: Supports multiple backends

+Refer to [tutorials/basic.ipynb](https://github.com/Ascend/TransferQueue/blob/main/tutorial/basic.ipynb) and [tutorials/02_kv_interface.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/02_kv_interface.py) for detailed usage examples.
+
 #### StreamingDataLoader API

 Designed as a drop-in replacement for the standard PyTorch `DataLoader`, this API allows each rank to automatically consume data without single-controller intervention.
@@ -147,17 +149,12 @@ Developers can leverage `TransferQueueClient` directly to implement advanced fea
 #### verl
 The primary motivation for integrating TransferQueue to verl now is to **alleviate the data transfer bottleneck of the single controller `RayPPOTrainer`**. Currently, all `DataProto` objects must be routed through `RayPPOTrainer`, resulting in a single point bottleneck of the whole post-training system.
-![verl_dataflow_DataProto](https://github.com/TransferQueue/community_doc/blob/main/docs/verl_workflow.jpeg?raw=true)
-
-Leveraging TransferQueue, we separate experience data transfer from metadata dispatch by
-
-- Replacing `DataProto` with `BatchMeta` (metadata) and `TensorDict` (actual data) structures
-- Preserving verl's original Dispatch/Collect logic via BatchMeta (maintaining single-controller debuggability)
-- Accelerating data transfer by TransferQueue's distributed storage units
+

+ +

-![verl_dataflow_TransferQueue](https://github.com/TransferQueue/community_doc/blob/main/docs/verl_workflow_with_tq.jpeg?raw=true)
+Official integration to verl is available at [verl/pulls/5401](https://github.com/verl-project/verl/pull/5401), with design doc at [[RFC] PPOTrainer with TransferQueue Integration](https://github.com/verl-project/verl/issues/5400). You may also refer to our [recipe](https://github.com/Ascend/TransferQueue/blob/main/recipe/simple_use_case/single_controller_demo.py), where we mimic the verl usage in a high-level manner.

-You may refer to the [recipe](https://github.com/Ascend/TransferQueue/tree/dev/recipe/simple_use_case), where we mimic the verl usage in both async & sync scenarios. Official integration to verl is also available now at [verl/pulls/3649](https://github.com/volcengine/verl/pull/3649) (with subsequent PRs to further optimize the integration).

 ### Disaggregated Example

@@ -216,11 +213,11 @@ pip install TransferQueue

-> Note: The above benchmark for TransferQueue is based on our naive `SimpleStorage` backend. By introducing high-performance storage backends and optimizing serialization/deserialization, we expect to achieve even better performance. Warmly welcome contributions from the community!
+> Note: Optimization for MooncakeStore and other backends is still in progress. Contributions from the community are warmly welcome!

-For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/hlx5g0/tml8ke0zkgn6roey?singleDoc#).
+For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/lhp4el/tml8ke0zkgn6roey?singleDoc#).

-We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/hlx5g0/ydbwgo5k2umaag78?singleDoc#) that demonstrates **768 concurrent clients writing 1.4 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss, achieving 80% bandwidth.
+We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/lhp4el/mt0vedqy7c337pgg?singleDoc#) that demonstrates more than **8192 concurrent clients writing 2 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss.

🛠️ Customize TransferQueue

diff --git a/transfer_queue/version/version b/transfer_queue/version/version
index fb7ee7f0..c946ee61 100644
--- a/transfer_queue/version/version
+++ b/transfer_queue/version/version
@@ -1 +1 @@
-0.1.6.dev0
+0.1.6

From 33bdcf88bf404a4710d5d9f9280ec994df391eac Mon Sep 17 00:00:00 2001
From: 0oshowero0
Date: Wed, 1 Apr 2026 16:09:11 +0800
Subject: [PATCH 3/3] update mooncake version

Signed-off-by: 0oshowero0
---
 pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyproject.toml b/pyproject.toml
index 1fba227f..3a067a18 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -118,7 +118,7 @@ yuanrong = [
     "openyuanrong-datasystem"
 ]
 mooncake = [
-    "mooncake-transfer-engine"
+    "mooncake-transfer-engine==0.3.10.post1"
 ]

 # If you need to mimic `package_dir={'': '.'}`:
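
Reviewer note on patch 1/3: the README_PERFTEST.md text states that each regular tensor field occupies `batch_size × seq_length × 4` bytes and that a nested field is roughly half that. The sketch below is one way to sanity-check payload sizes before picking test parameters. The function names are hypothetical and are not part of `perftest.py`; the nested estimate assumes float32 elements and uses the exact mean of an arithmetic progression from 1 to `seq_length`, which is an assumption about how the lengths are generated.

```python
# Hypothetical sizing helpers; only the formulas come from the README text.
FLOAT32_BYTES = 4  # regular tensors are float32 per the README


def regular_field_bytes(batch_size: int, seq_length: int) -> int:
    """Size of one regular (batch_size, seq_length) float32 field in bytes."""
    return batch_size * seq_length * FLOAT32_BYTES


def nested_field_bytes_estimate(batch_size: int, seq_length: int) -> int:
    """Estimated size of one nested field, assuming float32 elements.

    Lengths form an arithmetic progression from 1 to seq_length, so the
    mean length is (1 + seq_length) / 2, i.e. about half a regular field.
    """
    return batch_size * (1 + seq_length) * FLOAT32_BYTES // 2


if __name__ == "__main__":
    bs, seq = 1024, 8192
    regular = regular_field_bytes(bs, seq)
    nested = nested_field_bytes_estimate(bs, seq)
    print(f"regular field: {regular / 2**20:.1f} MiB")
    print(f"nested field (estimate): {nested / 2**20:.1f} MiB")
```

Multiplying the per-field size by the number of fields gives a rough total payload per PUT, which can be compared against a backend's configured capacity.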