
Unable to Match Run:ai Distributed Bandwidth on vLLM #4

@bbartels

Description


We have 400 Gbit/s networked VAST storage and generally see weight-loading times of ~1 minute for Kimi-K2.5 with the Run:ai streamer and the following settings:

```
RUNAI_STREAMER_DIST: 1
RUNAI_STREAMER_CHUNK_BYTESIZE: 4194304
--model-loader-extra-config='{"concurrency": 140, "distributed": true}'
```
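For reference, here is a sketch of those settings combined into one launch. The env var values and extra-config JSON are from this setup; the model path, tensor-parallel size, and the rest of the serve invocation are assumptions:

```shell
# Sketch: Run:ai streamer settings applied to a vLLM launch (8xH200 assumed).
# The model path /mnt/vast/kimi-k2.5 is a placeholder for this setup.
export RUNAI_STREAMER_DIST=1
export RUNAI_STREAMER_CHUNK_BYTESIZE=4194304   # 4 MiB read chunks

vllm serve /mnt/vast/kimi-k2.5 \
  --load-format runai_streamer \
  --tensor-parallel-size 8 \
  --model-loader-extra-config '{"concurrency": 140, "distributed": true}'
```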

I tried instanttensor on vLLM and noticed it takes ~2:30 min.

I tested both with Kimi-K2.5 on 8xH200, on a networked VAST fileshare mounted via NFS (readahead: 4096, nconnect: 16, rdma: true). There is no persisted storage; weights load directly into a mounted RAM disk.
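A sketch of the mount described above, assuming a Linux NFS client. The server name, export, and mount point are placeholders; note that `nconnect` and RDMA are mount options, while readahead is normally set per backing device via sysfs rather than at mount time:

```shell
# Sketch: NFS mount with 16 connections over RDMA (20049 is the standard
# NFS/RDMA port). Hostname, export, and mount point are placeholders.
mount -t nfs -o nconnect=16,proto=rdma,port=20049 vast-server:/export /mnt/vast

# Readahead (in KiB) is set on the mount's backing-device entry in sysfs;
# `mountpoint -d` prints the major:minor id that names the bdi directory.
echo 4096 > /sys/class/bdi/$(mountpoint -d /mnt/vast)/read_ahead_kb
```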

Is there anything else that would need configuring?
I also tried with INSTANTTENSOR_USE_CUFILE. I am using the latest nightly vLLM cu13 x86 image with instanttensor installed into it.
