We have 400 Gbit/s networked VAST storage and typically see weight-loading times of about 1 minute for Kimi-K2.5 with the Run:ai streamer, using the following settings:
RUNAI_STREAMER_DIST: 1
RUNAI_STREAMER_CHUNK_BYTESIZE: 4194304
--model-loader-extra-config={{\"concurrency\": 140, \"distributed\": true}}
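For reference, the streamer settings above can be restated in more familiar units. This is a minimal sketch, assuming (not confirmed by the issue or the streamer docs) that each of the `concurrency` reader threads in a loading process holds at most one chunk of `RUNAI_STREAMER_CHUNK_BYTESIZE` at a time. The helper names `chunk_mib` and `max_in_flight_bytes` are hypothetical.

```python
# Restate the Run:ai streamer settings from the issue in MiB.
# Assumption (hypothetical, not from the issue): each reader thread works on
# at most one chunk at a time, so concurrency * chunk size bounds the bytes
# in flight per loading process.

CHUNK_BYTESIZE = 4194304  # RUNAI_STREAMER_CHUNK_BYTESIZE from the issue
CONCURRENCY = 140         # "concurrency" from --model-loader-extra-config

MIB = 1024 ** 2


def chunk_mib(chunk_bytes: int) -> float:
    """Chunk size expressed in MiB."""
    return chunk_bytes / MIB


def max_in_flight_bytes(chunk_bytes: int, concurrency: int) -> int:
    """Upper bound on bytes being read at once, under the one-chunk-per-thread assumption."""
    return chunk_bytes * concurrency


if __name__ == "__main__":
    print(f"chunk size: {chunk_mib(CHUNK_BYTESIZE):.0f} MiB")
    in_flight = max_in_flight_bytes(CHUNK_BYTESIZE, CONCURRENCY)
    print(f"max in flight per process: {in_flight / MIB:.0f} MiB")
```

Under that assumption the settings correspond to 4 MiB chunks and roughly 560 MiB of outstanding reads per process.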
With instanttensor on vLLM, loading takes about 2.5 minutes (~2:30).
I tested both loaders with Kimi-K2.5 on 8x H200, reading from a networked VAST file share mounted over NFS (readahead: 4096, nconnect: 16, rdma: true). There is no persistent local storage, so the weights are loaded directly into a mounted RAM disk.
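To put the two load times in context against the 400 Gbit/s link (50 GB/s at line rate), here is a back-of-the-envelope sketch. `CHECKPOINT_BYTES` is a hypothetical placeholder, not the actual size of Kimi-K2.5; substitute the real on-disk size of the model directory (e.g. from `du -sb`). The function names are illustrative only.

```python
# Compare the two observed load times against the raw link rate.
# CHECKPOINT_BYTES is a HYPOTHETICAL placeholder, not the measured size of
# Kimi-K2.5; replace it with the real size of the model directory.

LINK_GBIT_PER_S = 400           # VAST link speed from the issue
RUNAI_SECONDS = 60              # ~1 minute with the Run:ai streamer
INSTANTTENSOR_SECONDS = 150     # ~2:30 with instanttensor
CHECKPOINT_BYTES = 600 * 10**9  # hypothetical placeholder


def link_bytes_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to bytes/s (decimal units)."""
    return gbit_per_s * 1e9 / 8


def effective_gb_per_s(checkpoint_bytes: int, seconds: float) -> float:
    """Average read throughput in GB/s (decimal) over the whole load."""
    return checkpoint_bytes / seconds / 1e9


def link_utilization(checkpoint_bytes: int, seconds: float, gbit_per_s: float) -> float:
    """Fraction of the raw link rate that the load actually achieved."""
    return checkpoint_bytes / seconds / link_bytes_per_s(gbit_per_s)


if __name__ == "__main__":
    # Lower bound on load time if the link were saturated the whole time.
    floor_s = CHECKPOINT_BYTES / link_bytes_per_s(LINK_GBIT_PER_S)
    print(f"line-rate floor: {floor_s:.1f} s")
    for name, secs in [("runai", RUNAI_SECONDS), ("instanttensor", INSTANTTENSOR_SECONDS)]:
        print(
            f"{name}: {effective_gb_per_s(CHECKPOINT_BYTES, secs):.1f} GB/s, "
            f"{link_utilization(CHECKPOINT_BYTES, secs, LINK_GBIT_PER_S):.0%} of link"
        )
    print(f"slowdown: {INSTANTTENSOR_SECONDS / RUNAI_SECONDS:.1f}x")
```

The 2.5x ratio between the two loaders holds regardless of the checkpoint size; only the absolute throughput and link-utilization figures depend on the placeholder.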
I also tried setting INSTANTTENSOR_USE_CUFILE. I am using the latest nightly vLLM cu13 x86 image with instanttensor installed on top of it.
Is there anything else I need to configure to get instanttensor's load times closer to the Run:ai streamer's?