
Unable to Match Run:ai Distributed Bandwidth on vLLM #4

@bbartels

Description


We have 400 Gbit/s networked VAST storage and generally see weight-loading times of ~1 minute for Kimi-K2.5 with the Run:ai streamer and the following settings:

```
RUNAI_STREAMER_DIST: 1
RUNAI_STREAMER_CHUNK_BYTESIZE: 4194304
--model-loader-extra-config='{"concurrency": 140, "distributed": true}'
```
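For reference, here is a sketch of those settings combined into one launch. The env var values and extra-config JSON are from this setup; the model path, tensor-parallel size, and the rest of the serve invocation are assumptions:

```shell
# Sketch: Run:ai streamer settings applied to a vLLM launch (8xH200 assumed).
# The model path /mnt/vast/kimi-k2.5 is a placeholder for this setup.
export RUNAI_STREAMER_DIST=1
export RUNAI_STREAMER_CHUNK_BYTESIZE=4194304   # 4 MiB read chunks

vllm serve /mnt/vast/kimi-k2.5 \
  --load-format runai_streamer \
  --tensor-parallel-size 8 \
  --model-loader-extra-config '{"concurrency": 140, "distributed": true}'
```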

I tried instanttensor on vLLM and noticed it takes ~2:30 min.

I tested both with Kimi-K2.5 on 8xH200, on a networked VAST fileshare mounted via NFS (readahead: 4096, nconnect: 16, rdma: true). There is no persisted storage; weights load directly into a mounted RAM disk.
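A sketch of the mount described above, assuming a Linux NFS client. The server name, export, and mount point are placeholders; note that `nconnect` and RDMA are mount options, while readahead is normally set per backing device via sysfs rather than at mount time:

```shell
# Sketch: NFS mount with 16 connections over RDMA (20049 is the standard
# NFS/RDMA port). Hostname, export, and mount point are placeholders.
mount -t nfs -o nconnect=16,proto=rdma,port=20049 vast-server:/export /mnt/vast

# Readahead (in KiB) is set on the mount's backing-device entry in sysfs;
# `mountpoint -d` prints the major:minor id that names the bdi directory.
echo 4096 > /sys/class/bdi/$(mountpoint -d /mnt/vast)/read_ahead_kb
```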

Is there anything else that would need configuring?
I also tried with INSTANTTENSOR_USE_CUFILE. I am using the latest nightly vLLM cu13 x86 image with instanttensor installed into it.
