
RuntimeError: dflash daemon exited before weights finished loading. Check the daemon's stderr. #233

@DemonODG


First, I tried splitting the target model and the draft model across two GPUs and got this error:

CUDA_VISIBLE_DEVICES=1,0 DFLASH_TARGET_GPU=0 DFLASH_DRAFT_GPU=1 DFLASH_FP_USE_BSA=1 DFLASH_FP_ALPHA=0.85 python tests/bench_niah_cpp.py --bin ../dflash/build/test_dflash --target ../../ModelsIA/Qwen/Qwen3.6-27B-Q4_K_M.gguf --draft-spec ../../ModelsIA/Qwen/draft/model.safetensors --drafter-gguf ../../ModelsIA/Qwen/drafter/Qwen3-0.6B-BF16.gguf --cases /tmp/niah_128k.jsonl --keep-ratio 0.05 --n-gen 256

[init] spawning daemon: ../dflash/build/test_dflash
[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=16 temp=1.00 chain_seed=1 fa_window=0 draft_swa=0 draft_ctx_max=4096 draft_feature_mirror=0 peer_access=0 target_gpu=0 draft_gpu=1
[test_dflash] arch=qwen35 daemon -> dispatching to run_qwen35_daemon (max_ctx=16384 stream_fd=5)
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 36857 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
Device 1: NVIDIA GeForce RTX 4080 Laptop GPU, compute capability 8.9, VMM: yes, VRAM: 12281 MiB
[loader] eos_id=248046 eos_chat_id=-1
[target] target loaded: layers [0,64) output=1, 850 tensors on GPU 14.99 GiB, tok_embd 682 MiB CPU-only (q4_K)
draft load: safetensors: 'layers.0.self_attn.k_norm.weight' shape[0]=128 expected 256
Traceback (most recent call last):
  File "/home/dimanodg/myproject/lucebox-hub/pflash/tests/bench_niah_cpp.py", line 184, in <module>
    main()
  File "/home/dimanodg/myproject/lucebox-hub/pflash/tests/bench_niah_cpp.py", line 123, in main
    dflash = DflashClient(
             ^^^^^^^^^^^^^
  File "/home/dimanodg/myproject/lucebox-hub/pflash/pflash/dflash_client.py", line 91, in __init__
    self._wait_until_loaded(timeout=boot_timeout_s, vram_mib=boot_vram_mib)
  File "/home/dimanodg/myproject/lucebox-hub/pflash/pflash/dflash_client.py", line 101, in _wait_until_loaded
    raise RuntimeError(
RuntimeError: dflash daemon exited before weights finished loading. Check the daemon's stderr.
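The daemon's stderr shows where it stops: the target loads, and the daemon then exits while loading the draft because `layers.0.self_attn.k_norm.weight` in the draft checkpoint has shape[0]=128 where the loader expected 256. A minimal sketch for checking the draft file directly, assuming the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header). `safetensors_shapes` is a hypothetical helper, not part of pflash or dflash, and it reads only the header, not the weights:

```python
import json
import struct
import sys


def safetensors_shapes(path):
    """Hypothetical helper: return {tensor_name: shape} from a .safetensors header."""
    with open(path, "rb") as f:
        # Assumed layout: unsigned 64-bit little-endian header length,
        # followed by that many bytes of UTF-8 JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return {name: info["shape"] for name, info in header.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    shapes = safetensors_shapes(sys.argv[1])
    for name in sorted(shapes):
        # In Qwen3-style attention, q_norm/k_norm are RMSNorm weights with one
        # entry per head dimension, so their width reflects the draft's head_dim.
        if name.endswith(("q_norm.weight", "k_norm.weight")):
            print(name, shapes[name])
```

Usage (the script name is hypothetical): `python check_draft.py ../../ModelsIA/Qwen/draft/model.safetensors`. If the norms report 128, the draft checkpoint's head dimension differs from the 256 the loader expects, which suggests it may have been built for a different target model.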

Then I used only the RTX 3090 and got the same error:

CUDA_VISIBLE_DEVICES=1 DFLASH_FP_USE_BSA=1 DFLASH_FP_ALPHA=0.85 python tests/bench_niah_cpp.py --bin ../dflash/build/test_dflash --target ../../ModelsIA/Qwen/Qwen3.6-27B-Q4_K_M.gguf --draft-spec ../../ModelsIA/Qwen/draft/model.safetensors --drafter-gguf ../../ModelsIA/Qwen/drafter/Qwen3-0.6B-BF16.gguf --cases /tmp/niah_128k.jsonl --keep-ratio 0.05 --n-gen 256

[init] spawning daemon: ../dflash/build/test_dflash
[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=16 temp=1.00 chain_seed=1 fa_window=0 draft_swa=0 draft_ctx_max=4096 draft_feature_mirror=0 peer_access=0 target_gpu=0 draft_gpu=0
[test_dflash] arch=qwen35 daemon -> dispatching to run_qwen35_daemon (max_ctx=16384 stream_fd=5)
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24575 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
[loader] eos_id=248046 eos_chat_id=-1
[target] target loaded: layers [0,64) output=1, 850 tensors on GPU 14.99 GiB, tok_embd 682 MiB CPU-only (q4_K)
draft load: safetensors: 'layers.0.self_attn.k_norm.weight' shape[0]=128 expected 256
Traceback (most recent call last):
  File "/home/dimanodg/myproject/lucebox-hub/pflash/tests/bench_niah_cpp.py", line 184, in <module>
    main()
  File "/home/dimanodg/myproject/lucebox-hub/pflash/tests/bench_niah_cpp.py", line 123, in main
    dflash = DflashClient(
             ^^^^^^^^^^^^^
  File "/home/dimanodg/myproject/lucebox-hub/pflash/pflash/dflash_client.py", line 91, in __init__
    self._wait_until_loaded(timeout=boot_timeout_s, vram_mib=boot_vram_mib)
  File "/home/dimanodg/myproject/lucebox-hub/pflash/pflash/dflash_client.py", line 101, in _wait_until_loaded
    raise RuntimeError(
RuntimeError: dflash daemon exited before weights finished loading. Check the daemon's stderr.
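Because the single-GPU run fails on the same draft-load line after the target has loaded, the GPU split does not appear to be the cause. The log does not say where the expected 256 comes from; if it is tied to the target's attention geometry (an assumption), the target GGUF's metadata can be inspected. A minimal sketch, assuming the GGUF v2/v3 header layout described in the ggml repository. `gguf_metadata` is a hypothetical helper, and llama.cpp-style keys such as `<arch>.attention.key_length` may or may not be present in this particular file:

```python
import struct
import sys

# GGUF metadata value type ids -> struct formats for fixed-size scalars.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_str(f):
    n = _read(f, "<Q")  # strings are a uint64 length followed by UTF-8 bytes
    return f.read(n).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _SCALARS:
            # Fixed-size elements: skip them in one seek.
            f.seek(count * struct.calcsize(_SCALARS[elem_type]), 1)
        else:
            # Strings or nested arrays: walk each element to find the end.
            for _ in range(count):
                _read_value(f, elem_type)
        return f"<array of {count}>"
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_metadata(path):
    """Hypothetical helper: return GGUF key/value metadata (arrays summarized)."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError(f"GGUF v{version} uses 32-bit counts; not handled here")
        _read(f, "<Q")  # tensor count (unused here)
        n_kv = _read(f, "<Q")
        meta = {}
        for _ in range(n_kv):
            key = _read_str(f)
            meta[key] = _read_value(f, _read(f, "<I"))
        return meta


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in sorted(gguf_metadata(sys.argv[1]).items()):
        if key == "general.architecture" or ".attention." in key:
            print(f"{key} = {value}")
```

Usage (the script name is hypothetical): `python gguf_meta.py ../../ModelsIA/Qwen/Qwen3.6-27B-Q4_K_M.gguf`. Large arrays such as the tokenizer vocabulary are skipped and reported only by length, so the script stays fast on a multi-GiB file.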

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Please tell me the possible reasons for this error and how to fix it.

Labels: bug (Something isn't working), server
