
[Bug]: GLM-5-w8a8 fails to launch on 910C #967

@Pastens

Description


Your environment

  • Hardware: Ascend 910C, ARM host
  • xLLM version: preview/glm5
  • Startup parameters:
      --max_memory_utilization=0.85
      --max_tokens_per_batch=8192
      --max_seqs_per_batch=16
      --block_size=128
      --enable_prefix_cache=true
      --enable_chunked_prefill=true
      --communication_backend="hccl"
      --enable_schedule_overlap=true
      --enable_graph=true
      --enable_graph_no_padding=true
      --enable_mla=true
      --draft_model=$DRAFT_MODEL_PATH
      --draft_devices="npu:$DEVICE"
      --num_speculative_tokens=1
      --ep_size=8
      --dp_size=1

🐛 Describe the bug

  1. The rank 0 log shows that word_embedding_layer fails to execute its plan (error code 28):
I20260302 12:21:10.077145 521062 llm_engine.cpp:389] Initializing v cache with shape: [275 128 1 64]
I20260302 12:21:10.077220 521062 llm_engine.cpp:391] Initializing indexer cache with shape: [275 128 1 128]
I20260302 12:21:10.078318 521062 profile_manager.cpp:63] Starting ACL Graph/CUDA Graph warmup.
I20260302 12:21:10.078365 521062 profile_manager.cpp:771] Starting ACL Graph/CUDA Graph warmup with prefill and decode requests...
I20260302 12:21:10.078394 521062 profile_manager.cpp:809] Warming up prefill request: tokens=8192
mki_log mkdir /root/ascend/log/atb
E20260302 12:21:10.300601 525515 npu_base_layer.cpp:124] word_embedding_layer execute plan fail, error code: 28
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
  what():  The Inner error is reported as above. The process exits for this inner error, and the current working operator name is word_embedding_layer0.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2026-03-02-12:21:10 (PID:521062, Device:0, RankID:-1) ERR00100 PTA call acl api failed.

[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!   (repeated 8 times)
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
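  2. The ATB log shows that HcclGetRootInfo fails (error 7), so the HCCL communicator for AllGatherHcclRunner is never created (hcclComm is null) and WordEmbedding_0 fails with error code 28: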
[2026-03-02 12:21:10.125760] [error] [525604] [hccl_runner.cpp:178] AllGatherHcclRunner:0 HcclGetRootInfo fail, error:7, rank:0
[2026-03-02 12:21:10.127881] [error] [525604] [comm_pool.h:42] CommPool commCreateFunc fail
[2026-03-02 12:21:10.127889] [error] [525604] [hccl_runner.cpp:81] AllGatherHcclRunner:0 get hccl comm fail by rank:0
[2026-03-02 12:21:10.300542] [error] [525515] [all_gather_hccl_runner.cpp:39] hcclComm is null, rank: 0
[2026-03-02 12:21:10.300575] [error] [525515] [runner.cpp:133] AllGatherHcclRunner_0_1:1 Execute Failed. st: 28
[2026-03-02 12:21:10.300583] [error] [525515] [graph_runner.cpp:972] WordEmbeddingRunner_0:0  node[1] execute fail, runner name:AllGatherHcclRunner
[2026-03-02 12:21:10.300588] [error] [525515] [runner.cpp:133] WordEmbeddingRunner_0:1 Execute Failed. st: 28
[2026-03-02 12:21:10.300593] [error] [525515] [operation_base.cpp:1018] WordEmbedding_0 execute WordEmbeddingRunner fail
[2026-03-02 12:21:10.300596] [error] [525515] [operation_base.cpp:1095] WordEmbedding_0 Launch fail, error code: 28
  3. The failure is reproducible on every launch.
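To check which failure comes first, the ATB error lines above can be ordered by timestamp. The following is a minimal Python sketch, assuming every line follows the `[timestamp] [level] [thread id] [file:line] message` layout seen in the excerpt; `ATB_LINE` and `earliest_error` are hypothetical helpers, not part of ATB or xLLM. On this excerpt it points at HcclGetRootInfo (12:21:10.125760), earlier than the word_embedding_layer failure reported at 12:21:10.300601.

```python
import re
from datetime import datetime

# Assumed layout of an ATB log line, taken from the excerpt in this issue:
# [YYYY-MM-DD HH:MM:SS.ffffff] [level] [thread-id] [file:line] message
ATB_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] "
    r"\[(?P<level>\w+)\] \[(?P<tid>\d+)\] \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def earliest_error(lines):
    """Return (timestamp, thread id, source, message) of the earliest error line, or None."""
    errors = []
    for line in lines:
        m = ATB_LINE.match(line.strip())
        # Skip lines that do not match the assumed layout or are not errors.
        if m and m.group("level") == "error":
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            errors.append((ts, m.group("tid"), m.group("src"), m.group("msg")))
    # Tuples compare by timestamp first, so min() gives the earliest error.
    return min(errors) if errors else None


sample = [
    "[2026-03-02 12:21:10.300542] [error] [525515] [all_gather_hccl_runner.cpp:39] hcclComm is null, rank: 0",
    "[2026-03-02 12:21:10.125760] [error] [525604] [hccl_runner.cpp:178] AllGatherHcclRunner:0 HcclGetRootInfo fail, error:7, rank:0",
]
first = earliest_error(sample)
print(first[2], first[3])
# hccl_runner.cpp:178 AllGatherHcclRunner:0 HcclGetRootInfo fail, error:7, rank:0
```

Feeding the full ATB log from every rank into the same helper would show whether the HcclGetRootInfo failure is the first error on all ranks or only on rank 0.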

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
