[Bug]: In ACL Graph mode, frozen context_lens in ATB CustomPagedAttention causes all-zero decode output

### Your environment

quay.io/jd_xllm/xllm-ai:xllm-dev-a3-arm-20260306

### 🐛 Describe the bug

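## Bug description

In ACL Graph mode, when the first request's prefill token count is ≤ block_size, hidden states become all zeros after roughly 800+ decode steps and do not recover afterwards.
The issue was found on the MiniMax-M2.7 model, but it is not model-specific: it is a mechanism problem that occurs when the `npu_torch` Attention layer takes the ATB `CustomPagedAttention` path.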

## Root cause analysis

### Mechanism

Execution flow of ATB `CustomPagedAttention` under `GRAPH_LAUNCH_MODE`:

1. **Setup phase** (at graph capture): `ModifyKernelGraph()` reads `context_lens` from `hostData`, writes `kvSeqLen` into `OpParam::PagedAttention`, and bakes it into the `opDesc` of the kernel graph
2. **Execute phase** (at graph replay): the captured graph is replayed directly; **`ModifyKernelGraph()` is not called again**, and `context_lens` is not re-read

This means **the `kvSeqLen` used inside the ATB kernel is fixed after capture** and cannot be changed by updating the device tensor or the tiling.

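### How xLLM currently handles this

In `plan_paged_attention_tiling()`, xLLM updates the tiling device tensor at every step (parameters such as `MAX_KVSEQLEN` increase correctly), and this part is correct. However, the tiling is only one of the inputs the kernel reads. The `kvSeqLen` frozen in the `opDesc` inside the captured graph is another, and updating the tiling cannot fix it.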
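
The split between the per-step tiling update and the value frozen at capture can be illustrated with a toy model. This is only a sketch of the behaviour described in this report, not ATB or xLLM code: `ToyCapturedGraph`, `capture`, and `replay` are hypothetical names, and the lengths 95 and 897 are the example values used in this issue.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyCapturedGraph:
    """Hypothetical stand-in for a captured kernel graph (not the ATB API)."""
    baked_kv_seq_len: List[int]  # copied once, at capture time

    def replay(self, tiling_max_kv_seq_len: int) -> Dict[str, object]:
        # Replay never re-reads host data: the baked value is used as-is,
        # while the tiling value is whatever the caller updated this step.
        return {
            "kernel_kv_seq_len": list(self.baked_kv_seq_len),
            "tiling_max_kv_seq_len": tiling_max_kv_seq_len,
        }


def capture(host_context_lens: List[int]) -> ToyCapturedGraph:
    # Setup phase: read the host data once and bake a copy into the graph.
    return ToyCapturedGraph(baked_kv_seq_len=list(host_context_lens))


host_context_lens = [95]
graph = capture(host_context_lens)

# Decode continues; the host-side length and the tiling are kept up to date.
host_context_lens[0] = 897
result = graph.replay(tiling_max_kv_seq_len=max(host_context_lens))

print(result["tiling_max_kv_seq_len"])  # tiling follows the real length
print(result["kernel_kv_seq_len"])      # baked value is still the capture-time one
```

In this toy model the two values diverge as soon as decoding moves past the capture-time length, which is the mismatch this report identifies as the root cause.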

### Data flow

```
At capture (kv_seq_len=95):
  hostData=[95] → ModifyKernelGraph() → OpParam.kvSeqLen=[95] → baked into graph
  tiling: computed from kvSeqLen=[95], written to device tensor

At replay (kv_seq_len=897):
  tiling device tensor: updated every step, MAX_KVSEQLEN=897 ✓
  graph-internal OpParam.kvSeqLen: still [95] ✗  ← root cause
  context_lens used by kernel: from OpParam, still 95 ✗
```
### Scope of impact

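This issue is not model-specific; it is a mechanism problem in the xLLM ACL graph executor combined with ATB CustomPagedAttention. Any model triggers it when all of the following hold:
- ACL graph mode is used
- The first request's prefill token count is ≤ block_size (typically 128)
- The number of decode steps is large enough (the KV cache block count grows to roughly 8 times its value at capture)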
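
As a quick sanity check, the example lengths in this report (95 at capture, 897 at replay) line up with the "about 8 times" block count and the "800+ steps" figures. The sketch below assumes a block size of 128 and the usual ceiling-division block count; `num_kv_blocks` is a hypothetical helper, not an xLLM function.

```python
import math


def num_kv_blocks(kv_seq_len: int, block_size: int = 128) -> int:
    """Number of KV cache blocks needed to hold kv_seq_len tokens."""
    return math.ceil(kv_seq_len / block_size)


capture_len = 95   # kv_seq_len at graph capture (example value from this report)
replay_len = 897   # kv_seq_len at the failing replay (example value from this report)

capture_blocks = num_kv_blocks(capture_len)
replay_blocks = num_kv_blocks(replay_len)
decode_steps = replay_len - capture_len

print(capture_blocks, replay_blocks, decode_steps)
```

With these inputs the sequence occupies one block at capture and eight blocks at replay, after 802 decode steps. Whether the failure is tied exactly to entering the eighth block is not established here; the report only states the ratio as approximate.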

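## Steps to reproduce

1. Start the xLLM service (ACL graph mode)
2. Send one request with a short prefill (≤128 tokens)
3. Wait for roughly 800+ decode steps
4. Observe that the output becomes all zeros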
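
To pin down the step at which the output collapses, a small helper like the one below can be run over per-step hidden states. It is a sketch only: it assumes the hidden states have already been dumped as one list of floats per decode step, and this report does not describe how to obtain such a dump. `first_all_zero_step` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def first_all_zero_step(hidden_states_per_step: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the first decode step whose hidden state is all zeros."""
    for step, hidden in enumerate(hidden_states_per_step):
        # An empty vector is treated as "no data", not as an all-zero output.
        if len(hidden) > 0 and all(value == 0.0 for value in hidden):
            return step
    return None


# Tiny synthetic example: steps 0 and 1 look healthy, step 2 is all zeros.
dump: List[List[float]] = [[0.3, -1.2], [0.0, 0.7], [0.0, 0.0], [0.0, 0.0]]
print(first_all_zero_step(dump))
```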
