Eval bug: Vision not working for Gemma4 when Eagle3 is Enabled

### Name and Version

llama-server --version
version: 9728 (fabde3bf5)
built with MSVC 19.51.36246.0 for x64

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

Ryzen 9950X3D + 4080 Super

### Models

Gemma4-31B-it Q8_0

### Problem description & steps to reproduce

llama-server crashes when I run Gemma4 31B with Eagle3 and attach an image. With MTP instead of Eagle3 it works fine, and with Eagle3 but no image it also works fine. The crash happens every time an image is attached.

### First Bad Commit

Unsure

### Relevant log output

<details>
<summary>Logs</summary>


```console
1.16.954.923 I srv   operator (): Chat format: peg-gemma4
1.29.034.063 E init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 165
 - the tokens for sequence 0 in the input batch have a starting position of Y = 2390
 it is required that the sequence positions remain consecutive: Y = X + 1
1.29.034.067 E decode: failed to initialize batch
1.29.034.068 E llama_decode: failed to decode, ret = -1
1.29.034.069 E process: llama_decode(ctx_dft) failed rc=-1 (n_tokens=46, ubatch_pos[0]=2390)
1.29.034.070 E srv  update_slots: failed to process speculative batch
1.29.034.079 I slot      release: id  0 | task 0 | stop processing: n_tokens = 2437, truncated = 0
1.29.036.893 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
1.29.036.956 I slot launch_slot_: id  0 | task 5 | processing task, is_child = 0
1.29.204.519 I slot create_check: id  0 | task 5 | created context checkpoint 2 of 32 (pos_min = 0, pos_max = 2436, n_tokens = 2437, size = 802.609 MiB)
1.29.831.120 E ~\llama.cpp\tools\server\server-context.cpp:3304: fatal error - please provide logs and repro in https://github.com/ggml-org/llama.cpp/pull/20277

init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 165
 - the tokens for sequence 0 in the input batch have a starting position of Y = 2436
 it is required that the sequence positions remain consecutive: Y = X + 1
```
</details>
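
For readers skimming the log, the rule it reports (`Y = X + 1`) can be illustrated with a minimal sketch. This is not llama.cpp's implementation; `check_consecutive` is a hypothetical helper, and the numbers are simply the X and Y values copied from the log above.

```python
def check_consecutive(last_cached_pos: int, batch_start_pos: int) -> bool:
    """Hypothetical helper: True when the batch starts right after the cached positions."""
    return batch_start_pos == last_cached_pos + 1


# X and Y values copied from the two error messages in the log.
x = 165
for y in (2390, 2436):
    # Number of positions between the end of the cache and the start of the batch.
    gap = y - (x + 1)
    print(f"X={x} Y={y} consecutive={check_consecutive(x, y)} gap={gap}")
```

In both logged cases the check fails by more than two thousand positions, so the draft context (`ctx_dft`) appears to be far behind the position the batch starts at.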
