Eval bug: Vision not working for Gemma4 when Eagle3 is Enabled #24816

Description

@ConnoisseurProtege

Name and Version

llama-server --version
version: 9728 (fabde3b)
built with MSVC 19.51.36246.0 for x64

Operating systems

Windows

GGML backends

CUDA

Hardware

Ryzen 9950X3D + 4080 Super

Models

Gemma4-31B-it Q8_0

Problem description & steps to reproduce

llama-server crashes whenever I attach an image while running Gemma4 31B with Eagle3. With MTP instead of Eagle3 it works fine, and Eagle3 without vision also works fine. The crash only happens when Eagle3 and vision are used together.

First Bad Commit

Unsure

Relevant log output

1.16.954.923 I srv   operator (): Chat format: peg-gemma4
1.29.034.063 E init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 165
 - the tokens for sequence 0 in the input batch have a starting position of Y = 2390
 it is required that the sequence positions remain consecutive: Y = X + 1
1.29.034.067 E decode: failed to initialize batch
1.29.034.068 E llama_decode: failed to decode, ret = -1
1.29.034.069 E process: llama_decode(ctx_dft) failed rc=-1 (n_tokens=46, ubatch_pos[0]=2390)
1.29.034.070 E srv  update_slots: failed to process speculative batch
1.29.034.079 I slot      release: id  0 | task 0 | stop processing: n_tokens = 2437, truncated = 0
1.29.036.893 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
1.29.036.956 I slot launch_slot_: id  0 | task 5 | processing task, is_child = 0
1.29.204.519 I slot create_check: id  0 | task 5 | created context checkpoint 2 of 32 (pos_min = 0, pos_max = 2436, n_tokens = 2437, size = 802.609 MiB)
1.29.831.120 E ~\llama.cpp\tools\server\server-context.cpp:3304: fatal error - please provide logs and repro in https://github.com/ggml-org/llama.cpp/pull/20277

init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 165
 - the tokens for sequence 0 in the input batch have a starting position of Y = 2436
 it is required that the sequence positions remain consecutive: Y = X + 1
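For anyone reading the log, the condition it reports (Y = X + 1) can be sketched in a few lines. This is only an illustration of the rule as stated in the error message, not the actual llama.cpp check; the function name `check_start` and its parameters are hypothetical.

```python
from typing import Optional


def check_start(last_cached_pos: int, batch_start_pos: int) -> Optional[str]:
    """Hypothetical helper mirroring the rule printed in the log: Y = X + 1.

    last_cached_pos  -- X, the last position stored in the KV cache for the sequence
    batch_start_pos  -- Y, the position of the first token in the incoming batch
    Returns None when the batch continues the cache, else a description of the gap.
    """
    expected = last_cached_pos + 1
    if batch_start_pos == expected:
        return None
    skipped = batch_start_pos - expected
    return f"expected start {expected}, got {batch_start_pos} ({skipped} positions skipped)"


# Values copied from the two errors in the log above.
first = check_start(165, 2390)   # error raised for llama_decode(ctx_dft)
second = check_start(165, 2436)  # error printed after the fatal error
print(first)
print(second)
```

In both errors the cache reports X = 165 while the batch starts more than 2000 positions later, so the context being decoded appears not to hold the positions in between. Whether those missing positions correspond to the image tokens is a guess on my part, not something the log confirms.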
