fix(spacemit): fix MiniCPM-V SMT multimodal inference by oscar1229 · Pull Request #15 · spacemit-com/llama.cpp

oscar1229 · 2026-07-03T06:50:24Z

Fixes three independent bugs that prevented MiniCPM-V from running via the SMT media backend with multi-threaded warmup and multi-turn image conversations.

fix(spacemit): fix IME paired-lane GEMM threadpool deadlock

The IME GEMM kernels (forward_mul_mat and the mul_mat_id MoE path)
rendezvous thread pairs (2k, 2k+1) on a spine_barrier built for two
participants, so both lanes must call spine_barrier_wait() the same
number of times. The old per-thread loop could iterate a different
number of times per lane when gemm_n was not a multiple of
NB_COLS*nth, and the trailing even thread on odd nth had no partner,
so warmup hung with -t 8. Drive the loop from a pair-aligned base
with a per-lane offset (both lanes always iterate equally; an
out-of-range lane skips the GEMM but still hits the barrier) and
guard the barrier with has_pair so a partnerless thread never waits.
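
The loop restructuring above can be illustrated with a small counting model. This is only a sketch, not the kernel code: the NB_COLS value, the lane_stats struct, and the old_loop / paired_loop helpers are hypothetical, and the shape of the old loop is an assumption based on the description. It counts how often each thread would call spine_barrier_wait() under each scheme.

```cpp
#include <cassert>

// Hypothetical tile width; the real NB_COLS value is not stated in this PR text.
constexpr int NB_COLS = 16;

struct lane_stats {
    int gemm_calls    = 0;  // tiles this thread actually computes
    int barrier_waits = 0;  // times this thread would call spine_barrier_wait()
};

// Assumed shape of the old loop: every thread strides on its own column,
// so the two lanes of a pair can leave the loop after different trip counts.
lane_stats old_loop(int ith, int nth, int gemm_n) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    lane_stats s;
    for (int col = ith * NB_COLS; col < gemm_n; col += nth * NB_COLS) {
        s.gemm_calls++;
        s.barrier_waits++;
    }
    return s;
}

// Sketch of the fixed loop: iterate from the pair's even-thread base and add
// a per-lane offset, so both lanes of a pair always run the same trip count.
lane_stats paired_loop(int ith, int nth, int gemm_n) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    lane_stats s;
    const int  pair_first = ith & ~1;             // even thread index of this pair
    const int  lane       = ith & 1;              // 0 or 1 within the pair
    const bool has_pair   = pair_first + 1 < nth; // false for the trailing thread on odd nth
    for (int base = pair_first * NB_COLS; base < gemm_n; base += nth * NB_COLS) {
        const int col = base + lane * NB_COLS;
        if (col < gemm_n) {
            s.gemm_calls++;     // in range: do the GEMM tile
        }
        if (has_pair) {
            s.barrier_waits++;  // an out-of-range lane still reaches the barrier
        }
    }
    return s;
}
```

With nth = 8 and gemm_n = 144 (not a multiple of NB_COLS*nth = 128 under the assumed tile width), old_loop gives thread 0 two barrier calls and thread 1 only one, which is the mismatch that would hang a two-participant barrier. paired_loop gives both lanes two barrier calls, with lane 1 skipping its second tile.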

server: force full re-prefill for multimodal FULL-only KV cache reuse

MiniCPM-V runs on the qwen35 hybrid (SSM + periodic full-attention)
backend whose KV memory only supports full sequence removal. On a
multi-turn request, partial prompt-cache reuse would either restore a
context checkpoint (resurrecting a KV state inconsistent with the
external smt/ONNX vision embeddings) or call partial memory_seq_rm on
FULL-only memory, which returns false and triggers GGML_ABORT. When
the context is multimodal and the reused prefix is partial, force a
full re-prefill (pos_next = 0, n_past = 0) before the checkpoint /
seq_rm path. Pure-append turns and non-multimodal contexts are
unaffected.
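
The decision described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the server code: reuse_plan, plan_prompt_reuse, n_cached, and n_common are hypothetical names, and "partial" is taken to mean that the common prefix is shorter than what the slot currently holds in cache.

```cpp
#include <cassert>

// Hypothetical result type: where prompt processing should resume.
struct reuse_plan {
    int  n_past;          // tokens kept from the cache
    int  pos_next;        // next position to evaluate
    bool full_reprefill;  // true when the whole prompt is re-evaluated
};

// n_cached: tokens currently held for the slot (assumed meaning).
// n_common: length of the prefix shared by the cached and the new prompt.
reuse_plan plan_prompt_reuse(bool is_multimodal, int n_cached, int n_common) {
    assert(n_common >= 0 && n_common <= n_cached);

    // A partial reuse would require dropping the cache tail, which
    // FULL-only memory cannot do (partial memory_seq_rm returns false).
    const bool partial = n_common < n_cached;

    if (is_multimodal && partial) {
        return {0, 0, true};  // pos_next = 0, n_past = 0: start over
    }

    // Pure-append turns and non-multimodal contexts keep the usual path.
    // Simplification: position and token count are treated as equal here.
    return {n_common, n_common, false};
}
```

In this model a pure-append turn (n_common == n_cached) keeps its cache, while a multimodal turn that diverges partway through the cached prompt restarts from position 0 before any checkpoint restore or seq_rm would be attempted.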

feat(mtmd): add MiniCPM-V SMT vision preprocessing

The MiniCPM-V SMT vision ONNX export does not normalize pixels
internally. Detect minicpmv / minicpm_v / minicpm-v architectures and
route them through rgb_u8_to_chw_f32_with_config, which reads
rescale_factor / image_mean / image_std from config.json's
vision_preprocess block and emits a CHW float32 tensor. The target
size defaults to 448x448 and can be overridden via
vision_model.input_width/height.
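
A minimal sketch of this kind of preprocessing follows. It is not the PR's rgb_u8_to_chw_f32_with_config: the function name, the vision_preprocess_cfg struct, and its default values are hypothetical placeholders. The formula (pixel * rescale_factor - mean) / std is the common image-processor convention and is assumed here rather than confirmed by the PR text. Resizing to the target size is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the vision_preprocess block in config.json.
// The default values are placeholders, not the model's real parameters.
struct vision_preprocess_cfg {
    float                rescale_factor = 1.0f / 255.0f;
    std::array<float, 3> image_mean     = {0.5f, 0.5f, 0.5f};
    std::array<float, 3> image_std      = {0.5f, 0.5f, 0.5f};
};

// Convert interleaved RGB (HWC, uint8) into planar CHW float32, applying
// the assumed normalization (px * rescale_factor - mean[c]) / std[c].
std::vector<float> rgb_u8_to_chw_f32_sketch(const std::vector<uint8_t> & rgb,
                                            int width, int height,
                                            const vision_preprocess_cfg & cfg) {
    assert(width > 0 && height > 0);
    const size_t plane = static_cast<size_t>(width) * static_cast<size_t>(height);
    assert(rgb.size() == plane * 3);

    std::vector<float> out(plane * 3);
    for (size_t i = 0; i < plane; ++i) {
        for (size_t c = 0; c < 3; ++c) {
            const float px = static_cast<float>(rgb[i * 3 + c]);
            // Channel c occupies the contiguous range [c * plane, (c + 1) * plane).
            out[c * plane + i] = (px * cfg.rescale_factor - cfg.image_mean[c]) / cfg.image_std[c];
        }
    }
    return out;
}
```

Passing a config with rescale_factor = 1, zero mean, and unit std makes the layout change easy to inspect, because the output is then the input pixel values rearranged from RGBRGB... into RR...GG...BB... order.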

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:


oscar1229 requested a review from alex-spacemit as a code owner July 3, 2026 06:50

github-actions bot added the server, ggml, and mtmd labels Jul 3, 2026

alex-spacemit requested a review from co-seven July 3, 2026 06:53

oscar1229 wants to merge 1 commit into spacemit-com:spacemit-mtmd from oscar1229:spacemit-mtmd
