Skip to content

Model Engine with restore KV#6

Open
Snider wants to merge 22 commits into
mainfrom
dev
Open

Model Engine with restore KV#6
Snider wants to merge 22 commits into
mainfrom
dev

Conversation

@Snider
Copy link
Copy Markdown
Contributor

@Snider Snider commented May 20, 2026

Summary by CodeRabbit

  • New Features

    • Added OpenAI, Anthropic, and Ollama API compatibility layers for model inference.
    • Added model capability discovery and reporting system.
    • Added agent memory state persistence and recovery (wake/sleep lifecycle).
    • Added request scheduling with cancellation and backpressure control.
    • Added output parsing for structured reasoning and tool-call extraction.
    • Added GGUF model metadata reading and discovery.
    • Added benchmarking and dataset evaluation framework.
    • Added quantisation support (codebook VQ, JANG/JANGTQ).
  • Documentation

    • Comprehensive documentation for all packages and APIs.

Review Change Stack

Snider and others added 22 commits May 1, 2026 08:34
Reference: core/api/LICENCE.

Co-Authored-By: Cladius Maximus <cladius@lethean.io>
…block)

- git submodule update on external/* to current dev tips
- go.work paths fixed for Phase 1 /go/ subtree layout where stale
- go.work go-version bumped 1.26.0 → 1.26.2 to match submodule floor

Workspace-mode build (`go build ./...`) is the verification path. Some
repos may surface transitive dep issues (api/go.sum checksum drift, etc.)
which are separate cascade tickets — not blocking this metadata refresh.

Co-Authored-By: Cladius Maximus <cladius@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
…336)

go-inference gets the canonical service-registration shape per #1336.
Naming divergence from canon required: the package already exposes
`Register(b Backend)` as the well-known init-time backend-registration
pattern (every backend init() calls inference.Register(metal.NewBackend())).
Renaming would break every backend.

So the canonical Core registration is `RegisterCore(c)` here; existing
`Register(b Backend)` preserved untouched. Naming-divergence documented
inline in service.go.

  inference.NewService(inference.Options{})
                              → factory for core.WithService
  inference.RegisterCore(c)   → defaults shorthand
  inference.Register(b)       → unchanged: backend self-registration

v1 Options is empty since package behaviour is driven by the global
Backend registry which is independently managed via init().

Smoke verified:
- GOWORK=off go vet ./... — clean
- TestNewService_RegistersInferenceService — PASS
- TestRegisterCore_Imperative — PASS

Co-Authored-By: Virgil <virgil@lethean.io>
Permanent service_test.go for canon shape (commit bbdaf88). Two cases:
NewService(empty) round-trip + RegisterCore imperative shorthand. Note
the RegisterCore name (not Register) preserves the existing
`func Register(b Backend)` init-time backend self-registration pattern.

Coverage sweep (#1387): 8th of 22.

Co-Authored-By: Virgil <virgil@lethean.io>
Promote three areas to public packages alongside per-file documentation:

- state/ — Wake/Sleep/Fork lifecycle, identity DTOs (Model/Tokenizer/
  Adapter/Runtime/Sampler), Store/Resolver/Writer interfaces, InMemoryStore
  reference impl, filestore/ append-only backend. Identity types hoisted out
  of inference root; aliases preserved in identity.go for stable imports.
- openai/responses.go, services.go — Responses API DTOs + embeddings, rerank,
  capabilities, cache, cancel handlers.
- anthropic/, ollama/ — wire-compat DTO packages.
- contracts.go promoted from internal to public: SchedulerModel,
  CancellableModel, CacheService, EmbeddingModel, RerankModel, ReasoningParser,
  ToolParser, ModelPackInspector + AgentMemory* aliases.
- capability.go: 41 stable CapabilityID values, AlgorithmProfile,
  RuntimeMemoryLimits, CapabilityReporter.

docs/ pass adds per-file documentation under docs/{package}/{file}.md so
future readers can plan against shapes without reading code. 24 new docs
covering state/ + openai/ + anthropic/ + ollama/ + inference/ root files
plus package READMEs and a top-level index.

Co-Authored-By: Virgil <virgil@lethean.io>
Lifts model-family reasoning + tool-call parsing out of go-mlx so every
driver (mlx, rocm, cuda, tpu, future) inherits the same logic.

Surface:
- Hint{Architecture, AdapterName} — minimum selector input from drivers
- Mode (Show/Hide/Capture) + Config + Chunk + Result — thinking-channel DTOs
- OutputParser interface + Registry + ForHint(hint) — registry surface
- NewProcessor(cfg, hint) + Filter(text, cfg, hint) — thinking-mode processor
- Family(hint) + NormaliseKey(value) — selector helpers

Built-in parsers: qwen, gemma, deepseek-r1, gpt-oss, minimax, mistral,
kimi, glm, hermes, granite, generic fallback. Marker sets match the
prior go-mlx implementation byte-for-byte.

Driver side: a hint conversion (parser.Hint{Architecture, AdapterName}
from each driver's local model info) and any tokenizer-using wrappers
stay in the driver — FilterThinkingTokens in go-mlx is one such shim.

Tests cover: family lookup across 11 architectures, reasoning parsing
for qwen/gemma/gpt-oss markers, tool parsing tagged + JSON fallback +
bad payloads, custom-parser registration, nil-receiver fallbacks,
thinking-mode hide + show + capture + processor partial-flush.

Co-Authored-By: Virgil <virgil@lethean.io>
Adds the missing probe vocabulary for request-scheduler observability:

- ProbeEventScheduler — kind constant for queue/scheduler events
- ProbePhaseQueue — phase constant for queue-side timing
- ProbeScheduler — request-id, event, queue depth, queue/first-token/
  total latency in millis, cancelled flag
- Scheduler *ProbeScheduler field on ProbeEvent

Drivers (go-mlx scheduler.go and downstream peers) emit through this
shape so probe consumers branch on Kind/Phase and unwrap the typed
payload uniformly.

Co-Authored-By: Virgil <virgil@lethean.io>
Splits the JANG/JANGTQ + VQ codebook quant metadata out of go-mlx so
every driver (mlx, rocm, cuda, tpu, future) inherits them.

quant/jang/
- Info, Capabilities, TensorRole (+ consts), PackedProfile,
  PackedTensorDescriptor, BitOrderLSB0, EncodingAffine
- ReadConfig(path), ParseConfig(data), ProfileBits(name),
  BuildPackedProfile, ClonePackedProfile, NewPackedTensorDescriptor,
  ValidatePackedTensor, DequantizePackedTensor, PackQuantizedValues
- Reference CPU dequant + pack for parity tests vs native kernels.
- Driver side: HF metadata inference helpers
  (inferJANGQuantizationFromHF / hfJANGGroupSize) stay in go-mlx as
  a thin file that imports this package — they depend on
  mlx.HFModelMetadata which itself isn't lifted yet.

quant/codebook/
- Profile, TensorDescriptor, Type ("codebook"), FormatVQ ("vq")
- ParseProfile(data), ReadProfile(path), NewTensorDescriptor,
  ValidateProfile, ValidateTensorDescriptor, ValidateTensorPayload,
  MatVec(desc, input, codes, table, bias), CloneProfile

Symbol-namespace rename — package name takes the disambiguation slot:

  JANGQuantizationInfo               → jang.Info
  JANGCapabilities                   → jang.Capabilities
  JANGPackedQuantizationProfile      → jang.PackedProfile
  JANGPackedTensorDescriptor         → jang.PackedTensorDescriptor
  NewJANGPackedTensorDescriptor      → jang.NewPackedTensorDescriptor
  BuildJANGPackedQuantizationProfile → jang.BuildPackedProfile
  CodebookQuantizationProfile        → codebook.Profile
  CodebookTensorDescriptor           → codebook.TensorDescriptor
  ParseCodebookQuantizationProfile   → codebook.ParseProfile
  CodebookVQMatVec                   → codebook.MatVec
  ...

Tests ported — file-aware Test<File>_<Symbol>_<Variant> shape:
parity round-trip, attention-wide-bits, unsupported-bits diagnostic,
packed-length validation, profile build, descriptor validate-and-
matvec, unaligned-shape rejection, out-of-range code diagnostic,
JSON config parse. All green.

Companion lift: model/minimax/m2 + moe expert_residency policy land
in follow-up commits — m2 has safetensorIndex couplings, expert_
residency needs a budget-bytes refactor away from Apple-class enum.

Co-Authored-By: Virgil <virgil@lethean.io>
Add eval package with interface-driven design: Sample/Batch/BatchConfig
are opaque (any), Dataset is a Next-iterator interface, and Runner is a
struct of callbacks the driver fills in (Info, LoadAdapter, BuildBatches,
EvaluateBatch, BatchTokens, SampleText).

eval.RunDataset orchestrates: sample collection, batch building (via
runner), per-batch evaluation, metrics aggregation (loss + perplexity),
and default + user-supplied quality probes.

AdapterInfo is defined locally rather than imported from go-mlx/lora —
keeps eval driver-neutral so go-rocm/go-cuda/etc. can also adopt without
pulling go-mlx as a dependency.

ResponseCoverageProbe is provided as an exported probe so driver
wrappers can attach it without eval needing to know sample field shape.

Co-Authored-By: Virgil <virgil@lethean.io>
Verb-shaped Runner: driver provides Generate + per-section Bench*
callbacks (BenchPromptCache, BenchMemvidKVBlockWarm, BenchKVRestore,
BenchStateBundle, BenchProbeOverhead, BenchSpeculativeDecode,
BenchPromptLookupDecode). bench.Run orchestrates Info collection +
generation timing + dispatches each enabled callback + assembles the
Report.

Report types are driver-neutral data: GenerationSummary/Sample,
PromptCacheReport, MemvidKVBlockWarmReport, LatencyReport,
StateBundleReport, ProbeReport (Events []any for opaque driver-event
vocabularies), DecodeOptimisationReport, QualityReport.

GenerationMetrics is a flat mirror of the driver's per-call metrics
(PrefillTokensPerSec, DecodeTokensPerSec, PeakMemoryBytes, etc.) — same
fields as go-mlx's Metrics struct so drivers populate it directly.

PopulateMemvidKVBlockWarmBench is exposed so drivers can hand off the
cross-cutting derived fields (Speedup, BreakEvenQuestions) once their
capture/restore measurements are in.

Co-Authored-By: Virgil <virgil@lethean.io>
Covers Run callback dispatch (verb-callbacks fire iff IncludeX flag is set
and the callback is non-nil), Generate-error propagation, nil-context
fallback, GenerationSummary aggregation (rates averaged, peaks maxed,
total-duration fallback to elapsed), default + zero-config normalisation
with independent slice clones, PopulateMemvidKVBlockWarmBench derived
fields (speedup, saved-per-question, break-even), AdapterInfo.IsEmpty,
GenerateOptions probe-sink passthrough + StopTokens clone, NonZeroDuration
floor.

Backfills the coverage gap left by deleting fast_eval_test.go,
fast_eval_example_test.go, and workload_bench_test.go from go-mlx — those
exercised the old raw-callback Runner shape; the verb-callback redesign
needs tests against the bench package directly.

Co-Authored-By: Virgil <virgil@lethean.io>
Lifts the decode-optimisation algorithm from go-mlx (decode_optimisation.go)
into a self-contained driver-neutral package. Symbols rename per the
folder-taxonomy rule that packages don't repeat their own prefix:

  RunSpeculativeDecode      → decode.Speculative
  RunPromptLookupDecode     → decode.PromptLookup
  DecodeOptimisationResult  → decode.Result
  DecodeOptimisationMetrics → decode.Metrics
  SpeculativeDecodeConfig   → decode.SpeculativeConfig
  PromptLookupDecodeConfig  → decode.PromptLookupConfig
  DecodeGenerateFunc        → decode.GenerateFunc
  DecodeGeneration          → decode.Generation
  DecodeModeSpeculative     → decode.ModeSpeculative
  DecodeModePromptLookup    → decode.ModePromptLookup

Token + GenerateConfig + Generation become decode-package types with a
minimal ID/Text/Value surface — drivers convert their native token type
at the boundary (same pattern as bench.AdapterInfo).

Coverage: ports the original three tests + adds error-propagation +
nil-context + token-equality + clone-independence + max-tokens-clamp +
draft-tokens-clamp + utility checks. Sixteen tests, five examples, all
green.

Co-Authored-By: Virgil <virgil@lethean.io>
…odel

Lifts the package-first request scheduler from go-mlx into a
self-contained driver-neutral package. Symbols rename per the
folder-taxonomy rule:

  ScheduledModel       → scheduler.Model
  SchedulerConfig      → scheduler.Config
  NewScheduledModel    → scheduler.New
  scheduledJob         → job (private)
  emitSchedulerProbe   → (Model).emitProbe (private method)
  scheduledGenerateOptions → generateOptions (private)
  cloneSchedulerLabels → cloneLabels (private)

scheduler.Model wraps an inference.TextModel with bounded queueing,
cancellation, streaming backpressure, and ProbeEventScheduler probe
emission. Worker pool sized by Config.MaxConcurrent; queue bounded by
MaxQueue; per-request stream buffer set by StreamBuffer.

Coverage: queue + latency probe, full-queue rejection, cancellation,
Generate/Chat/Classify/BatchGenerate delegation, nil-scheduler defence
paths, fallback cancel via inference.CancellableModel, Err propagation,
generateOptions sampler conversion, cloneLabels defensive copy, millis
helpers. Six tests, ten examples, all green.

Co-Authored-By: Virgil <virgil@lethean.io>
Add project seed wake/continuation helpers, local tuning DTOs, and split-inference planning contracts for go-mlx agent workflows.

Record first-token benchmark timing and Gemma channel thought markers so downstream runners can preserve long-context measurements and strip thinking history correctly.

Co-Authored-By: Virgil <virgil@lethean.io>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

Massive PR adding a complete backend‑neutral inference contract with provider adapters, parser/streaming, scheduler, state/storage, evaluation/benchmarking/decoding, quantisation metadata tools, discovery/GGUF parsing, service registration, extensive docs, and tests. Includes EUPL‑1.2 licence and a submodule bump.

Changes

Inference Core and Services

Layer / File(s) Summary
Core contracts, capabilities, datasets, probe, split, tuning
go/*.go
Adds portable interfaces/DTOs for inference features, capability reporting/inference, datasets/eval/training, probe events, split/tuning.
Identity re‑exports and service wiring
go/identity*.go, go/service*.go
Re‑exports state identities; adds sampler/generate converters; registers inference as a Core service.

Provider Wire Adapters

Layer / File(s) Summary
OpenAI chat adapter and handler
go/openai/openai*.go
Implements Chat Completions request/response, resolvers, HTTP handler (SSE/non‑SSE), stop handling, thinking extractor.
OpenAI responses and services
go/openai/responses*.go, go/openai/services*.go
Adds Responses API DTOs/builders and HTTP handlers for embeddings, rerank, capabilities, cache, cancel.
Anthropic and Ollama adapters
go/anthropic/*.go, go/ollama/*.go
Adds Messages/Chat/Generate DTOs, translators, response builders, with tests.

Parsing and Thinking

Layer / File(s) Summary
Registry and reasoning parse
go/parser/*registry*.go, go/parser/reasoning*.go
Parser registry with family selection; parses reasoning markers.
Tools parsing and thinking processor
go/parser/tools*.go, go/parser/thinking*.go
Parses tool calls (tagged/JSON) and streams reasoning capture with modes.

Scheduler

Layer / File(s) Summary
Scheduler model
go/scheduler/*.go
Wrapper with bounded queueing, cancellation, streaming, probes, delegation; tests and examples.

State and Filestore

Layer / File(s) Summary
State contracts and in‑memory store
go/state/store*.go, go/state/memory.go
Defines storage contracts; implements in‑memory store; tests.
Filestore, agent memory, project seed
go/state/filestore/*.go, go/state/agent_memory.go, go/state/project_seed*.go
Append‑only file store with legacy support; agent memory lifecycle; project‑seed helpers; tests.

Evaluation, Decode, Bench

Layer / File(s) Summary
Dataset and evaluation
go/dataset*.go, go/eval/eval.go
Dataset/eval abstractions and engine with quality probes; tests/examples.
Decode and benchmarking
go/decode/*.go, go/bench/*.go
Speculative/prompt‑lookup decode acceptance; generic benchmark runner; tests/examples.

Quantisation

Layer / File(s) Summary
Codebook VQ
go/quant/codebook/*.go
Parses VQ profiles, validates payloads, matvec compute; tests.
JANG/JANGTQ
go/quant/jang/*.go
Packed tensor metadata, (de)quantisation, roles/bits; tests.

Discovery and GGUF

Layer / File(s) Summary
Discovery and GGUF reader
go/discover.go, go/gguf*.go
Extends DiscoveredModel; parses GGUF header/metadata; tests.

Docs and Licence

Layer / File(s) Summary
Docs and licence
LICENCE, docs/**/*
Adds EUPL‑1.2 licence and comprehensive documentation for inference/state/providers.

Submodule

Layer / File(s) Summary
Submodule update
external/go
Advances submodule commit pointer.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (35)
docs/inference/gguf.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the canonical SPDX header key to avoid licence-scanner misses.

SPDX-Licence-Identifier should be SPDX-License-Identifier. Please apply this across all newly added docs headers in this PR.

Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/gguf.md` at line 1, The SPDX header in the new docs uses the
misspelled key "SPDX-Licence-Identifier"; update all occurrences to the
canonical "SPDX-License-Identifier" (replace the string literal
"SPDX-Licence-Identifier" with "SPDX-License-Identifier") across the newly added
documentation headers in this PR so the licence-scanner recognizes them.
docs/ollama/ollama.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the canonical SPDX header key to keep licence scanners working.

SPDX-Licence-Identifier is not the recognised SPDX key. Please use SPDX-License-Identifier exactly.

Suggested fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ollama/ollama.md` at line 1, Replace the incorrect SPDX header key
`SPDX-Licence-Identifier` with the canonical `SPDX-License-Identifier` in the
file (update the header comment string exactly), ensuring the SPDX header uses
the exact spelling so licence scanners recognize it.
docs/state/project_seed.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correct SPDX header token for this document.

Please switch to SPDX-License-Identifier; the current token is non-standard and may be ignored by compliance tooling.

Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/state/project_seed.md` at line 1, Update the SPDX header token at the
top of docs/state/project_seed.md by replacing the non-standard
"SPDX-Licence-Identifier" token with the correct "SPDX-License-Identifier" token
so compliance tools recognize the license header; locate the existing header
comment and change the token text only, preserving the same license value and
surrounding comment markers.
docs/state/memory.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix SPDX tag spelling to keep licence scanning reliable.

The header must use SPDX-License-Identifier (not SPDX-Licence-Identifier) for standards-compliant tooling.

Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/state/memory.md` at line 1, The file contains a misspelled SPDX header
"SPDX-Licence-Identifier" which breaks license scanners; locate the header
string "SPDX-Licence-Identifier" in docs/state/memory.md and replace it with the
correct token "SPDX-License-Identifier" so tooling recognizes the license.
docs/state/identity.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the exact SPDX header token (SPDX-License-Identifier).

SPDX-Licence-Identifier is not a valid SPDX tag. Automated licence tooling usually requires SPDX-License-Identifier exactly, so this may be missed by scanners.

Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/state/identity.md` at line 1, Replace the incorrect SPDX header token
`SPDX-Licence-Identifier` with the exact required token
`SPDX-License-Identifier` in the file (update the header comment so automated
license tooling recognizes it).
docs/state/store.md-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Standardise SPDX header spelling in this file as well.

Use SPDX-License-Identifier exactly to ensure licence scanners recognise the tag.

Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 -->
+<!-- SPDX-License-Identifier: EUPL-1.2 -->
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/state/store.md` at line 1, Replace the incorrect SPDX header text "<!--
SPDX-Licence-Identifier: EUPL-1.2 -->" with the standard tag "<!--
SPDX-License-Identifier: EUPL-1.2 -->" so licence scanners recognise it; update
the header in docs/state/store.md (look for the existing comment string) to use
"License" instead of "Licence".
go/bench/bench.go-395-411 (1)

395-411: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

normalizeConfig skips CachePrompt fallback for zero config.

When configZero(cfg) is true, Line 398 returns immediately, so CachePrompt never inherits Prompt. That can drive prompt-cache benches with an empty cache prompt.

Proposed fix
 func normalizeConfig(cfg Config) Config {
 	def := DefaultConfig()
 	if configZero(cfg) {
-		return def
+		cfg = def
 	}
 	if cfg.Prompt == "" {
 		cfg.Prompt = def.Prompt
 	}
@@
 	if cfg.CachePrompt == "" {
 		cfg.CachePrompt = cfg.Prompt
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/bench/bench.go` around lines 395 - 411, normalizeConfig currently returns
early when configZero(cfg) is true, which prevents CachePrompt from inheriting
Prompt and leaves CachePrompt empty; modify normalizeConfig (function
normalizeConfig and call to configZero) so that even when configZero(cfg) is
true you still set cfg.CachePrompt = cfg.Prompt (or the default prompt) before
returning: call DefaultConfig() as def, and if configZero(cfg) is true assign
cfg.Prompt = def.Prompt and cfg.CachePrompt = cfg.Prompt (or set CachePrompt to
def.Prompt if you prefer) then return cfg; ensure other fallbacks (MaxTokens,
Runs) remain unchanged.
go/dataset.go-172-174 (1)

172-174: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Evaluator.Evaluate should return core.Result, not error.

This public failure-capable production contract currently exposes (*EvalReport, error), which breaks the package-wide core.Result error-handling model.

As per coding guidelines, "Public production functions that can fail must return core.Result; callers must branch on r.OKand user.Value only after success".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/dataset.go` around lines 172 - 174, The Evaluator interface currently
exposes Evaluate(ctx context.Context, dataset DatasetStream, cfg EvalConfig)
(*EvalReport, error) which violates the package-wide error-handling model;
change the signature of Evaluator.Evaluate to return core.Result where
Result.Value will hold the *EvalReport on success. Update the Evaluator
interface declaration (Evaluator.Evaluate), all concrete implementations, and
every call site to construct and return core.Result (set r.OK/r.Value on success
or r with error information on failure) and to branch on r.OK before using
r.Value; ensure types remain DatasetStream, EvalConfig and that the returned
Value wraps *EvalReport.
go/decode/example_test.go-9-32 (1)

9-32: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace placeholder examples with real API usage examples.

These examples currently print labels only; they should exercise the corresponding public decode symbols so the example file documents actual behaviour.

As per coding guidelines, "Public symbols in .gomust have triplet tests in_test.goand usage examples in_example_test.go``".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/decode/example_test.go` around lines 9 - 32, The file's example functions
currently only print labels; replace each placeholder Example*
(ExampleSpeculative, ExamplePromptLookup, ExampleTokenEqual, ExampleTokensText,
ExampleCloneTokens) with real usage that calls the corresponding public decode
API (construct necessary inputs, call the public functions/types from package
decode such as Speculative, PromptLookup, TokenEqual, TokensText, CloneTokens)
and print their real return values or observable effects; ensure each example
imports and uses the actual symbols, exercises typical inputs, and includes the
correct "// Output: ..." comment showing the expected output so `go test`
example checks pass.
go/dataset_test.go-5-8 (1)

5-8: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use testify assertions and assert.InDelta for float checks.

This file currently relies on custom checks and direct float equality; please switch to require/assert, and use assert.InDelta for floating-point comparisons.

As per coding guidelines, "**/*_test.go: Use testify/assert for general checks and testify/require for preconditions in tests" and "Use assert.InDelta for float comparisons in tests".

Also applies to: 130-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/dataset_test.go` around lines 5 - 8, Replace custom checks and direct
float equality in dataset_test.go tests with testify; add imports
"github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require"
to the import block, use require.* for test preconditions (e.g., setup results)
and assert.InDelta for all floating-point comparisons instead of == or manual
delta checks; update assertions inside the test functions (look for any float
comparisons and calls like t.Fatal/t.Errorf or manual if checks) to use
assert.InDelta(actual, expected, delta) and require.NoError/require.NotNil where
appropriate.
go/decode/decode.go-115-116 (1)

115-116: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Public decode APIs should follow core.Result failure semantics.

Speculative and PromptLookup are public production entry points that can fail, but currently return Go error tuples instead of core.Result.

As per coding guidelines, "Public production functions that can fail must return core.Result; callers must branch on r.OKand user.Value only after success".

Also applies to: 163-164

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/decode/decode.go` around lines 115 - 116, The public functions Speculative
and PromptLookup should return core.Result instead of an (Result, error) tuple;
change their signatures to return core.Result, remove error returns, and convert
all failure paths to return a core.Result with OK=false and a suitable error
message in Value (or other agreed failure encoding), preserving existing Result
semantics (callers must check r.OK). Update any internal helpers called (e.g.,
places returning errors inside Speculative and PromptLookup) to produce
core.Result on failure or propagate errors by wrapping them into a failure
core.Result; ensure successful paths return core.Result with OK=true and Value
populated. Finally, update all callers/tests to branch on r.OK and access
r.Value only after success.
go/decode/decode_test.go-12-49 (1)

12-49: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Avoid direct float equality in tests and migrate to testify assertions.

Line 40 compares AcceptanceRate with direct equality; use assert.InDelta, with require/assert replacing manual t.Fatalf checks in this file.

As per coding guidelines, "**/*_test.go: Use testify/assert for general checks and testify/require for preconditions in tests" and "Use assert.InDelta for float comparisons in tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/decode/decode_test.go` around lines 12 - 49, Update the
TestSpeculative_AcceptsAndRejectsDraftTokens_Good test to use testify's
require/assert helpers: replace the precondition checks (e.g., err != nil and
Mode checks) with require.* (require.NoError, require.Equal) and replace plain
t.Fatalf comparisons with assert.* where appropriate; specifically use
assert.InDelta to compare result.Metrics.AcceptanceRate to 2.0/3.0 instead of
direct float equality, and convert the other t.Fatalf checks to require or
assert calls (for example require.Equal for exact matches and assert.NotZero for
durations). Keep the same semantics and reference symbols result.Metrics,
AcceptanceRate, TestSpeculative_AcceptsAndRejectsDraftTokens_Good, targetCalls,
and draftCalls when making the replacements.
go/contracts_test.go-77-225 (1)

77-225: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Adopt testify/assert and testify/require in these tests.

The file currently uses custom check* helpers throughout; please switch to require for preconditions and assert for behavioural checks to match the test contract consistently.

As per coding guidelines, "**/*_test.go: Use testify/assert for general checks and testify/require for preconditions in tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/contracts_test.go` around lines 77 - 225, Replace the custom check*
helpers with testify's require and assert in the tests: import
"github.com/stretchr/testify/require" and "github.com/stretchr/testify/assert",
use require.* for preconditions (e.g., replace checkNoError, checkTrue before
continuing and type assertion checks in TestContracts_OptionalInterfaces,
TestContracts_CacheService, TestContracts_EmbeddingAndRerank,
TestContracts_Parsers, TestContracts_ModelPackInspector,
TestContracts_AgentMemorySession) and use assert.* for behavioural/verifying
checks (e.g., replace checkEqual, checkLen, checkNotNil where the test can
continue after a failure) and update calls that reference the helper names
(checkTrue, checkNoError, checkEqual, checkLen, checkNotNil) accordingly; ensure
each test uses require at the top for mandatory conditions (type assertions and
returned err checks) and assert for subsequent assertions.
go/eval/eval.go-161-217 (1)

161-217: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Align failing public API and error text contract with project rules.

RunDataset is public and can fail, but returns error instead of core.Result, and several messages use mlx: / mixed case (EvaluateBatch, BuildBatches, LoRA). Please adapt this API to return core.Result and normalise messages to the required inference: lowercase message without trailing period pattern.

As per coding guidelines, "Public production functions that can fail must return core.Result" and "Error strings must use the format: fmt.Errorf(\"inference: lowercase message without trailing period\")".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/eval/eval.go` around lines 161 - 217, RunDataset is a public function that
returns error and emits non-conforming messages; change its return type to
core.Result (and adjust callers) and replace all core.NewError(...) returns with
fmt.Errorf(...) using the project's error format "inference: lowercase message
without trailing period"; also normalize all literal message text (e.g.,
loader/adapter/runner checks for EvaluateBatch, BuildBatches, LoadAdapter and
dataset warnings) to lowercase phrases starting with "inference:" (no trailing
periods) and ensure the function still sets report fields (Report, ModelInfo,
Adapter) before returning appropriate core.Result success values.
go/gguf_test.go-22-49 (1)

22-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use testify/assert and testify/require instead of custom check* helpers in tests.

These tests currently rely on custom helpers, but project test policy requires testify/assert for assertions and testify/require for preconditions.

As per coding guidelines, "Use testify/assert for general checks and testify/require for preconditions in tests".

Also applies to: 60-87

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/gguf_test.go` around lines 22 - 49, Replace the custom test helpers in
TestGGUF_ReadGGUFInfo_Bad and TestGGUF_DiscoverModels_Ugly with testify's
require/assert: import "github.com/stretchr/testify/require" and
"github.com/stretchr/testify/assert", use require.NoError(t, err) /
require.Error(t, err) and require.Equal(t, GGUFInfo{}, info) for preconditions,
and use assert.Len(t, models, 1) and assert.Equal(t, path, models[0].Path) /
assert.Equal(t, "gemma4_text", models[0].ModelType) / assert.Equal(t, "gguf",
models[0].Format) for non-fatal checks; replace checkNoError, checkError,
checkEqual, checkLen calls accordingly while keeping existing helpers like
writeMinimalGGUFAt, DiscoverModels, and ReadGGUFInfo unchanged.
go/ollama/ollama.go-111-113 (1)

111-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass Ollama's -1 and -2 semantics through to the inference layer.

GenerateOptions() silently drops non-positive NumPredict values. Since Ollama's API uses -1 for infinite generation and -2 to fill context, these should be forwarded to inference.WithMaxTokens() instead of ignored. Currently, NumPredict ≤ 0 falls through without any option, causing the inference layer to use its default 256-token limit—silently losing user intent.

The WithMaxTokens() function accepts negative values; pass all NumPredict values through.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/ollama/ollama.go` around lines 111 - 113, Update GenerateOptions so it
forwards all NumPredict values (including non-positive ones) to
inference.WithMaxTokens instead of only when NumPredict > 0: remove the
conditional that drops NumPredict ≤ 0 and always append
inference.WithMaxTokens(options.NumPredict) to opts (referencing
GenerateOptions, options.NumPredict, and inference.WithMaxTokens). This
preserves Ollama semantics for -1 and -2 by passing them through to the
inference layer.
go/openai/services.go-369-390 (1)

369-390: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound request-body reads in decodeServiceRequest.

This currently performs an unbounded io.ReadAll, which can be abused to consume memory. Add a max body size via http.MaxBytesReader (at handler entry) or io.LimitReader before reading.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/openai/services.go` around lines 369 - 390, The decodeServiceRequest
function uses an unbounded io.ReadAll on r.Body; change this to enforce a
maximum request size by either wrapping the request body with
http.MaxBytesReader at handler entry or by using io.LimitReader inside
decodeServiceRequest (e.g., wrap r.Body with io.LimitReader(r.Body,
MaxRequestBodySize) and define a shared MaxRequestBodySize constant). If the
read fails due to exceeding the limit, return an appropriate error (use
http.StatusRequestEntityTooLarge / 413) and preserve the existing JSON
validation/error handling (resultError, writeError) paths so oversized bodies
are rejected safely.
go/openai/openai.go-152-166 (1)

152-166: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request-body size cap before decoding JSON.

io.ReadAll on an unbounded body allows a client to force large memory allocations. Wrap r.Body/body with http.MaxBytesReader (handler path) or io.LimitReader before reading.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/openai/openai.go` around lines 152 - 166, The DecodeRequest function
currently uses io.ReadAll(body) which can allocate unbounded memory; wrap the
incoming body with a size limiter before reading (e.g., replace io.ReadAll(body)
with io.ReadAll(io.LimitReader(body, maxBodySize))) where maxBodySize is a
reasonable constant (e.g., 1<<20 for 1 MiB) and return an explicit error if the
read hits the limit (or detect overflow by attempting to read one extra byte);
update DecodeRequest and keep calls to core.JSONUnmarshalString and resultError
unchanged so the function returns a clear "request body too large" error instead
of allowing unbounded allocation.
go/openai/responses.go-89-104 (1)

89-104: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Align ResponseGenerateOptions with the public core.Result failure contract.

This public fallible function returns error directly instead of the repository-wide core.Result pattern required by the coding guidelines for public production functions that can fail.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/openai/responses.go` around lines 89 - 104, Change ResponseGenerateOptions
to follow the repository-wide core.Result failure contract: update its return
type to ([]inference.GenerateOption, core.Result), call GenerateOptions(chatReq)
as before but capture its (opts, err) and convert any non-nil err into a
core.Result failure (e.g., core.ResultFromError or core.NewFailure) while
returning opts and core.Result for success; update all callers of
ResponseGenerateOptions to handle core.Result instead of an error.
go/openai/openai.go-151-211 (1)

151-211: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Convert public adapter functions to return core.Result.

DecodeRequest, ValidateRequest, GenerateOptions, and NormalizeStopSequences are public fallible functions that should return core.Result rather than raw error, following the pattern established in the coding guidelines. The exception for error values applies only to backend interface method implementations; these are utility functions that adapt wire formats and should follow the standard public API contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/openai/openai.go` around lines 151 - 211, Change the public adapter
functions DecodeRequest, ValidateRequest, GenerateOptions (and
NormalizeStopSequences) to return core.Result instead of raw error (and in
DecodeRequest the success value should be carried in the Result). For each
failure path currently returning an error, wrap the failure in core.E (or the
project's standard core error/result constructor) and return that core.Result;
for successful returns, return a core.Result representing success and containing
the value (e.g., the decoded ChatCompletionRequest or slice of
inference.GenerateOption). Update all call sites to handle core.Result (check
.OK / extract value) accordingly. Ensure function signatures and doc comments
are updated to reflect core.Result usage.
go/parser/markers.go-13-16 (1)

13-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gemma channel end marker appears malformed and will miss valid closes.

On Line 13–Line 16, the end delimiter is "<channel|>", but the channel format here is "<|channel>...". This likely prevents reasoning blocks from closing for those variants.

Proposed fix
-		{start: "<|channel>thought\n", ends: []string{"<channel|>"}, kind: "thinking"},
-		{start: "<|channel>thinking\n", ends: []string{"<channel|>"}, kind: "thinking"},
-		{start: "<|channel>reasoning\n", ends: []string{"<channel|>"}, kind: "reasoning"},
-		{start: "<|channel>analysis\n", ends: []string{"<channel|>"}, kind: "analysis"},
+		{start: "<|channel>thought\n", ends: []string{"<|channel|>"}, kind: "thinking"},
+		{start: "<|channel>thinking\n", ends: []string{"<|channel|>"}, kind: "thinking"},
+		{start: "<|channel>reasoning\n", ends: []string{"<|channel|>"}, kind: "reasoning"},
+		{start: "<|channel>analysis\n", ends: []string{"<|channel|>"}, kind: "analysis"},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/parser/markers.go` around lines 13 - 16, The end delimiter for the Gemma
"channel" markers is malformed ("<channel|>") and will fail to match the
corresponding start token "<|channel>..."; update the ends value for the entries
whose start is "<|channel>thought\n", "<|channel>thinking\n",
"<|channel>reasoning\n", and "<|channel>analysis\n" to the correct closing
string "<|channel|>" so those reasoning/thinking markers properly close.
go/parser/reasoning.go-28-35 (1)

28-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reasoning token offsets are shifted to the marker, not the reasoning payload.

On Line 28 and Line 34, StartToken uses tokenOffset before accounting for len(marker.start), so offsets for extracted reasoning are misaligned.

Proposed fix
-		if end < 0 {
+		segmentStart := tokenOffset + len(marker.start)
+		if end < 0 {
 			reasoning := trimReasoningText(afterStart)
 			if reasoning != "" {
-				segments = append(segments, inference.ReasoningSegment{Kind: marker.kind, Text: reasoning, StartToken: tokenOffset})
+				segments = append(segments, inference.ReasoningSegment{
+					Kind: marker.kind, Text: reasoning, StartToken: segmentStart,
+				})
 			}
 			break
 		}
 		reasoning := trimReasoningText(afterStart[:end])
 		if reasoning != "" {
-			segments = append(segments, inference.ReasoningSegment{Kind: marker.kind, Text: reasoning, StartToken: tokenOffset, EndToken: tokenOffset + end})
+			segments = append(segments, inference.ReasoningSegment{
+				Kind: marker.kind, Text: reasoning, StartToken: segmentStart, EndToken: segmentStart + end,
+			})
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/parser/reasoning.go` around lines 28 - 35, The StartToken/EndToken for
appended inference.ReasoningSegment are currently using tokenOffset (in the
segments append lines around the uses of marker.kind and
trimReasoningText(afterStart[:end])) which ignores the marker.start length;
adjust StartToken to tokenOffset + len(marker.start) and EndToken to tokenOffset
+ len(marker.start) + end (or omit EndToken when using the first append) so the
offsets point to the reasoning payload rather than the marker; update both
places where segments = append(inference.ReasoningSegment{... StartToken: ...})
are constructed (the block using reasoning :=
trimReasoningText(afterStart[:end]) and the earlier append) to use these
corrected calculations.
go/quant/codebook/codebook.go-69-282 (1)

69-282: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Align public failing APIs with core.Result contract.

Several public production functions in this file return Go error directly (ParseProfile, ReadProfile, NewTensorDescriptor, ValidateProfile, ValidateTensorDescriptor, MatVec, ValidateTensorPayload). The repository contract requires core.Result for public failure paths outside backend-local interfaces.

As per coding guidelines, "**/*.go: Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/quant/codebook/codebook.go` around lines 69 - 282, Change the public
functions ParseProfile, ReadProfile, NewTensorDescriptor, ValidateProfile,
ValidateTensorDescriptor, MatVec, and ValidateTensorPayload to return
core.Result instead of Go error/tuple results; ensure all success returns set
Result.OK = true and Result.Value to the successful value (e.g., *Profile for
ParseProfile/ReadProfile, TensorDescriptor for NewTensorDescriptor, []float32
for MatVec, or nil for pure validators) and all failure paths return Result.OK =
false with Result.Value containing the error object; update internal returns in
each function (e.g., returns that currently do "return nil, err" or "return err"
should become "return core.Result{OK:false, Value:err}") and success returns
(e.g., "return &profile, nil") should become "return core.Result{OK:true,
Value:&profile}" so callers can branch on r.OK and use r.Value only after
success.
go/probe.go-160-192 (1)

160-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect ProbeBus.sinks against concurrent Add/EmitProbe access.

ProbeBus mutates and iterates the same slice without synchronisation. If Add and EmitProbe run concurrently, this introduces a data race and unstable fan-out behaviour.

Suggested fix
 import (
+    "sync"
 )

 type ProbeBus struct {
+    mu    sync.RWMutex
     sinks []ProbeSink
 }

 func (b *ProbeBus) Add(sink ProbeSink) {
     if b == nil || sink == nil {
         return
     }
+    b.mu.Lock()
+    defer b.mu.Unlock()
     b.sinks = append(b.sinks, sink)
 }

 func (b *ProbeBus) EmitProbe(event ProbeEvent) {
     if b == nil {
         return
     }
-    for _, sink := range b.sinks {
+    b.mu.RLock()
+    sinks := append([]ProbeSink(nil), b.sinks...)
+    b.mu.RUnlock()
+    for _, sink := range sinks {
         if sink == nil {
             continue
         }
         sink.EmitProbe(event)
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/probe.go` around lines 160 - 192, ProbeBus currently mutates and iterates
the sinks slice without synchronization, causing data races when Add and
EmitProbe run concurrently; add a sync.RWMutex field (e.g., mu) to ProbeBus,
take mu.Lock() in Add when appending to sinks, and in EmitProbe take mu.RLock(),
copy the b.sinks slice to a local variable, release the RLock, then iterate over
the copied slice calling sink.EmitProbe(event) so iteration happens without
holding the lock and avoids races and potential deadlocks; update references to
ProbeBus.sinks usage in NewProbeBus, Add, and EmitProbe accordingly.
go/scheduler/scheduler.go-139-152 (1)

139-152: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve caller context when delegating cancellation to the base model.

CancelRequest currently discards the incoming context and always calls the base with context.Background(). That can ignore deadlines/cancellation from upstream control paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/scheduler/scheduler.go` around lines 139 - 152, CancelRequest is
discarding the incoming ctx by calling the base model with context.Background();
preserve and forward the provided context when delegating to the underlying
model. In the CancelRequest method, when checking m.base for
inference.CancellableModel and calling cancellable.CancelRequest, pass the
original ctx parameter instead of context.Background() so upstream
deadlines/cancellations propagate to the base model.
go/scheduler/scheduler.go-107-119 (1)

107-119: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject duplicate active request IDs before registration.

A caller-supplied ID can overwrite an existing active[id] entry, which breaks cancellation/visibility for the earlier job. Validate uniqueness (while active) and return an error on collision before register.

Also applies to: 348-352

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/scheduler/scheduler.go` around lines 107 - 119, The code currently assigns
a caller-supplied req.ID and then immediately registers the job with m.register,
which allows a new request to overwrite an existing active entry; before calling
m.register (both at this site and the other occurrence around lines 348–352),
check whether core.Trim(req.ID) != "" and if so verify uniqueness against the
scheduler's active map (e.g. m.active or the method that looks up active jobs)
under the same mutex used by register; if a collision exists return an error to
the caller instead of proceeding, otherwise proceed to set req.ID (or
m.nextRequestID()) and then call m.register so existing active jobs cannot be
overwritten.
go/split.go-166-244 (1)

166-244: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use core.Result for public failure paths in split planning APIs.

PlanModelSlice and ValidateSplitInferencePlan currently return error, which diverges from the repository’s public error-handling contract. Please adapt these to core.Result-based returns for consistency with caller branching expectations.

As per coding guidelines, "Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/split.go` around lines 166 - 244, Both PlanModelSlice and
ValidateSplitInferencePlan must use core.Result for public failure paths instead
of returning error; change their signatures to return core.Result (e.g.,
core.Result[ModelSlicePlan] or the project’s Result pattern) and convert every
early error return to construct and return a failing core.Result, while
successful paths return a successful core.Result wrapping the value. Update all
places inside PlanModelSlice (including errors from modelSlicePresetComponents
and the custom-components check) to return a failing core.Result with the
original error, and in ValidateSplitInferencePlan convert each
core.NewError/core.Errorf return into a failing core.Result; ensure the final
success path returns an OK core.Result. Also update callers to branch on r.OK
and use r.Value only on success.
go/scheduler/scheduler.go-62-92 (1)

62-92: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Worker goroutines have no shutdown path and can leak past Close().

Workers block on m.queue forever, and Close() only delegates to m.base.Close() without stopping worker loops. Add scheduler-owned shutdown (for example: done channel + sync.WaitGroup, and close/cancel on Close) so repeated lifecycle usage does not leak goroutines.

Also applies to: 273-278, 292-296

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/scheduler/scheduler.go` around lines 62 - 92, The worker goroutines
started in New leak because they block forever on m.queue and there is no
scheduler-owned shutdown; fix by adding shutdown primitives to Model (e.g., add
fields done chan struct{} and wg sync.WaitGroup), change the worker spawn loop
to a proper for i := 0; i < maxConcurrent; i++ and for each goroutine call
m.wg.Add(1) then go m.worker(i) where worker returns and calls m.wg.Done();
modify worker to select on m.queue and case <-m.done to exit cleanly; update
Close (method Close on Model) to signal shutdown by closing or cancelling m.done
and then calling m.wg.Wait() before delegating to m.base.Close(); ensure no
double-close races (create done once in New).
go/quant/jang/jang.go-125-337 (1)

125-337: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Failing public APIs should return core.Result instead of raw error.

ReadConfig, ParseConfig, NewPackedTensorDescriptor, ValidatePackedTensor, DequantizePackedTensor, and PackQuantizedValues expose error in public production paths. This diverges from the project’s required failure contract and makes caller handling inconsistent with the rest of the stack.

As per coding guidelines, "Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/quant/jang/jang.go` around lines 125 - 337, Change the listed public
functions to return core.Result instead of raw error or (*T, error): update
signatures for ReadConfig, ParseConfig, NewPackedTensorDescriptor,
ValidatePackedTensor, DequantizePackedTensor, and PackQuantizedValues to return
a core.Result whose OK flag is true on success and Value contains the success
payload (e.g. *Info, *PackedTensorDescriptor, []float32, []byte) and false on
failure with Value holding the error; replace all direct returns of nil, err or
value, nil with appropriate core.Result values and update their callers (e.g.
finalize, BuildPackedProfile, callers of
ParseConfig/ReadConfig/NewPackedTensorDescriptor/etc.) to branch on r.OK and use
r.Value after success.
go/state/memory.go-37-50 (1)

37-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Manifest URIs are dropped, breaking ResolveURI for preloaded refs.

NewInMemoryStoreWithManifest stores refs but does not populate uris from ref.URI, so any URI supplied via manifest cannot be resolved later.

Suggested fix
 func NewInMemoryStoreWithManifest(chunks map[int]string, refs map[int]ChunkRef) *InMemoryStore {
 	copyMap := make(map[int]string, len(chunks))
 	nextID := 1
@@
 	refMap := make(map[int]ChunkRef, len(copyMap))
+	uriMap := make(map[string]int)
 	for id := range copyMap {
 		refMap[id] = ChunkRef{
 			ChunkID:        id,
 			FrameOffset:    uint64(id),
 			HasFrameOffset: true,
 			Codec:          CodecMemory,
 		}
 	}
 	for id, ref := range refs {
 		ref.ChunkID = id
 		refMap[id] = ref
+		if ref.URI != "" {
+			uriMap[ref.URI] = id
+		}
 		if id >= nextID {
 			nextID = id + 1
 		}
 	}
 	return &InMemoryStore{
 		chunks: copyMap,
 		data:   make(map[int][]byte),
 		refs:   refMap,
-		uris:   make(map[string]int),
+		uris:   uriMap,
 		nextID: nextID,
 	}
 }

Also applies to: 118-135

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/state/memory.go` around lines 37 - 50, NewInMemoryStoreWithManifest (and
the similar constructor around lines 118-135) populates refs but never fills the
InMemoryStore.uris map from each ref's URI, so ResolveURI cannot find preloaded
manifest URIs; update both constructors to, when iterating refs (the loop that
assigns ref.ChunkID and refMap[id] = ref), also check ref.URI and if non-empty
set uris[ref.URI] = id (ensuring you use the same uris map instance returned in
the struct literal), so that InMemoryStore.ResolveURI can resolve
manifest-supplied URIs for those refs.
go/state/filestore/store.go-56-111 (1)

56-111: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Public failing API methods should return core.Result, not (..., error).

The exported production surface here still returns Go error pairs across create/open/read/write paths. That diverges from the package-level result contract and makes call-site handling inconsistent.

As per coding guidelines, **/*.go: "Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success".

Also applies to: 129-149, 151-193, 195-329

go/state/project_seed.go-105-117 (1)

105-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

project_id is not guaranteed when labels/metadata are initially nil.

setProjectLabel exits on nil maps, so WakeRequest and continuation SleepRequest can silently miss project_id when no labels are supplied.

Suggested fix
-func setProjectLabel(labels map[string]string, projectID string) {
-	if labels == nil || projectID == "" {
-		return
-	}
+func setProjectLabel(labels map[string]string, projectID string) map[string]string {
+	if projectID == "" {
+		return labels
+	}
+	if labels == nil {
+		labels = make(map[string]string, 1)
+	}
 	if labels["project_id"] == "" {
 		labels["project_id"] = projectID
 	}
+	return labels
 }
-	labels := mergeStringMaps(s.Labels, opts.Labels)
-	setProjectLabel(labels, s.ProjectID)
+	labels := mergeStringMaps(s.Labels, opts.Labels)
+	labels = setProjectLabel(labels, s.ProjectID)
@@
-	metadata := mergeStringMaps(s.Metadata, opts.Metadata)
-	setProjectLabel(metadata, s.ProjectID)
-	labels := mergeStringMaps(s.Labels, opts.Labels)
-	setProjectLabel(labels, s.ProjectID)
+	metadata := mergeStringMaps(s.Metadata, opts.Metadata)
+	metadata = setProjectLabel(metadata, s.ProjectID)
+	labels := mergeStringMaps(s.Labels, opts.Labels)
+	labels = setProjectLabel(labels, s.ProjectID)

Also applies to: 157-160, 291-297

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/state/project_seed.go` around lines 105 - 117, The WakeRequest builder can
call setProjectLabel on a nil map (via mergeStringMaps when s.Labels and
opts.Labels are nil), causing project_id to be omitted; fix by ensuring the
labels map is initialized before setProjectLabel is called (e.g., after labels
:= mergeStringMaps(...), if labels == nil { labels = make(map[string]string) }),
and apply the same nil-check/initialization pattern where setProjectLabel is
used elsewhere (the analogous constructors/methods referenced in the review such
as the SleepRequest builder and the other occurrences that call
setProjectLabel).
go/state/filestore/store_test.go-18-23 (1)

18-23: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Align assertions with testify/require + testify/assert.

Please migrate these table and non-table checks to require for setup/preconditions and assert for expected outcomes to keep test style consistent with repo rules.

As per coding guidelines, **/*_test.go: "Use testify/assert for general checks and testify/require for preconditions in tests".

Also applies to: 26-35, 41-46, 49-69, 82-97, 104-143, 150-189, 196-223, 228-381

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/state/filestore/store_test.go` around lines 18 - 23, Replace t.Fatalf
precondition checks with testify/require and move expected-outcome checks to
testify/assert: for the Create() error check use require.NoError(t, err)
(precondition) and for the Path() comparison use assert.Equal(t, path,
store.Path()) (expected outcome). Import "github.com/stretchr/testify/require"
and "github.com/stretchr/testify/assert" and apply the same pattern to the other
test assertions in this file (lines covering the ranges mentioned) — use
require.* for setup/preconditions (e.g., error/creation checks) and assert.* for
equality/behavior checks (e.g., Path(), Read/Write results).
go/split_test.go-14-23 (1)

14-23: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Adopt testify/assert and testify/require in these tests.

These checks currently use custom helpers; please switch preconditions to require.* and behavioural assertions to assert.* to match the repo test contract.

As per coding guidelines, **/*_test.go: "Use testify/assert for general checks and testify/require for preconditions in tests".

Also applies to: 30-37, 43-49, 58-66, 71-73, 77-89, 94-103

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/split_test.go` around lines 14 - 23, Replace the custom test helpers with
testify's require/assert: treat the error precondition checkNoError(t, err) as
require.NoError(t, err); convert strict preconditions (if any) to require.* and
all behavioral checks to assert.*, e.g. replace checkEqual(t,
ModelSlicePresetClient, plan.Preset) with assert.Equal(t,
ModelSlicePresetClient, plan.Preset), checkTrue/checkFalse with
assert.True/assert.False, and keep require.* only for necessary preconditions
(err); update imports to include "github.com/stretchr/testify/assert" and
"github.com/stretchr/testify/require". Apply the same replacements for the other
indicated blocks (lines 30-37, 43-49, 58-66, 71-73, 77-89, 94-103) referencing
the same symbols (plan.HasComponent, plan.AttentionLocal,
plan.FFNRemoteCandidate, plan.SourcePath, etc.).
go/state/store.go-102-183 (1)

102-183: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Public failing helpers should return core.Result instead of (value, error).

Resolve, ResolveBytes, ResolveRefBytes, and ResolveURI are exported production helpers that can fail, but currently return error pairs. This diverges from the package contract and pushes mixed error semantics onto callers.

As per coding guidelines, "Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/state/store.go` around lines 102 - 183, The four exported helpers Resolve,
ResolveBytes, ResolveRefBytes, and ResolveURI currently return (Chunk, error)
but must return core.Result per package guidelines; change each function
signature to return core.Result, on success return a core.Result with OK:true
and Value set to the Chunk, and on failure return a core.Result with OK:false
and the error (or appropriate error information) set; update all early error
returns inside Resolve, ResolveBytes, ResolveRefBytes, and ResolveURI to
construct and return a core.Result failure, and ensure the final successful
return builds a core.Result success with the Chunk as Value so callers can
branch on r.OK and use r.Value only after success.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e65b2d7f-2704-419b-a644-5c93b5d4f422

📥 Commits

Reviewing files that changed from the base of the PR and between 613cf98 and f0af335.

⛔ Files ignored due to path filters (1)
  • go.work is excluded by !**/*.work
📒 Files selected for processing (98)
  • LICENCE
  • docs/README.md
  • docs/anthropic/anthropic.md
  • docs/inference/README.md
  • docs/inference/capability.md
  • docs/inference/contracts.md
  • docs/inference/dataset.md
  • docs/inference/discover.md
  • docs/inference/gguf.md
  • docs/inference/identity.md
  • docs/inference/inference.md
  • docs/inference/local_tuning.md
  • docs/inference/options.md
  • docs/inference/probe.md
  • docs/inference/service.md
  • docs/inference/training.md
  • docs/ollama/ollama.md
  • docs/openai/README.md
  • docs/openai/openai.md
  • docs/openai/responses.md
  • docs/openai/services.md
  • docs/state/README.md
  • docs/state/agent_memory.md
  • docs/state/filestore.md
  • docs/state/identity.md
  • docs/state/memory.md
  • docs/state/project_seed.md
  • docs/state/store.md
  • external/go
  • go/anthropic/anthropic.go
  • go/anthropic/anthropic_test.go
  • go/bench/bench.go
  • go/bench/bench_test.go
  • go/capability.go
  • go/capability_example_test.go
  • go/capability_test.go
  • go/contracts.go
  • go/contracts_example_test.go
  • go/contracts_test.go
  • go/dataset.go
  • go/dataset_example_test.go
  • go/dataset_test.go
  • go/decode/decode.go
  • go/decode/decode_test.go
  • go/decode/example_test.go
  • go/discover.go
  • go/eval/eval.go
  • go/gguf.go
  • go/gguf_test.go
  • go/identity.go
  • go/identity_example_test.go
  • go/identity_test.go
  • go/ollama/ollama.go
  • go/ollama/ollama_test.go
  • go/openai/openai.go
  • go/openai/openai_test.go
  • go/openai/responses.go
  • go/openai/responses_test.go
  • go/openai/services.go
  • go/openai/services_test.go
  • go/parser/builtin.go
  • go/parser/markers.go
  • go/parser/reasoning.go
  • go/parser/reasoning_test.go
  • go/parser/registry.go
  • go/parser/registry_test.go
  • go/parser/selector.go
  • go/parser/thinking.go
  • go/parser/thinking_test.go
  • go/parser/tools.go
  • go/parser/tools_test.go
  • go/parser/types.go
  • go/probe.go
  • go/probe_example_test.go
  • go/probe_test.go
  • go/quant/codebook/codebook.go
  • go/quant/codebook/codebook_test.go
  • go/quant/jang/jang.go
  • go/quant/jang/jang_test.go
  • go/scheduler/example_test.go
  • go/scheduler/scheduler.go
  • go/scheduler/scheduler_test.go
  • go/service.go
  • go/service_test.go
  • go/split.go
  • go/split_example_test.go
  • go/split_test.go
  • go/state/agent_memory.go
  • go/state/filestore/store.go
  • go/state/filestore/store_test.go
  • go/state/identity.go
  • go/state/memory.go
  • go/state/project_seed.go
  • go/state/project_seed_test.go
  • go/state/state_test.go
  • go/state/store.go
  • go/tuning.go
  • go/tuning_test.go

Comment thread go/gguf.go
Comment on lines +194 to +199
var metadataCount uint64
if err := binary.Read(file, binary.LittleEndian, &metadataCount); err != nil {
return nil, 0, core.Errorf("inference: read gguf metadata count: %w", err)
}
metadata := make(map[string]any, metadataCount)
for range metadataCount {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the declaration and use are mismatched by type (uint64 -> make hint).
rg -n -C2 'var metadataCount uint64|make\(map\[string\]any, metadataCount\)' go/gguf.go

Repository: dAppCore/go-inference

Length of output: 476


Fix compile-time type mismatch in map allocation hint.

metadataCount is uint64, but make(map[string]any, ...) expects an int capacity hint. This will not compile.

Suggested patch
-	metadata := make(map[string]any, metadataCount)
+	metadata := make(map[string]any, int(metadataCount))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var metadataCount uint64
if err := binary.Read(file, binary.LittleEndian, &metadataCount); err != nil {
return nil, 0, core.Errorf("inference: read gguf metadata count: %w", err)
}
metadata := make(map[string]any, metadataCount)
for range metadataCount {
var metadataCount uint64
if err := binary.Read(file, binary.LittleEndian, &metadataCount); err != nil {
return nil, 0, core.Errorf("inference: read gguf metadata count: %w", err)
}
metadata := make(map[string]any, int(metadataCount))
for range metadataCount {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/gguf.go` around lines 194 - 199, The allocation and loop use of
metadataCount (a uint64) is type-mismatched: change the map allocation to use
int(metadataCount) as the capacity hint in make(map[string]any,
int(metadataCount)) and replace the invalid "for range metadataCount" with an
explicit loop such as "for i := uint64(0); i < metadataCount; i++ { ... }" (or
cast to int for the loop variable) so all operations on metadataCount compile;
update the code around the metadataCount, metadata map, and the loop that
populates metadata accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant