Feature wave: Android AAR + Kotlin facade, server attach/router modes, LangChain4j streaming, GGUF tooling (llama.cpp b9878) by bernardladenthin · Pull Request #298 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-07-05T12:09:45Z

Summary

Android distribution: net.ladenthin:llama-android / llama-android-opencl AARs (standalone plain-Gradle build, no AGP/SDK needed; classes byte-identical to the Maven core jar) + the net.ladenthin:llama-kotlin coroutines facade (Flow streaming, suspend wrappers, cancellation wired to CancellationToken). The CPU AAR is multi-ABI (arm64-v8a + x86_64); a new dockcross x86_64 job also feeds the default JAR. Includes the Android dlopen fix: GGML_OPENMP OFF + -static-libstdc++ remove the libomp.so/libc++_shared.so DT_NEEDED entries that made System.loadLibrary fail on every device (latent in the released 5.0.5 arm64 lib too); CI now enforces a bionic-only DT_NEEDED whitelist and 16 KB LOAD alignment per shipped .so.
Server modes: NativeServer attach mode (NativeServer(LlamaModel, String...), patch 0007) serves an already-loaded model over the full upstream HTTP frontend (one copy of the weights); in-JVM router mode (patch 0008 + NativeServer.setWorkerCommand) with the typed RouterClient/RouterModel API (list/load/unload/await-loaded with fail-fast on failed workers).
API additions: pure-Java GgufInspector (GGUF v2/v3 header/metadata reader, LE+BE, no model load), LlamaQuantizer (in-JVM llama_model_quantize), Session.checkpoint/rewind/fork (slot-checkpoint based conversation branching), runtime LoRA adapter control, typed batch embeddings (embed(Collection)), UTF-8-safe JNI string path (utf8_to_jstring_impl — fixes supplementary-plane/emoji handling, Android CheckJNI-safe).
LangChain4j: blocking tool calling (own JsonSchemaElementSerializer), JSON mode / response_format, multimodal user input, and full streaming — streamed tool calls (onPartialToolCall/onCompleteToolCall), per-token thinking events, real finish reason + token usage (StreamingChunkAssembler).
llama.cpp pin: b9870 → b9878 (three small internal-only ranges; all 8 carried patches re-verified at each step; history rows appended).
CI: test-java-llama-kotlin, package-android-aar (structure, 16 KB alignment, DT_NEEDED whitelist, AGP/R8 consumer smoke), crosscompile-android-x86_64, and test-android-emulator — on-device System.loadLibrary + GgufInspector + real inference on a KVM x86_64 emulator, promoted to a release gate after running flake-free. Committed audio fixture (audios/sample.wav, REUSE-annotated) is now the AudioInputIntegrationTest default prompt. TODO.md cleaned to open-items-only.

Test plan

Affected unit / integration tests pass locally
CI is green on this branch (full 45-job matrix green on the b9873 head 99359a0, incl. emulator on-device inference, model-backed integration suites on Linux/macOS x3/Windows x3, PIT 295/295, C++ suites incl. s390x/qemu; final head re-run pending after the b9876/b9878 bumps — both ranges are internal-only upstream fixes with all patches re-verified)
Docs / CHANGELOG updated where applicable (README classifier table + Importing in Android + Similar Projects, CLAUDE.md sections for AAR/Kotlin/emulator gate/dlopen invariant, docs/history/llama-cpp-breaking-changes.md rows, TODO.md cleanup)

Related issues / PRs

Closes the "NativeServer — reuse an already-loaded LlamaModel" TODO via attach mode (patch 0007).
Fixes the Android UnsatisfiedLinkError class of failures (libomp/libc++_shared DT_NEEDED) — also latent in the 5.0.4/5.0.5 Android arm64 artifacts.
Upstream-PR carries: patches 0003 (server : add slot_prompt_similarity getter/setter ggml-org/llama.cpp#22393) and 0004 (server: honour per-request reasoning_budget_tokens in chat completions ggml-org/llama.cpp#23116); patches 0001/0002/0005–0008 are upstream-submittable (tracked in TODO.md).

Checklist

I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
My commits follow Conventional Commits
No security-sensitive changes (if there are, I have notified the maintainer privately per SECURITY.md)

🤖 Generated with Claude Code

https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Three features from the similar-projects investigation (native-server-first scope, no new Java-server routes):

Runtime LoRA adapter control (upstream GET/POST /lora-adapters parity):
- new JNI methods getLoraAdaptersJson/setLoraAdaptersJson posting SERVER_TASK_TYPE_GET_LORA / SET_LORA (parse_lora_request wire format)
- typed LlamaModel.getLoraAdapters() / setLoraAdapters(Map) / setLoraAdapter(int, float); new value.LoraAdapter + json.LoraAdapterResponseParser (finite-scale validation)
- closes the setLoraInitWithoutApply() gap (its Javadoc pointed at an endpoint the bindings could not reach)

Typed batch embeddings (requested by upstream kherud users):
- LlamaModel.embed(Collection<String>) -> List<float[]> over the OAI array-input path of handleEmbeddings; json.EmbeddingResponseParser restores request order via the response index field

UTF-8-safe JNI string path:
- json_to_jstring_impl now serialises via upstream safe_json_to_str (U+FFFD replacement instead of json::type_error 316 when non-stream content ends mid-codepoint at the token limit) and builds the Java String through the cached String(byte[], "UTF-8") constructor (utf8_to_jstring_impl) instead of NewStringUTF, which expects Modified UTF-8 and is spec-invalid for supplementary-plane characters (4-byte emoji; Android CheckJNI aborts)
- applyTemplate return and the log-callback message take the same path
- streamed chunks were already boundary-safe (upstream process_token holds back incomplete UTF-8); pinned end to end by the new tests

Tests: +17 C++ unit tests (utf8/json_to_jstring byte-capture mocks, parse_lora_request, server_task_result_get_lora::to_json; total 479), +28 model-free Java unit tests (parsers + PIT-complete LoraAdapter), +3 model-backed integration classes/methods (RuntimeLoraIntegrationTest, Utf8RoundTripIntegrationTest, LlamaEmbeddingsTest batch cases). PIT 255/255 mutants killed; javadoc:jar clean; ArchUnit green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
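
The Modified UTF-8 mismatch behind the NewStringUTF change can be reproduced in plain Java, because DataOutputStream.writeUTF emits the same Modified UTF-8 form that JNI string functions expect. The sketch below is illustrative only: the class and method names are hypothetical and it does not touch the binding's native code. It compares the two encodings for one supplementary-plane character.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the project.
final class ModifiedUtf8Sketch {

    /** Standard UTF-8: the byte form native text arrives in. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8 as written by DataOutputStream.writeUTF, minus its 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withPrefix = out.toByteArray();
        return Arrays.copyOfRange(withPrefix, 2, withPrefix.length);
    }

    /** Java-side analogue of building the String from raw bytes with an explicit UTF-8 charset. */
    static String decodeStandardUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // supplementary-plane code point
        System.out.println("UTF-8 bytes:          " + standardUtf8(emoji).length);
        System.out.println("Modified UTF-8 bytes: " + modifiedUtf8(emoji).length);
        System.out.println("Round trip intact:    "
                + decodeStandardUtf8(standardUtf8(emoji)).equals(emoji));
    }
}
```

Standard UTF-8 encodes U+1F600 in four bytes, while Modified UTF-8 encodes each surrogate separately for six bytes in total, so handing standard UTF-8 bytes to an API that expects the modified form is invalid input for such characters.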

Three native-server-focused features:
- NativeServer attach mode (closes the "reuse an already-loaded LlamaModel" TODO): patches/0007 extracts the upstream route table into a shared llama_server_register_common_routes(...) and adds llama_server_attach(), which serves an already-loaded LlamaModel's server_context over the full upstream HTTP frontend (WebUI, resumable streaming) - no second model load, no second start_loop; the model's worker keeps driving the queue. Java: NativeServer(LlamaModel, String...) over startAttachedNativeServer JNI. Validated by NativeServerAttachIntegrationTest (HTTP health/props/completion/chat + concurrent direct JNI calls on the same model).
- In-JVM router mode (multi-model management): the upstream router spawns workers by re-executing its own binary, which inside a JVM is java, so embedded router workers could never start. patches/0008 adds the LLAMA_SERVER_WORKER_CMD override (whitespace-split, replaces only the worker-binary token), exposed as NativeServer.setWorkerCommand(String...); workers relaunch as fresh JVMs running the classic single-model NativeServer. Validated by RouterModeIntegrationTest (Linux CI: --models-dir listing -> POST /models/load -> worker-JVM spawn -> proxied chat completion) plus model-free setWorkerCommand validation tests.
- In-JVM GGUF quantization: LlamaQuantizer.quantize(in, out, QuantizationType[, threads, allowRequantize]) over llama_model_quantize (LLamaSharp/llama-cpp-python precedent). args.QuantizationType pins the llama_ftype b9870 mapping (PIT-complete, 256/256 mutants killed). QuantizerIntegrationTest re-quantizes the 135M draft model and loads the result; refusal-without-opt-in and missing-input error paths covered.

Local verification: full native rebuild with patches 0007/0008 applied cleanly, 479/479 C++ tests pass, NativeLibraryLoadSmokeTest green with the rebuilt lib, javadoc clean, spotless + pinned clang-format applied. The model-backed integration tests run in CI.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
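
The worker-command override described for router mode (whitespace-split, replacing only the worker-binary token) can be pictured with a small Java sketch. This is an assumption-laden illustration, not the code in patches/0008, which lives in the native server: the class name, the method name, and the example arguments are all hypothetical, and the sketch assumes the binary is the first argv token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the override semantics; the real logic is in the native patch.
final class WorkerCommandSketch {

    /**
     * Returns the argv a worker would be launched with.
     * workerArgv.get(0) is assumed to be the binary the router would re-execute;
     * every later token is kept unchanged.
     */
    static List<String> applyOverride(List<String> workerArgv, String override) {
        if (workerArgv.isEmpty()) {
            throw new IllegalArgumentException("worker argv must contain at least the binary token");
        }
        if (override == null || override.trim().isEmpty()) {
            return new ArrayList<>(workerArgv); // no override: launch unchanged
        }
        // Whitespace-split the override; it may expand into several tokens.
        List<String> result = new ArrayList<>(Arrays.asList(override.trim().split("\\s+")));
        // Only the first token (the worker binary) is replaced.
        result.addAll(workerArgv.subList(1, workerArgv.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("java", "--model", "a.gguf", "--port", "8081");
        System.out.println(applyOverride(original, "java -cp app.jar example.WorkerMain"));
    }
}
```

Under these assumptions the router's own arguments for the worker survive untouched, which is why a fresh JVM running a single-model server can stand in for the re-executed binary.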

Bridge the remaining langchain4j v1 gaps in llama-langchain4j (blocking path):
- Tool calling: ChatRequest.toolSpecifications()/toolChoice() map to the jllama typed tools path (ToolDefinition + tool_choice); assistant tool-call turns and ToolExecutionResultMessages round-trip through the history, and a native tool_calls response comes back as AiMessage.toolExecutionRequests() with finish reason TOOL_EXECUTION.
- JsonSchemaElementSerializer: recursive public-API-only serializer for the langchain4j JsonSchemaElement tree (object/string/integer/number/boolean/enum/array/reference/anyOf/null/raw), emitting langchain4j's $defs / #/$defs/... conventions (their serializer is internal-only).
- response_format: ResponseFormat.JSON maps to json_object mode; a JsonSchema-bearing format maps to the native json_schema grammar constraint (structured output). Applies to both adapters.
- Multimodal user input: ImageContent (base64 or URL) and AudioContent (inline wav/mp3) map to ContentPart array-form content for the mtmd pipeline; unsupported media fails loud instead of silently dropping.
- JllamaStreamingChatModel: fails fast with UnsupportedFeatureException when tools are requested (streaming tool-call reconstruction is the documented follow-up).

Tests: 12 new model-free mapping/serializer tests (31 total in module), plus JllamaToolCallingIntegrationTest (gated on the new net.ladenthin.llama.langchain4j.tool.model property; CI passes the cached Qwen2.5-Instruct tool model to the langchain4j integration job).

Also bundles three SpotBugs verify fixes from the previous batch: LlamaModel static-field ordering (IMC_IMMATURE_CLASS_WRONG_FIELD_ORDER), EmbeddingResponseParser IndexedVector rewrite (CLI_CONSTANT_LIST_INDEX), and a scoped EI_EXPOSE_REP2 exclusion for NativeServer's borrowed-model attach constructor.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Replace the submodule/NDK source-integration flow as the recommended Android path with first-class Maven artifacts, so an Android Studio app needs exactly one dependency line:

    implementation("net.ladenthin:llama-android:<version>")         // CPU
    implementation("net.ladenthin:llama-android-opencl:<version>")  // Adreno
    implementation("net.ladenthin:llama-kotlin:<version>")          // optional

llama-android/ (standalone plain-Gradle build, NOT a reactor module: Maven cannot deploy <packaging>aar</packaging>; no AGP and no Android SDK needed to build, version + mirrored dependency versions are parsed from the Maven poms so `mvn versions:set` stays the single bump point):
- AAR = manifest (minSdkVersion 28, enforced on consumers by AGP) + classes.jar (byte-identical Maven-built core classes minus desktop native resources and module-info.class) + jni/arm64-v8a/libjllama.so + consumer R8/ProGuard rules (proguard.txt, applied automatically) + R.txt.
- POM mirrors the core's compile deps (jackson/slf4j-api/jspecify/checker-qual); logback deliberately excluded (JVM-only binding).
- LlamaLoader already tries System.loadLibrary("jllama") first on Android, so the AAR-installed .so resolves with zero core changes.

llama-kotlin/ (new Maven reactor module, pure Kotlin 2.2 / jvmTarget 1.8):
- generateFlow/generateChatFlow: cold Flow token streaming, source closed on completion, error, AND cancellation (no leaked native task slots).
- completeSuspend/chatSuspend/chatCompleteTextSuspend/embedSuspend; completeSuspend wires coroutine cancellation into the cooperative CancellationToken so a cancelled coroutine stops the native loop at the next token boundary.
- Core dep is provided-scope so Android consumers pair the facade with the AAR instead of transitively pulling the fat desktop JAR.
- 6 model-free unit tests over the internal seams.

16 KB page-size (Google Play, Android 15+ targets): CMakeLists.txt now pins -Wl,-z,max-page-size=16384 for Android builds and CI asserts every LOAD segment of the shipped .so is 16384-aligned (currently satisfied by toolchain default; the pin + assert prevent silent regression).

CI (publish.yml):
- test-java-llama-kotlin: model-free unit tests.
- package-android-aar: assembles both AARs from the fresh native artifacts, validates structure (entries, minSdk, classes.jar content, 16 KB alignment) and runs an AGP consumer smoke test: the minimal app fixture in .github/android-consumer-test/ resolves the AAR from mavenLocal and runs a full R8 assembleRelease on the runner's Android SDK, then asserts the APK carries libjllama.so and the un-stripped binding (proving Android Studio consumption without an emulator).
- publish-snapshot/publish-release: gated on the new jobs; AAR snapshots publish to the Central snapshots repo via Gradle, releases upload a signed Central Portal bundle via the Publisher API. llama-kotlin rides the normal reactor deploy.

Docs: README "Importing in Android" rewritten around the AAR (source integration kept as advanced option), module READMEs, CLAUDE.md (reactor layout, version bump, new "Android AAR + Kotlin facade" section), RELEASE.md, TODO.md (Android section marked done; sample app and multi-ABI/emulator-CI stay as follow-ups).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
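
The 16 KB LOAD-alignment assertion mentioned in this commit runs in CI, and its actual script is not shown here. As a rough sketch of the same idea in Java, the hypothetical helper below reads the text printed by `readelf -lW` and checks the Align column (the last column) of every LOAD program header. The class name and the sample lines are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the project's CI performs this check in its workflow, not in Java.
final class LoadAlignmentSketch {
    static final long PAGE_16K = 16384L;

    /** Extracts the Align column of every LOAD program header from `readelf -lW` text. */
    static List<Long> loadAlignments(String readelfOutput) {
        List<Long> result = new ArrayList<>();
        for (String line : readelfOutput.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("LOAD")) {
                continue; // headers, other segment types, section mapping
            }
            String[] cols = trimmed.split("\\s+");
            // Align is the last column and is printed in hex, e.g. 0x4000.
            result.add(Long.decode(cols[cols.length - 1]));
        }
        return result;
    }

    /** True only if at least one LOAD segment exists and all are aligned to a 16 KB multiple. */
    static boolean allAligned(String readelfOutput) {
        List<Long> alignments = loadAlignments(readelfOutput);
        if (alignments.isEmpty()) {
            return false; // nothing to verify: treat as failure rather than pass silently
        }
        for (long align : alignments) {
            if (align < PAGE_16K || align % PAGE_16K != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sample =
                "  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0a1b2c 0x0a1b2c R E 0x4000\n"
              + "  LOAD           0x0a4000 0x00000000000a4000 0x00000000000a4000 0x001000 0x002000 RW  0x4000\n";
        System.out.println(allAligned(sample));
    }
}
```

Failing on an empty result mirrors the fail-loud stance of the commit: a check that finds no LOAD segments is more likely a parsing problem than a pass.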

…inks Add the projects surveyed during the feature-gap research and Android investigation that were not yet linked: the kherud/java-llama.cpp fork parent (previously only in the header note), the sibling llama.cpp bindings in other languages (llama-cpp-python, LLamaSharp, node-llama-cpp), and a new "Other local inference stacks" group for Ollama (whose native API this project's server implements) and ExecuTorch (the engine behind llama-stack-client-kotlin's local mode). The llama-stack-client-kotlin entry now points at the new llama-android AAR + llama-kotlin facade as the native on-device equivalent. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Replace the raw HTTP+JSON boilerplate router-mode callers had to write themselves with a typed client for the upstream model-management endpoints:
- value.RouterModel (+ nested Status enum): one GET /models entry: identifier, lifecycle status (exact-match mapping of upstream server_model_status_to_string strings: downloading/downloaded/unloaded/loading/loaded/sleeping, UNKNOWN otherwise), the raw status string, and the router's failed-worker marker (status.failed + exit_code).
- json.RouterModelsResponseParser: pure transform of the router GET /models wire format (data/models array fallback, id/name fallback), unit-testable with JSON literals.
- server.RouterClient: listModels/findModel/loadModel/unloadModel plus awaitModelLoaded(id, timeout): polls until LOADED and fails fast with the worker's exit code when the router marks the model failed, or immediately for an unknown id, instead of running out the timeout. Non-2xx responses surface the router's error body. Works against the in-JVM NativeServer router or any external llama-server router (plain HTTP, no JNI).

Tests: 25 new model-free tests: RouterModelTest (getters, status mapping, equals/hashCode, toString shapes), RouterModelsResponseParserTest (upstream shape, failed marker, fallbacks, tolerance), RouterClientTest (stub HTTP server: parsing, request bodies, error surfacing, the awaitModelLoaded state machine incl. poll-sequence, fail-fast, and timeout paths). RouterModeIntegrationTest now drives model discovery, load, and readiness through RouterClient against a real router, replacing its hand-rolled JSON polling.

Gates: layeredArchitecture updated (Server may access Json; the rule is the documented intent registry for new inter-package edges); awaitModelLoaded uses a never-counted-down CountDownLatch instead of the banned Thread.sleep; SpotBugs clean (toString/equals/hashCode added, exact status matching avoids IMPROPER_UNICODE, scoped URLCONNECTION_SSRF_FD exclusion with developer-supplied-host rationale); PIT 274/274 (RouterModel inside the value.* 100% gate); javadoc builds clean. README router-mode section, CLAUDE.md, and TODO.md updated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
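
The awaitModelLoaded behaviour described above (poll until loaded, fail fast on a failed worker or unknown id, pause via a never-counted-down CountDownLatch rather than Thread.sleep) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RouterClient source: the Status enum, the Supplier-based poll, the boolean return on timeout, and the exception types are all hypothetical choices.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a poll-until-loaded state machine.
final class AwaitLoadedSketch {

    enum Status { LOADING, LOADED, FAILED, UNKNOWN_ID }

    /** Returns true once the poll reports LOADED, false if the timeout elapses first. */
    static boolean awaitLoaded(Supplier<Status> poll, long timeoutMillis, long pollIntervalMillis) {
        // Never counted down: await(timeout) acts as a timed pause without calling Thread.sleep.
        CountDownLatch neverReleased = new CountDownLatch(1);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            Status status = poll.get();
            switch (status) {
                case LOADED:
                    return true;
                case FAILED: // fail fast instead of running out the timeout
                    throw new IllegalStateException("worker failed while loading");
                case UNKNOWN_ID:
                    throw new IllegalArgumentException("unknown model id");
                default:
                    break; // still loading: keep polling
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            long pauseNanos = Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(pollIntervalMillis));
            try {
                neverReleased.await(pauseNanos, TimeUnit.NANOSECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while waiting for model load", e);
            }
        }
    }
}
```

Capping each pause at the remaining time keeps the total wait close to the requested timeout even when the poll interval is long.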

@EqualsAndHashCode

Replace the handwritten equals/hashCode with @EqualsAndHashCode over the host/port fields, matching the established pattern (value.* and the other server.* classes). toString stays intentionally handwritten so the client renders as its target URL in log traces — the same documented handwritten-toString convention ChatMessage/ToolCall/RouterModel use. SpotBugs (IMC_IMMATURE_CLASS_*) stays satisfied by the generated methods. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Three backlog features, each with model-free tests plus gated integration coverage:

GGUF metadata inspector (no model load):
- GgufInspector: pure-Java GGUF v2/v3 header + key/value reader: no native library, no tensor data, cost independent of file size. Little- and big-endian containers auto-detected via the version field; fail-loud on v1, unknown versions/type ids, truncation, and implausible lengths (sanity caps). All value types decoded (integers→Long, floats→Double, bool, string, arrays).
- value.GgufMetadata: full entry table + typed accessors (architecture, name, parameter count, <arch>.context_length, general.file_type, chat template). Complements the loaded-model getModelMeta().
- 21 tests against in-memory generated fixtures (no committed binaries) + a gated real-model read.

LangChain4j streaming tool calls + thinking events:
- JllamaStreamingChatModel now streams over the native OAI chat.completion.chunk path via the new StreamingChunkAssembler: delta.content → onPartialResponse, delta.reasoning_content → onPartialThinking (+ AiMessage.thinking()), delta.tool_calls fragments accumulated per index → onPartialToolCall / onCompleteToolCall and AiMessage.toolExecutionRequests() with finish reason TOOL_EXECUTION; real finish reason + token usage on the final response. The UnsupportedFeatureException fail-fast is gone; toStreamingParameters now carries tools/tool_choice like the blocking path.
- 6 assembler tests (canned chunks: text, split/parallel tool calls, thinking, usage, fail-loud) + a gated streamed-tool-call integration test.
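
The per-index accumulation of delta.tool_calls fragments can be illustrated with a small model-free sketch. It is not the StreamingChunkAssembler itself: the Fragment type, the rendering format, and all names below are hypothetical, and JSON parsing of the chunks is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of accumulating streamed tool-call fragments by index.
final class ToolCallAccumulatorSketch {

    /** One already-parsed delta.tool_calls element; id and name usually appear only on the first fragment. */
    static final class Fragment {
        final int index;
        final String id;
        final String name;
        final String argumentsPart;

        Fragment(int index, String id, String name, String argumentsPart) {
            this.index = index;
            this.id = id;
            this.name = name;
            this.argumentsPart = argumentsPart;
        }
    }

    private static final class Pending {
        String id;
        String name;
        final StringBuilder arguments = new StringBuilder();
    }

    // TreeMap keeps calls ordered by index even when parallel calls interleave.
    private final Map<Integer, Pending> pendingByIndex = new TreeMap<>();

    void accept(Fragment fragment) {
        Pending pending = pendingByIndex.computeIfAbsent(fragment.index, i -> new Pending());
        if (fragment.id != null) {
            pending.id = fragment.id;
        }
        if (fragment.name != null) {
            pending.name = fragment.name;
        }
        if (fragment.argumentsPart != null) {
            pending.arguments.append(fragment.argumentsPart); // argument JSON arrives in pieces
        }
    }

    /** Completed calls in index order, rendered as "id:name(arguments)". */
    List<String> complete() {
        List<String> result = new ArrayList<>();
        for (Pending pending : pendingByIndex.values()) {
            if (pending.name == null) {
                throw new IllegalStateException("tool call stream never named the function");
            }
            result.add(pending.id + ":" + pending.name + "(" + pending.arguments + ")");
        }
        return result;
    }
}
```

Keying on the index rather than arrival order is what lets split and parallel tool calls be reassembled correctly when their fragments interleave.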
Session fork/rewind (conversation checkpoints):
- Session.checkpoint(filepath) → value.SessionCheckpoint pairing the native slot KV-save file with the transcript-turn snapshot; Session.rewind(checkpoint) restores both atomically under the session lock (native state and transcript cannot drift); Session.fork(newSlotId, filepath) branches into an independent session on another slot (same system message + params customizer; requires setParallel >= 2). All rejected while a stream is in progress, same guard as save/restore.
- Plumbing: ChatTranscript.turnsSnapshot()/resetTurns(), SessionState.turnsSnapshot()/restoreTurns()/getSystemMessage().
- Model-free bookkeeping/guard tests + SessionForkRewindIntegrationTest (rewind-continue, independent fork, own-slot fail-fast).

Gates: PIT 295/295 (GgufMetadata, SessionCheckpoint, ChatTranscript additions inside the value.* 100% gate); SpotBugs clean (dynamic exception messages in GgufInspector; scoped exclusions with rationale for the stateful-reader PRMC false positive, the tagged-decoder URV, and SessionCheckpoint's order-significant List parameter, ChatMessage precedent); javadoc clean; langchain4j module verify green (38 tests). README sections for checkpoints and GGUF inspection; TODO.md updated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
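
For the GGUF inspector in this commit, the fixed part of a GGUF v2/v3 header is small enough to sketch: the ASCII magic "GGUF", a 32-bit version, then 64-bit tensor and metadata key/value counts. The reader below is a hedged illustration, not GgufInspector: the class name is hypothetical, and the endianness heuristic (a small version number whose low 16 bits read as zero in little-endian is taken to be big-endian) is an assumption about how detection via the version field might work.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical header-only reader; metadata entries and tensor infos are not parsed.
final class GgufHeaderSketch {
    final ByteOrder order;
    final int version;
    final long tensorCount;
    final long metadataKvCount;

    private GgufHeaderSketch(ByteOrder order, int version, long tensorCount, long metadataKvCount) {
        this.order = order;
        this.version = version;
        this.tensorCount = tensorCount;
        this.metadataKvCount = metadataKvCount;
    }

    static GgufHeaderSketch parse(byte[] bytes) {
        if (bytes.length < 24) {
            throw new IllegalArgumentException("truncated GGUF header");
        }
        if (bytes[0] != 'G' || bytes[1] != 'G' || bytes[2] != 'U' || bytes[3] != 'F') {
            throw new IllegalArgumentException("not a GGUF file (bad magic)");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int version = buf.getInt(4);
        if ((version & 0xFFFF) == 0) {
            // Assumed heuristic: the small version number sits in the high bytes, so re-read as big-endian.
            buf.order(ByteOrder.BIG_ENDIAN);
            version = buf.getInt(4);
        }
        if (version != 2 && version != 3) {
            throw new IllegalArgumentException("unsupported GGUF version " + version);
        }
        long tensorCount = buf.getLong(8);
        long metadataKvCount = buf.getLong(16);
        if (tensorCount < 0 || metadataKvCount < 0) {
            // Counts are unsigned 64-bit; values this large are treated as implausible.
            throw new IllegalArgumentException("implausible GGUF counts");
        }
        return new GgufHeaderSketch(buf.order(), version, tensorCount, metadataKvCount);
    }
}
```

Because only the first 24 bytes are touched, the cost of this step does not depend on the size of the model file.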

Extends Android from build-verified to runtime-verified in CI, and makes the binding usable on x86_64 Android environments (Android Studio emulator, Chromebooks, x86-64 Android hardware) for all consumers.

Phase 1 (new native build):
- .github/dockcross/dockcross-android-x86_64 wrapper (same pinned image tag as the arm64 one; wrappers are image-generic launchers, verified byte-identical modulo image name; update.sh already listed the generation command).
- crosscompile-android-x86_64 job (dockcross + the same sccache steady-state env), artifact Linux-Android-x86_64-libraries, fail-loud and in the package/publish needs graphs. The artifact ALSO merges into the default JAR's Linux-Android/x86_64 tree automatically via the *-libraries glob (OSInfo already maps x86_64 Android there), so plain JAR consumers get the ABI too. The CMake Android guard (weak symbols + 16 KB max-page-size) keys on OS_NAME and applies unchanged.

Phase 2 (multi-ABI AAR):
- llama-android CPU AAR now ships jni/arm64-v8a + jni/x86_64 (per-ABI fail-loud staging checks; app bundles split per ABI so phones download only arm64). OpenCL flavor stays arm64-only (Adreno = Qualcomm ARM).
- Structural validation covers both ABIs incl. the 16 KB LOAD-alignment readelf check per .so; the R8 consumer smoke asserts both libs in the APK; publish jobs stage both ABIs.

Phase 3 (on-emulator instrumentation):
- test-android-emulator job: KVM-accelerated x86_64 emulator (API 30, reactivecircus/android-emulator-runner), publishes the CPU AAR to mavenLocal (per-publication task), adb-pushes the already-cached draft model (AMD-Llama-135m, no new download) and runs the consumer fixture's connectedDebugAndroidTest.
- OnDeviceInferenceTest (androidx.test): System.loadLibrary("jllama") from the APK's native-lib dir + JNI_OnLoad FindClass against D8-dexed classes, pure-Java GgufInspector on-device, and real native inference (non-empty generation). Self-skips without the pushed model so a bare local emulator run stays green.
- VALIDATION-ONLY for now (not in the publish needs graphs): emulator boot is the flakiest CI machinery; promote to a release gate after a stable streak (same staged policy as the sccache rollout).

Not covered by the emulator: arm64 kernels and the Adreno flavor; the planned example app covers those on real hardware.

Docs: README (default-JAR platforms, 64-bit-only note, AAR section), llama-android/README (multi-ABI), CLAUDE.md, TODO.md, fixture README.

Locally verified: multi-ABI AAR assembles with both ABIs, per-ABI fail-loud check fires on a missing .so, and the per-publication mavenLocal task publishes the CPU AAR.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

@AfterAll

Two first-run failures on PR #298:
- REUSE compliance (test job): the four files added by the Android/Kotlin work lacked SPDX info: llama-android/README.md, llama-kotlin/README.md, and the javadoc-placeholder README.txt get SPDX headers; the generated dockcross-android-x86_64 wrapper joins the existing dockcross wrapper annotation in REUSE.toml. `reuse lint` is compliant again (365/365).
- SonarCloud "Build and analyze": RouterModeIntegrationTest.tearDown called NativeServer.setWorkerCommand() unconditionally; when the class self-skips via a @BeforeAll assumption (no model on the lib-less analysis runner) @AfterAll still runs, and setWorkerCommand loads the native library -> UnsatisfiedLinkError. The teardown now clears the worker-command override only when setup actually installed it (workerCommandSet flag), so a skipped class tears down as a no-op.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

…RL() CodeQL flagged the URL(String) constructor (deprecated since JDK 20, no validation/encoding). URI.create(...).toURL() is the non-deprecated equivalent and is available on Java 8, so the bytecode floor is unaffected. Behavior identical for the fixed localhost/router URLs; 9/9 RouterClientTest green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
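
The replacement pattern named in this commit uses only long-standing JDK calls, so it can be shown directly. The helper class and method name below are hypothetical; only URI.create and URI.toURL are taken from the commit text.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper showing URI.create(...).toURL() in place of new URL(String).
final class RouterUrlSketch {

    /** Builds an http URL for a host, port, and absolute path such as "/models". */
    static URL endpoint(String host, int port, String path) {
        try {
            // URI.create validates the syntax and throws IllegalArgumentException on malformed input.
            return URI.create("http://" + host + ":" + port + path).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot build URL", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(endpoint("127.0.0.1", 8080, "/models"));
    }
}
```

Unlike the deprecated constructor, the URI step rejects syntactically invalid input such as a host containing a space before any connection is attempted.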

Two more first-run failures on PR #298 (head 725f570):
- Package + Validate Android AARs: the R8 release pass in the consumer smoke failed with "Missing classes": the AAR's consumer keep rule retains the whole binding, so R8 verifies every referenced type, including compile-time-only ones absent on Android: com.sun.net.httpserver.* (JVM-only OpenAiCompatServer transport), lombok.Generated and animal-sniffer's IgnoreJRERequirement (CLASS-retention build annotations). consumer-proguard.txt now ships the matching -dontwarn rules, so every consumer app's R8 pass gets them automatically, the standard treatment for compileOnly references in published Android libraries.
- Android emulator on-device test: sh exit code 2 with no gradle output: reactivecircus/android-emulator-runner executes the script input LINE BY LINE via sh, so the multi-line if-block was fed as a lone "if ...; then" (syntax error). The logic moves into the committed .github/run-android-emulator-test.sh (bash -n verified) and the job's script: is a single line invoking it.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

The on-device emulator test failed with UnsatisfiedLinkError ("No native library found ... Directly from .apk/lib") even though the x86_64 libjllama.so was verifiably inside the APK.

Root cause (confirmed via readelf -d on the shipped 5.0.5 arm64 Android lib, which carries the same latent defect): the dockcross cross-clang links two DT_NEEDED entries that exist on no Android device, so bionic's dlopen rejects the library:
- libomp.so (LLVM OpenMP runtime, pulled in by ggml's OpenMP path)
- libc++_shared.so (NDK shared C++ runtime, only present when an app packages it itself)

Three-part fix:
1. llama/CMakeLists.txt (Android guard): set GGML_OPENMP OFF (ggml falls back to its own std::thread pool, the same trade the Windows-arm64 clang-cl job makes) and link -static-libstdc++ so libc++ is embedded. Only bionic system libraries remain as dependencies.
2. publish.yml (package-android-aar validation): per-.so DT_NEEDED whitelist via readelf -dW (libc/libm/libdl/liblog/libandroid, plus libOpenCL.so for the OpenCL flavor); a future toolchain bump cannot silently reintroduce a non-bionic dependency; the job fails naming the offending library.
3. LlamaLoader: the Android System.loadLibrary catch block now includes the UnsatisfiedLinkError message in the "Directly from .apk/lib (...)" tried-path entry; the actual dlopen reason was previously swallowed, which made this failure look like a missing library.

Also documents the new dlopen-ability invariant in CLAUDE.md next to the 16 KB page-size invariant.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX
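
The DT_NEEDED whitelist from step 2 is implemented in the CI workflow, which is not reproduced here. The Java sketch below only illustrates the idea on captured `readelf -dW` text: the class name, the ".so" suffixes on the whitelist entries, and the sample lines are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker; the project's real check runs in publish.yml.
final class DtNeededWhitelistSketch {

    /** Libraries assumed to be provided by the Android platform for the CPU flavor. */
    static final Set<String> BIONIC_ONLY = new HashSet<>(Arrays.asList(
            "libc.so", "libm.so", "libdl.so", "liblog.so", "libandroid.so"));

    // Matches dynamic-section lines such as: 0x0000000000000001 (NEEDED)  Shared library: [libc.so]
    private static final Pattern NEEDED =
            Pattern.compile("\\(NEEDED\\)\\s+Shared library: \\[([^\\]]+)\\]");

    /** Returns every DT_NEEDED library in the readelf output that is not in the allowed set. */
    static List<String> offending(String readelfOutput, Set<String> allowed) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NEEDED.matcher(readelfOutput);
        while (matcher.find()) {
            String library = matcher.group(1);
            if (!allowed.contains(library)) {
                result.add(library); // named so a failing check can report the culprit
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample =
                " 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]\n"
              + " 0x0000000000000001 (NEEDED)             Shared library: [libc.so]\n";
        System.out.println(offending(sample, BIONIC_ONLY));
    }
}
```

For the OpenCL flavor the allowed set would additionally contain libOpenCL.so, as the commit notes.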

Small upstream range (5 files, ~9.5 KiB): a quantized-tensor fix for the CPU concat op and a null-buffer guard for the K/V rotation graph inputs (upstream #25215), plus WebUI settings changes (auto-followed by the build-webui job) and a test-backend-ops addition (not built here). All eight local patches (0001-0008) re-verified: applied cleanly in order onto a b9873 checkout; the range touches no patch-target file and no OuteTTS generator anchor. History row appended to docs/history/llama-cpp-breaking-changes.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

ggml-only range (3 files, ~9.6 KiB): the CUDA concat op gains the same quantized-tensor block-size handling b9873 added to the CPU op, plus a tensor-parallel + -ncmoe crash fix on MoE models (upstream #25028). No API surface, no project source changes. All eight local patches (0001-0008) re-verified: applied cleanly in order onto a b9876 checkout; the range touches no patch-target file and no OuteTTS generator anchor. History rows appended to docs/history/llama-cpp-breaking-changes.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

The TODO entry "PIT gate not hermetic — value.ContentPart.audioFile(Path)" was stale: ContentPartTest already carries the hermetic @TempDir tests the entry proposed (wav dispatch incl. case-insensitive .WAV, mp3 dispatch, unknown-extension rejection). Verified in a fixture-less, network-restricted sandbox: mvn -f llama/pom.xml test-compile pitest:mutationCoverage reports 295/295 mutations killed (100%), 0 NO_COVERAGE. No committed audio fixture is needed for the PIT gate; the model-backed AudioInputIntegrationTest remains separately (and intentionally) gated on a real speech clip. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

…e gate

The committed sample.wav (260ddb0) redded both REUSE lint jobs: a new binary with no license info. This covers and wires it:
- llama/src/test/resources/audios/README.md: provenance/license/override notes mirroring the images/ README (recorded by the project author, MIT-granted for this project).
- REUSE.toml: the audios README joins the MIT markdown list and sample.wav gets its own MIT annotation (WAV has no in-file header channel, same as test-image.jpg). reuse lint: 368/368 compliant.
- AudioInputIntegrationTest now defaults the audio prompt to the committed clip (TestConstants.DEFAULT_AUDIO_INPUT_PATH), mirroring the vision.image default; only the audio model + mmproj still need staging. README/CLAUDE.md property tables updated.

Also promotes test-android-emulator to a RELEASE GATE (both publish needs: graphs) per owner decision: the job ran flake-free through PR #298's validation cycle (boot ~30 s, on-device inference green), so a broken on-device runtime now blocks publishing, the same fail-loud policy as every native artifact job.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Every DONE/RESOLVED entry moves out of "Open"; a concise 2026-07-05 record (one-liners with pointers to PR #298 / CLAUDE.md / git history) is kept in the Done section.

Trims:
- Dropped fully-done sections: NativeServer attach mode, typed router API, GGUF inspector, session fork/rewind, PIT hermeticity, Windows native classifiers, b9739 arg-parse regression, code audit (its one optional follow-up becomes its own small open section), branch protection rename (closed as a no-op per owner).
- OpenAI-compat endpoint section reduced to its open follow-ups, marked deprioritized per the native-server-first owner decision.
- Similar-projects backlog reduced to the jbang example remainder.
- Android section reduced to the example-app follow-up.
- Upstream-PR section generalized from patch 0001 to all six upstream-submittable patches (0001/0002/0005-0008).
- License Compliance entry notes the same 17-issues status now blocks PR #298's merge state.

File shrinks 654 -> 315 lines; only genuinely open work remains under "Open".
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

Smallest range yet (2 files, ~1.8 KiB), internal-only: a fail-loud GGML_ABORT guard in the ggml meta backend for unsupported multi-buffers (upstream #22197), and llama_model now copies the borrowed tensor_split array into an owned vector so tensor-parallel KV-cache split metadata cannot read a dangling caller pointer. No API surface, no project source changes. All eight local patches (0001-0008) re-verified: applied cleanly in order onto a b9878 checkout; the range touches no patch-target file and no OuteTTS generator anchor. History rows appended to docs/history/llama-cpp-breaking-changes.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVMuGj2shABrHWJ9sNqLqX

sonarqubecloud · 2026-07-05T20:13:11Z

Quality Gate passed

Issues
159 New issues
0 Accepted issues

Measures
0 Security Hotspots
82.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

claude added 9 commits July 5, 2026 08:24

bernardladenthin temporarily deployed to startgate July 5, 2026 12:09 — with GitHub Actions Inactive

bernardladenthin had a problem deploying to startgate July 5, 2026 12:14 — with GitHub Actions Error

github-advanced-security AI found potential problems Jul 5, 2026

Comment thread llama/src/main/java/net/ladenthin/llama/server/RouterClient.java Fixed

bernardladenthin had a problem deploying to startgate July 5, 2026 12:21 — with GitHub Actions Error

bernardladenthin temporarily deployed to startgate July 5, 2026 12:23 — with GitHub Actions Inactive

bernardladenthin temporarily deployed to startgate July 5, 2026 12:47 — with GitHub Actions Inactive

bernardladenthin temporarily deployed to startgate July 5, 2026 13:06 — with GitHub Actions Inactive

bernardladenthin had a problem deploying to startgate July 5, 2026 18:56 — with GitHub Actions Error

bernardladenthin had a problem deploying to startgate July 5, 2026 19:02 — with GitHub Actions Error

Add audio sample.wav

260ddb0

bernardladenthin had a problem deploying to startgate July 5, 2026 19:50 — with GitHub Actions Error

bernardladenthin had a problem deploying to startgate July 5, 2026 19:57 — with GitHub Actions Error

bernardladenthin had a problem deploying to startgate July 5, 2026 20:08 — with GitHub Actions Error

bernardladenthin had a problem deploying to startgate July 5, 2026 20:10 — with GitHub Actions Error

bernardladenthin changed the title ~~Add Kotlin coroutines facade, Android AAR packaging, and server attach mode~~ Feature wave: Android AAR + Kotlin facade, server attach/router modes, LangChain4j streaming, GGUF tooling (llama.cpp b9878) Jul 5, 2026

bernardladenthin merged commit 8d08b37 into main Jul 5, 2026
13 of 67 checks passed

bernardladenthin deleted the claude/java-llama-cpp-features-l4tl6v branch July 5, 2026 20:19

bernardladenthin merged 20 commits into main from claude/java-llama-cpp-features-l4tl6v

3 participants
