Android on-device: release-crash fix, live transcription, APK CI + download page #182
Merged
…rt → review)
A clean, responsive web GUI that fully drives VoxTerm's engine from the browser
(desktop + phone over LAN), with a Python control backend — no reinvention of the
transcription/diarization logic, it reuses VoxTerm's own AudioCapture + transcriber +
Silero VAD + diarizer + EventLogger.
gui/server.py stdlib http.server + SSE status stream + JSON API (loopback by
default; VOXTERM_GUI_LAN=1 to reach it from a phone). CSP, nosniff,
bounded request bodies, static-dir traversal guard, capped SSE.
gui/engine.py control layer: start/stop recording via AudioCapture, background
transcribe+export job with progress, session history, artifact reads
(path-traversal guarded).
gui/transcribe.py importable transcription (WAV/buffer -> faithful events.jsonl +
-transcript.md) reusing VoxTerm's engine; progress callback for the UI.
gui/export.py the reviewed LLM-agent exporter (events.jsonl -> -agent.md + .json),
ported self-contained into the fork (+ gui/test_export.py, 23 tests).
gui/static/ polished UI (index.html/style.css/app.js): record hero w/ live level
ring + timer, model/language pickers, SSE-driven transcript view,
client-side speaker rename (flows into copy/export), session browser,
Copy-for-AI / download .md / download .json.
v1 = record → stop → transcribe (robust; reuses the tested pipeline). Verified so far
without a mic: API + static serving + traversal guards + the full load/view/export flow
against a real 53-turn session; export tests 23/23. Pending a recording-finalize (mic
contention): the record-from-GUI path and a Tauri v2 native/mobile wrapper. Live
word-streaming, party/P2P, hivemind = labeled fast-follows.
… + correctness
Review of the GUI (16 agents) found 11 real issues, all fixed + verified:
- BLOCKER: strict CSP (style-src 'self', no 'unsafe-inline') silently blocked every
element.style the UI sets (level ring, progress bar, speaker color dots) — the core
visuals. Allow 'unsafe-inline' for style-src (all interpolated values are escaped).
- MAJOR (security): LAN mode (VOXTERM_GUI_LAN=1) had zero auth — anyone on the wifi
could start a recording of the room or read past transcripts. Now requires a token
(generated/printed on start, or VOXTERM_GUI_TOKEN) on every /api/* call; loopback
stays open. Verified: no-token/bad-token -> 401, valid -> 200.
- MAJOR (perf): the transcriber/VAD/diarizer were reloaded from disk every recording.
Cache them (lock-guarded) in gui.transcribe; reset the diarizer session per run.
- MAJOR (xss): unescaped speaker rename/label in the legend innerHTML -> escapeHtml.
- MAJOR (correctness): hand-built YAML in the client export broke / allowed key
injection on a rename/peer_name with a quote or newline -> JSON.stringify scalars
(mirrors the server's _yaml_scalar).
- MAJOR (crash): Download .md/.json threw on a raw-markdown fallback session (CUR null)
-> guard the handlers.
- MINOR: dir-aware artifact resolution (same stem in two dirs returned the wrong file);
poll-thread appends under the lock + join the thread WITHOUT holding it (avoids a
deadlock) so trailing audio isn't dropped; SSE counter guarded by a lock; session-stem
escaped in the sidebar; flush startup prints so the LAN token is visible immediately.
- NIT: start_recording wraps mic-open in try/except -> structured {ok:false,error} so a
busy/missing mic shows a real message instead of a 500.
Verified: py_compile + node --check clean; export tests 23/23; CSP header correct; the
loopback load flow against a real 53-turn session; LAN 401/200/401; and a live
record -> stop -> transcribe -> export run through the engine (graceful 0-turn on a
near-silent clip). 2 review findings correctly refuted (malformed Content-Length is
cosmetic; nav a11y is enhancement).
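A minimal sketch of the token gate described above (every /api/* call needs the token in LAN mode; loopback and static files stay open). The function name and argument shape are illustrative assumptions, not the code in gui/server.py. The comparison runs on UTF-8 bytes because `hmac.compare_digest` raises `TypeError` when given a non-ASCII `str`.

```python
import hmac
from typing import Optional


def is_authorized(path: str, supplied: Optional[str], expected: Optional[str]) -> bool:
    """Return True if the request may proceed (hypothetical helper)."""
    if not path.startswith("/api/"):
        return True  # static assets stay open
    if expected is None:
        return True  # loopback mode: no token configured
    if not supplied:
        return False  # LAN mode and no token sent -> 401
    # Constant-time comparison on bytes, so a non-ASCII token is a clean
    # mismatch instead of an exception.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

A handler would answer 401 whenever this returns False.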
Recording-safe hardening (built while a live mic recording ran; file-only changes):
- gui/test_engine.py (16 tests): non-mic engine paths: models()/languages(), _write_wav
round-trip + clipping, sessions() discovery/ordering/flags across dirs,
read_artifact/_resolve text + path-traversal rejection + only_dir restriction, idle
status() shape. Isolated to temp dirs; never opens the mic or a model.
- gui/test_server.py (14 tests): in-process server on an ephemeral port: static serving +
content-types, traversal blocked (403), /api/options|status|sessions, 404 unknown route,
and the full LAN-auth contract (no-token 401 / valid 200 / wrong 401 / header 200 /
TOKEN=None open; static stays open). No /api/record POST.
- gui/README.md: honest docs covering what it is, how to run, the phone/LAN token flow,
the privacy/security model, files + outputs, v1 features + labeled fast-follows.
- UX polish (static/* only, API unchanged): a11y (aria-expanded synced, aria-live on
status/toast, :focus-visible rings), keyboard (Space / r toggle record, Escape +
outside-click close the mobile drawer, without hijacking focused controls), a
"Summarize for AI" button (copies transcript prefixed with a ready-to-paste LLM
summarization task), real mic-error toasts, an empty-sessions state, and export buttons
disabled until a transcript is loaded.
Verified (light, recording-safe): py_compile + node --check clean; all three gui suites
green (23+16+14 = 53 tests); serve smoke confirms the UI + new control load.
…se A of the roadmap)
Recording-safe (file-only): turns the web GUI into an installable PWA so it lands on your
phone/desktop home screen and opens instantly/offline.
- manifest.webmanifest (name, standalone, theme/bg, maskable icons) + icon.svg + generated
icon-192/512.png.
- sw.js service worker: cache-first for the app shell, network-only for /api and SSE,
versioned cache dropped on activate. Registered from app.js (CSP script-src 'self').
- server.py: serve /manifest.webmanifest + /sw.js at ROOT (root SW scope), add the
.webmanifest/.png content-types, and extend CSP with manifest-src 'self' + worker-src
'self' (the strict default-src 'none' would otherwise block both).
- index.html: rel=manifest, theme-color, svg icon + apple-touch-icon.
Verified (recording-safe): py_compile + node --check clean; server tests 14/14; serve
smoke confirms manifest (application/manifest+json), sw.js, and icons all 200 with the
CSP allowances present.
Roadmap + rationale live outside the repo at ~/voxterm-plans/voxterm-gui-roadmap.md.
Next (needs no-recording): Tauri v2 native/mobile wrapper, live word-streaming, and the
record-through-the-GUI live test.
…ack, network errors
Adversarial review of the PWA/UX code (3 confirmed) fixed:
- Stale-shell trap: cache-first never revalidated, so shipping a new app.js/style.css
without bumping the SW left installed clients on the old shell forever. Switch the
static shell to stale-while-revalidate (serve cache, refresh in background) — changed
assets are picked up on the next load with no manual cache bump.
- Offline navigation: exact-URL match meant "/?token=..." (phone/LAN mode) never matched
the cached "/", so the offline shell never loaded there, and non-"/" nav offline
returned undefined -> browser error page. Navigations are now network-first with a
fallback to caches.match("/", {ignoreSearch:true}).
- Network errors: getJSON had no catch (server down -> unhandled rejection, silent
no-op). Now catches, toasts "Network error", and returns {ok:false,error:"network"};
init() and loadSessions() default missing fields so the UI degrades cleanly.
2 findings correctly refuted (token-URL cache bloat = negligible nit; key-repeat
double-trigger = benign). Verified: node --check app.js+sw.js; engine+server tests green.
…tate
Two real, recording-safe features (built + tested while a live recording ran):
- SRT/VTT subtitle export: each transcript turn already carries audio start/end, so
export.py now adds t_offset_end per turn and renders proper .srt (HH:MM:SS,mmm,
1-indexed cues) + .vtt (WEBVTT, HH:MM:SS.mmm) — written alongside -agent.md/.json,
a --format {md,json,srt,vtt,all} CLI flag, served via engine.read_artifact (srt/vtt
kinds), and built client-side in the UI for instant "Download SRT/VTT" of any loaded
session (rename-aware). Verified on the real 28-min session (valid cues).
- Settings persistence: the GUI remembers your Model + Language in localStorage
(private-mode-safe), restoring them on load when still offered by the server.
- Loading state: a calm body.working affordance + hard-disabled record button while a
transcription job runs.
Tests: gui suites now 65 total (export 35 / engine 16 / server 14), all green;
node --check clean; serve smoke confirms t_offset_end in the API JSON + the .srt
artifact served (200).
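The two timestamp shapes above (SRT `HH:MM:SS,mmm`, VTT `HH:MM:SS.mmm`) and the 1-indexed cue layout can be sketched as follows. This is an illustration under assumptions, not the code in gui/export.py: the function names and the `(start, end, text)` tuple shape are hypothetical.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the millis)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def vtt_timestamp(seconds: float) -> str:
    """Same fields as SRT, but WebVTT uses a dot before the millis."""
    return srt_timestamp(seconds).replace(",", ".")


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as 1-indexed SRT cue blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```

A .vtt rendering would additionally start with a `WEBVTT` header line and use `vtt_timestamp`.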
…, peer label, clamps
Adversarial review of the SRT/VTT code (4 confirmed) + a byte-for-byte client↔server
parity check surfaced and fixed:
- Cue-text injection: a newline / blank line / "-->" inside a turn's text or label
corrupted SRT/VTT cue boundaries (could inject a fake cue). Add _cue_text() (collapse
newlines, neutralize "-->") applied to both cue text and label in to_srt/to_vtt.
- Client didn't skip empty-text turns and used the array index → blank cues + index
drift vs the server file. Client now filters empty turns with an independent counter
(mirrors the backend), so a downloaded .srt/.vtt byte-matches the server artifact.
- Peer label divergence: client used nameFor() ("Sam · laptop (peer)") vs backend
"Sam (peer)". Client cueLabel now matches the backend exactly (renames stay a
client-only delta on local speakers).
- Degenerate-span clamp: client bumped end +2.0s vs backend +0.5s; unified to +0.5s.
- build() now also clamps t_offset_end > t_offset in the JSON sidecar (was only the
rendered cue), so the documented invariant holds for out-of-order offsets.
Verified: 67 gui tests green (export 37 incl. 2 new regressions / engine 16 / server 14);
node --check clean; a verbatim Python-vs-Node parity harness on a nasty doc (peer +
empty + blank-line/"-->" injection + zero/neg span) produces byte-identical SRT.
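The three rules above (sanitize cue text, skip empty turns with an independent counter, clamp degenerate spans to +0.5s) could look roughly like this. It is a sketch: the helper names are hypothetical, and replacing "-->" with a Unicode arrow is one assumed way to neutralize it (a naive swap to "->" would turn "--->" back into "-->"). The `t_offset` / `t_offset_end` keys follow the field names in the commit message.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

MIN_SPAN_SECONDS = 0.5  # the unified clamp from the commit message


def cue_text(text: str) -> str:
    """Collapse all whitespace runs (incl. blank lines) and neutralize '-->'."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.replace("-->", "\u2192")  # assumed replacement character


def clamp_end(start: float, end: float) -> float:
    """Guarantee end > start for zero-length or negative spans."""
    return end if end > start else start + MIN_SPAN_SECONDS


def iter_cues(turns: Iterable[Dict]) -> Iterator[Tuple[int, float, float, str]]:
    """Yield (index, start, end, text); empty turns do not consume an index."""
    index = 0
    for turn in turns:
        text = cue_text(turn.get("text", ""))
        if not text:
            continue
        index += 1
        start = turn["t_offset"]
        yield index, start, clamp_end(start, turn.get("t_offset_end", start)), text
```

Running the same three rules on both client and server is what makes byte-identical output possible.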
… fix gui.export path
The README predated several shipped features; bring it current: subtitle export
(.srt/.vtt + byte-identical client downloads), PWA (installable/offline), settings
persistence, keyboard/a11y, Summarize-for-AI; add the srt/vtt outputs + t_offset_end;
correct the stale 'python -m glass.export' to 'python -m gui.export --format ...'.
Type to filter the (growing) session list by date/stem; case-insensitive; the filter survives list refreshes and shows a clean 'no match' state. Pure frontend (loadSessions now caches SESSIONS + renderSessions(query) renders a filtered view); the typing guard already prevents the Space/R record shortcut from firing in the search box.
…ing (gui/live.py)
Tails the raw PCM of a WAV being recorded and transcribes each new speech window with
VoxTerm's engine, printing '[mm:ss] text' as the conversation happens. Reads the FILE, not
the mic, so it runs alongside any recorder with zero contention. Text-only + fw-base
default for low latency.
CLI: python -m gui.live ROOM.wav [--model] [--interval] [--max-seconds].
Proven on a live recording (transcribed the active conversation in near-real-time). NOT
yet wired into the GUI browser UI; that's the next step (stream lines over SSE to a live
transcript panel).
…ing, stream to a panel
Wires gui/live.py's near-real-time transcription into the browser UI so it's actually
usable, not just a CLI:
- engine.py: live_start/live_stop + a background tail-transcribe thread that follows the
newest in-progress recording FROM the current end (true live, no slow backlog replay),
transcribes finalized speech windows with the cached fw-base engine, and appends
'[mm:ss] text' lines (capped) exposed via status().live. Reads the file, not the mic, so
it runs alongside any recorder with zero contention.
- server.py: POST /api/live/start (optional wav, defaults to newest) + /api/live/stop.
- static: a '⦿ Live transcript' toggle + a streaming, auto-scrolling live panel (calm
theme, pulsing dot); applyStatus renders status().live.lines.
Verified end-to-end on a real in-progress recording: start -> lines appear within ~10-16s
of new speech with correct audio timestamps (e.g. [39:19]…) -> stop. server tests 14/14;
node --check clean. (Browser render of the panel is wired but visually confirmable only in
a browser; the data path is proven.)
…s test
- engine.delete_session(stem, dir): removes only a session's text artifacts
(-transcript/-agent.{md,json,srt,vtt}/-events.jsonl) for the stem, reusing _resolve's
traversal guard + _session_dirs/only_dir restriction; never touches .wav (audio kept).
- POST /api/session/delete (behind the LAN-auth gate).
- UI: a subtle ✕ on each session row (confirm; stopPropagation so it can't trigger open;
clears the view if the open session is deleted).
- test_engine: +6 delete tests (exact-files, traversal rejected, dir-restricted, .wav
untouched, missing-stem ok); fixed test_status_idle_shape to expect the 'live' key
added by the prior live-transcription commit.
gui suites green (export 37 / engine 22 / server 14).
The live monitor finalized speech only on silence, so an in-progress utterance showed
nothing until the speaker paused. Add a partial preview of the still-growing tail: each
pass re-decodes the tail and a LocalAgreement-n stabilizer commits the longest word-prefix
that has agreed across the last n hypotheses (stable) and marks the remainder volatile. As
words settle they graduate from volatile to stable, so the head stops flickering while the
tail updates live.
- gui/stabilize.py: PartialStabilizer (pure, LocalAgreement-n) + 9 unit tests
- engine._live_loop: re-decode tail → stabilize → status.live.partial; reset on finalize
so each utterance starts clean
- app.js/style.css: render the partial (committed words solid, volatile tail dimmed +
softly pulsing)
Proven on a real recording: ASR revised "floor"→"hood" mid-utterance and the stabilizer
held it volatile until settled (never committed the wrong word). 82 tests green. Idea
ported from elizaOS's streaming partial-stabilizer.
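A compact illustration of the LocalAgreement-n idea: commit the longest word-prefix shared by the last n hypotheses, and report everything after it as volatile. This is a sketch and not the contents of gui/stabilize.py; the class name `AgreementStabilizer` and its method signatures are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def common_prefix(hypotheses: Sequence[List[str]]) -> List[str]:
    """Longest word-prefix on which every hypothesis agrees."""
    prefix = []
    for column in zip(*hypotheses):
        if all(word == column[0] for word in column):
            prefix.append(column[0])
        else:
            break
    return prefix


class AgreementStabilizer:
    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.history = deque(maxlen=n)  # last n word lists
        self.stable: List[str] = []

    def update(self, hypothesis: str) -> Tuple[List[str], List[str]]:
        """Feed one re-decode of the tail; return (stable, volatile) words."""
        words = hypothesis.split()
        self.history.append(words)
        if len(self.history) == self.n:
            agreed = common_prefix(list(self.history))
            if len(agreed) > len(self.stable):
                self.stable = agreed  # committed words never shrink
        return list(self.stable), words[len(self.stable):]

    def reset(self) -> None:
        """Call on finalize so the next utterance starts clean."""
        self.history.clear()
        self.stable = []
```

With n=2, feeding "pop the floor" and then "pop the hood" commits only "pop the"; the disputed last word stays volatile until two consecutive decodes agree on it.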
Gap #1: live now tails the GUI's own recording. start_recording streams straight to a
growing on-disk WAV (placeholder header, _poll appends s16 PCM under the lock + flush,
stop patches the real header). The live monitor tails that same file, so clicking Live
during a GUI recording shows your words (before, Record buffered in RAM and only wrote on
stop, so Live saw nothing). Bonus: a long session no longer sits entirely in RAM;
transcription loads the file off-thread.
Gap #3: small stuff:
- live.py CLI: ported the LocalAgreement stabilizer (in-place updating partial line) for
parity with the GUI
- app.js/index.html/style.css: a scrolling live amplitude canvas during record
3 new streaming-WAV tests (header is 44B + parses, _pcm_bytes==_write_wav, growing file is
tailable mid-write then finalizes valid). 85 tests green.
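The placeholder-header approach can be sketched with the standard 44-byte PCM WAV header: write it with a zero data length, append samples as they arrive, then seek back and patch the real sizes on stop. The helper names and the 16 kHz mono format are assumptions for illustration; the engine's own functions are not reproduced here.

```python
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # assumed capture rate
CHANNELS = 1         # assumed mono
SAMPLE_WIDTH = 2     # s16 PCM


def wav_header(data_bytes: int) -> bytes:
    """Canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    byte_rate = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH
    block_align = CHANNELS * SAMPLE_WIDTH
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes, b"WAVE",
        b"fmt ", 16, 1, CHANNELS, SAMPLE_RATE, byte_rate, block_align,
        SAMPLE_WIDTH * 8, b"data", data_bytes,
    )


def start(path: str):
    f = open(path, "wb")
    f.write(wav_header(0))  # placeholder: sizes are unknown until stop
    f.flush()
    return f


def append(f, pcm: bytes) -> None:
    f.write(pcm)
    f.flush()  # flush so a tailing reader can see the new samples


def finish(f) -> None:
    data_bytes = f.tell() - 44
    f.seek(0)
    f.write(wav_header(data_bytes))  # patch the real sizes in place
    f.close()


def demo() -> int:
    path = os.path.join(tempfile.mkdtemp(), "room.wav")
    f = start(path)
    append(f, b"\x00\x00" * 1600)
    append(f, b"\x00\x00" * 1600)
    finish(f)
    with wave.open(path, "rb") as w:
        return w.getnframes()
```

A tailing reader can skip the first 44 bytes and treat everything after as raw s16 samples, which is why the file is usable before the header is patched.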
Found by a verifying multi-agent audit of the new streaming-record/live code.
Concurrency:
- stop_recording now stops the live monitor (live is bound to the recording's lifetime);
it was leaking a daemon thread that re-decoded the finalized file forever and raced the
post-stop job.
- live monitor uses a DEDICATED transcriber (_get_engines dedicated="live") so it never
shares CTranslate2 decode state (or the dedup buffer) with the batch job; CT2 isn't safe
for concurrent decode on one instance.
- live_start/live_stop track real thread liveness (no double live loop after a timed-out
join); status() snapshots self._live under the lock.
Correctness:
- stop_recording surfaces a header-patch I/O failure as an error instead of silently
transcribing a zero-data WAV into a spurious empty session.
Security (server.py):
- CSRF: reject cross-origin state-changing POSTs (Sec-Fetch-Site / Origin vs Host).
- DNS-rebinding: loopback Host-header allowlist (blocks a rebinding site driving the
tokenless local API).
- Clickjacking: CSP frame-ancestors 'none' + X-Frame-Options: DENY.
- _authed compares on UTF-8 bytes so a non-ASCII token yields 401, not a crash.
+3 security regression tests (host allowlist, CSRF, non-ASCII token). 88 green.
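The CSRF and DNS-rebinding checks above might be sketched like this. It is a hedged illustration, not server.py's code: the function names, the plain-dict `headers` argument with exact-case keys, and the exact allowlist contents are assumptions.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumed allowlist


def host_allowed(host_header: Optional[str]) -> bool:
    """DNS-rebinding guard: the Host header must name a loopback host."""
    if not host_header:
        return False
    try:
        hostname = urlsplit("//" + host_header).hostname
    except ValueError:  # e.g. a malformed bracketed IPv6 literal
        return False
    return hostname in LOOPBACK_HOSTS


def is_cross_site(headers: Mapping[str, str]) -> bool:
    """CSRF guard for state-changing requests."""
    site = headers.get("Sec-Fetch-Site")
    if site is not None:
        # "none" means a direct user navigation rather than a third-party page.
        return site not in ("same-origin", "none")
    origin = headers.get("Origin")
    if origin is None:
        return False  # non-browser client (curl, tests): nothing to compare
    return urlsplit(origin).netloc != headers.get("Host", "")
```

A rebinding page resolves its own hostname to 127.0.0.1, but the browser still sends that hostname in `Host`, which is why the allowlist check catches it.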
VoxTerm splits turns on VAD silence alone, so a natural pause after "and…" or "the…" wrongly ends a turn mid-sentence. Add a zero-model end-of-turn signal (gui/eot.py): P(turn complete) from grammar cues — terminal punctuation 0.95, trailing conjunction 0.15, trailing article/preposition 0.20, short 0.70, else 0.50. The live loop now merges a finalized fragment into the previous line when that line ended mid-clause (live view is text-only, so no speaker boundary to cross), giving readable sentences instead of choppy breath-split lines. 9 unit tests; 97 green. Idea ported from elizaOS's HeuristicEotClassifier. (Diarization hardening + windowed-live ports were checked against VoxTerm's code and found redundant — VoxTerm already gates centroid updates by cosine sim and bounds the live buffer via VAD — so they were intentionally skipped.)
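A zero-model scorer with the probabilities listed above could be sketched as follows. The word lists, the "short" threshold of two words, the order of the checks, and the 0.5 merge threshold are all assumptions made for illustration; gui/eot.py may differ on each.

```python
from typing import List

CONJUNCTIONS = {"and", "but", "or", "so", "because"}             # assumed list
ARTICLES_PREPOSITIONS = {"the", "a", "an", "of", "to", "in",
                         "on", "for", "with", "at"}              # assumed list


def p_turn_complete(text: str) -> float:
    """Heuristic P(turn complete) from the trailing grammar cue."""
    stripped = text.strip()
    if not stripped:
        return 0.50
    trailing_off = stripped.endswith(("...", "\u2026"))  # "and…" is not an ending
    if not trailing_off and stripped[-1] in ".?!":
        return 0.95
    words = stripped.lower().split()
    last = words[-1].strip(".,;:\u2026")
    if last in CONJUNCTIONS:
        return 0.15
    if last in ARTICLES_PREPOSITIONS:
        return 0.20
    if len(words) <= 2:  # assumed definition of "short"
        return 0.70
    return 0.50


def merge_fragment(lines: List[str], fragment: str, threshold: float = 0.5) -> List[str]:
    """Join a finalized fragment onto the previous line if that line ended mid-clause."""
    if lines and p_turn_complete(lines[-1]) < threshold:
        lines[-1] = f"{lines[-1]} {fragment}"
    else:
        lines.append(fragment)
    return lines
```

For example, a line ending in "the" scores 0.20, so the next fragment is appended to it rather than starting a new line.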
…ucture)
From the Android-plan UI critique: the safe, additive set (deferring layout moves like the
bottom record bar until we can render on a device):
- delete ✕ was opacity:0-until-hover → invisible on touch; show it on @media (hover: none)
- waveform canvas was a fixed 600px bitmap stretched by CSS → blurry on phones/retina;
size the bitmap to CSS px × devicePixelRatio, draw in CSS px
- --faint #6b7280 (~3.9:1, failed WCAG AA) → #7d8694 (~4.6:1)
- mobile: safe-area insets (notch/gesture bar) on main/sidebar/nav/toast + 44px min tap
targets on btn/select/ghost/✕/legend
- honor prefers-reduced-motion (kills the looping pulses/level animations)
97 tests green; JS validates.
v1 Android app = the existing web UI in a native shell, talking to the VoxTerm backend on
your desktop over the LAN. The phone does NO transcription.
- src-tauri/: Tauri v2 host crate (identifier site.nubs.voxterm, frontendDist →
../mobile-pair, window "main", android minSdk 24) + gen/android/ gradle project
- mobile-pair/index.html: on-theme pairing page. Enter desktop host/port/token (prefilled
from localStorage); it navigates the webview to http://host:port/?token=… where the
desktop serves UI+API+SSE from one origin, so app.js works unchanged (reads the token from
the query string)
- AndroidManifest: INTERNET permission ONLY (no RECORD_AUDIO, no camera, no location). The
app structurally cannot record you; the desktop owns the mic. usesCleartextTraffic=true
for the LAN http backend (token is the gate).
Scaffold only, not yet built. Lives on feat/gui, no PR.
scripts/android-dev.sh: plug in a phone (or --emulator) and it self-heals the toolchain
(rust targets), builds the APK, installs, launches, and asserts the app is alive. Test
traffic stays on loopback via `adb reverse tcp:8740` and never touches Wi-Fi.
Stages A–F with structured exit codes (10 toolchain / 11 targets / 20 device / 30 build /
40 install / 50 launch / 60 smoke). Hard gates: build, install, launch, render-not-blank
(scripts/assert_screen.py, Pillow luminance check). Soft for v1: the backend round-trip
(depends on the in-app connect flow).
Supporting bits:
- scripts/mock_backend.py: torch-free stdlib stand-in (serves gui/static + canned /api +
heartbeat SSE, logs requests) for fast offline CI runs (--mock)
- gui/server.py: opt-in request logging via VOXTERM_GUI_LOG=1 (silent by default) so the
smoke test can assert GET /api/options + /api/events
- mobile-pair: auto-connect if a backend answers on the device's localhost (the
adb-reverse/dev case); fails fast on a real phone → pairing form stays
97 tests green. Quickstart: scripts/android-dev.sh --emulator --debug --mock (offline) or
scripts/android-dev.sh --debug (real phone, real engine).
… Silicon
A verifying cross-platform audit found the GUI dead on Apple Silicon (the flagship target)
and the android script broken on every mac. All fixes are Linux-safe (97 tests still
green) and standard per-platform branching:
GUI (HIGH: Apple Silicon had an empty model dropdown + KeyErrors):
- Engine.models() falls back to AVAILABLE_MODELS when FASTER_WHISPER_MODELS is empty
(Apple Silicon) so the dropdown is never blank
- new CPU-aware default (transcribe.gui_default_model / Engine.default_model): prefer
fw-small where faster-whisper exists, fall back to MLX only on Apple Silicon, and
crucially NOT raw config.DEFAULT_MODEL, which is qwen3-0.6b when qwen-asr is installed
(too slow on CPU). Fixes the live + post-stop KeyErrors.
- /api/options exposes default_model; app.js pre-selects it (no more fw-small)
scripts/android-dev.sh (broke on all mac):
- ANDROID_HOME / JAVA_HOME per-OS (mac ~/Library/Android/sdk, Studio JBR / java_home)
- resolve python3 (mac has no bare `python`); arm64-v8a AVD + -gpu host on Apple Silicon
audio/capture.py: actionable mac mic-permission error (TCC not granted)
gui/export.py: per-platform live-dir fallback (was Linux XDG only)
Full report: ~/voxterm-plans/mac-compat-report.md
Zero-regression hardening from the cross-platform audit (99 tests green):
- 0.1 decouple headless ASR from the Textual TUI: gui/transcribe.py imported tui.app
(pulling textual+sounddevice into every server/headless import). Extracted the pure split
into tui/text_split.py; tui.app delegates to it. Verified: importing gui.server no longer
loads `textual`.
- 0.3/0.4 gui/server._read_json: a malformed Content-Length raised an uncaught ValueError
out of the POST handlers; guard it. Also close the connection on an oversized body (no
undrained body / latent HTTP desync).
- 0.5 live-state writes now take self._lock (brief dict mutations only, never around
transcribe/VAD) to match the locked reader in status(); the "consistent snapshot" comment
is now actually true.
- 0.6 mobile-pair: the loopback auto-probe honors the port field (was hardcoded 8740).
- 0.7 export.py docstring: `glass.export` -> `gui.export` (no glass pkg).
- 0.9 Android cleartext: documented that app-wide cleartext is INTENTIONAL for the LAN
thin client (can't scope arbitrary RFC1918 IPs declaratively; the token + LAN is the trust
model); kept on for release on purpose.
- 0.11 drop the cosmetic SSE `Connection: keep-alive` header (HTTP/1.0).
- 0.10 + 0.8: capture.py macOS mic-permission tests; commit src-tauri/Cargo.lock.
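For item 0.3/0.4, a guarded Content-Length parse might look like the sketch below. The function name, the return shape, the status codes, and the 1 MB cap are assumptions; only the behavior (no uncaught ValueError, oversized bodies rejected) comes from the commit message.

```python
from typing import Optional, Tuple

MAX_BODY_BYTES = 1_000_000  # assumed cap; the real limit is not stated here


def parse_content_length(raw: Optional[str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (length, None) on success or (None, http_status) on rejection."""
    if raw is None:
        return 0, None  # no body announced
    value = raw.strip()
    # isascii()+isdigit() rejects signs, underscores, and non-ASCII digits
    # that int() would otherwise accept or raise ValueError on.
    if not (value.isascii() and value.isdigit()):
        return None, 400
    length = int(value)
    if length > MAX_BODY_BYTES:
        # The caller should also close the connection here: the unread body
        # would otherwise be parsed as the next request.
        return None, 413
    return length, None
```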
A new, 100%-optional CPU streaming-ASR tier that runs everywhere VoxTerm does (Linux,
macOS arm64, Windows) with no GPU. Verified end-to-end on this Linux/CPU box: installs
clean, decodes correctly, and does NOT disturb VoxTerm's pinned onnxruntime (sherpa
statically links its own ORT).
- pyproject: `[project.optional-dependencies] streaming = ["sherpa-onnx..."]` (marker
excludes Intel-macOS, which has no wheel). NOT a core dep.
- config.py: one DRY gate after the platform branches; surfaces the `sherpa-stream-en`
model key + SHERPA_MODELS ONLY when sherpa-onnx is importable AND a wheel exists for the
platform. Absent → byte-for-byte unchanged.
- audio/transcriber.py: SherpaStreamingTranscriber (lazy import w/ clear error; downloads
the 20M streaming-zipformer on first load; per-call create_stream so it's a drop-in for
the existing chunked callers; same RMS/hallucination/dedup filters; ALL-CAPS model output
→ sentence-case). Factory dispatch added before the Whisper fallback.
- gui/test_sherpa.py: skip-guarded (no-op without the extra); covers gating consistency,
factory dispatch, RMS short-circuit.
Zero-regression: without the [streaming] extra installed, nothing changes for any existing
user. 102 tests green (99 + 3, the new ones skip when sherpa is absent).
Follow-on (noted, not yet done): a true-streaming live-loop path (persistent OnlineStream
+ endpoint finalize) so the GUI live view streams word-by-word.
…needs a Mac)
iOS reuses the existing Tauri thin-client (mobile-pair → LAN desktop, INTERNET/no-mic).
Everything here is cross-platform + lint-clean on Linux; the actual init/build/sign/run
loop requires a Mac + Xcode (cannot build off a Mac).
- src-tauri/Info.ios.plist: NSAllowsLocalNetworking (minimal ATS for LAN http, NOT
arbitrary loads) + NSLocalNetworkUsageDescription (iOS-14 local-network prompt).
- tauri.conf.json: additive bundle.iOS { minimumSystemVersion "14.0" }.
- scripts/ios-dev.sh: Darwin-guarded (clean no-op off-Mac); adds iOS rust targets,
`cargo tauri ios init` once, then ios dev|build.
- src-tauri/.gitignore: ignore generated /gen/apple/ build artifacts.
- docs/ios-thinclient.md: build path, the two plist keys, signing, pairing.
Zero-regression: no Python touched; Android (gen/android, manifest) byte-for-byte
unaffected; bundle.iOS + Info.ios.plist are read only by the iOS bundle target.
102 tests green.
…sherpa)
The live monitor now prefers the sherpa streaming backend when it's installed (opt-in) and
drives it as a true streaming recognizer instead of chunked VAD windows:
- _live_loop split into setup/dispatch + two paths. The chunked path (_live_chunk_loop) is
the original code VERBATIM; fw-*/MLX/qwen3/parakeet and any non-sherpa backend behave
byte-for-byte as before (zero regression).
- _live_stream_loop: one persistent OnlineStream fed the tailed PCM; the running decode is
published as the volatile partial each ~1s; sherpa's endpoint detection (or the 20s cap)
finalizes a line. Same self._lock discipline.
- live model preference: sherpa-stream-en (if installed) → fw-base → platform default.
Only changes behavior when the optional [streaming] extra is present.
Verified: streaming primitives grow the partial incrementally + decode correctly on this
box; 102 tests green (chunked path unchanged).
Adversarial QA of the new code + a real KVM-emulator run surfaced these (all fixed):
- transcriber: _ensure_sherpa_model is now ATOMIC (extract to staging → rename) with a
complete-model guard (all 4 artifacts) so an interrupted extraction self-heals instead of
a permanent StopIteration; load() uses a _pick() helper that raises a clear RuntimeError
naming the missing file; .part download cleaned up on failure.
- transcriber: SherpaStreamingTranscriber.is_loaded is now a @property, matching every
other backend (was a method, which would mis-read as loaded via getattr).
- engine: the streaming live path now applies the hallucination + dedup filters on
finalized lines, like the chunked/batch backends.
- android-dev.sh: launch the CORRECT component. Debug builds install
site.nubs.voxterm.debug, and the activity class keeps the base namespace, so the launch is
<appId.debug>/<base>.MainActivity (the emulator caught the old
site.nubs.voxterm/.MainActivity → "activity did not report Status: ok", exit 50).
- android-dev.sh: validate $PYTHON is actually runnable (clean exit 10, not a late fail).
- assert_screen.py: exit 3 = SKIP when Pillow is absent (macOS) so the render gate isn't a
silent pass; android-dev.sh treats exit 3 as a soft skip.
102 tests green. (Low/cosmetic, left + noted: streaming line-start timestamp drift; the
loopback auto-probe's cross-origin read is best-effort and degrades to manual pairing.)
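The "extract to staging → rename" pattern with a complete-model guard can be sketched as below. The artifact file names, the function names, and the `extract` callback are hypothetical stand-ins; the commit only states that there are four required artifacts and that the install is atomic.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical artifact names; the commit says only "all 4 artifacts".
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")


def is_complete(model_dir: Path) -> bool:
    return all((model_dir / name).is_file() for name in REQUIRED_FILES)


def ensure_model(model_dir: Path, extract: Callable[[Path], None]) -> Path:
    """Install the model atomically; a partial install is discarded and redone."""
    if is_complete(model_dir):
        return model_dir
    shutil.rmtree(model_dir, ignore_errors=True)  # drop any half-extracted dir
    model_dir.parent.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(prefix=model_dir.name + ".staging-",
                                    dir=model_dir.parent))
    try:
        extract(staging)
        missing = [n for n in REQUIRED_FILES if not (staging / n).is_file()]
        if missing:
            raise RuntimeError(f"model archive is missing: {', '.join(missing)}")
        os.replace(staging, model_dir)  # same filesystem, so the rename is atomic
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return model_dir
```

Because the final directory only ever appears via the rename, a reader never observes a half-populated model directory.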
…low 14)
Use get_flattened_data when available, fall back to getdata. No behavior change; silences
the Pillow-14 DeprecationWarning the emulator run surfaced.
- audio/transcriber.py: generalized the sherpa model registry (repo→URL map) so multiple
sherpa transducer models share one SherpaStreamingTranscriber.
- config.py: new optional gated key `sherpa-nemotron-en` (NeMo FastConformer-RNNT 0.6B,
exported for sherpa-onnx). Same find_spec gate → zero-regression when the [streaming]
extra is absent.
- scripts/bench_asr.py: reproducible WER (word edit-distance, normalized) + CPU RTF
benchmark across backends.
- docs/streaming-asr-benchmark.md: results + honest analysis.
Numbers (Linux CPU, 3 labeled clips):
- fw-small: 2.1% WER / 0.64 RTF (batch, original default)
- fw-base: 5.1% / 0.18
- sherpa-nemotron-en: 4.4% / 0.25 (streaming sweet spot: near-fw-base accuracy, ~4x
real-time, native streaming)
- sherpa-stream-en zipformer-20M: 20.9% / 0.064 (~16x real-time but inaccurate)
nemotron-EN proven to load + decode via the same backend. 102 tests green (test_sherpa now
covers both gated keys, skips without the extra).
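For readers unfamiliar with the two metrics: WER is the word-level edit distance divided by the reference length, and RTF is decode time divided by audio duration (so 1/RTF is the speed relative to real time, e.g. 0.25 → 4x). The sketch below shows both; the normalization rule and function names are assumptions and may not match scripts/bench_asr.py.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation (assumed normalization)."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance with a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    return decode_seconds / audio_seconds


def real_time_speed(rtf: float) -> float:
    return 1.0 / rtf
```

By this definition the quoted 0.064 RTF corresponds to about 15.6x real time, which the commit rounds to ~16x.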
Engine.models() returned only FASTER_WHISPER_MODELS on Linux/Intel/Windows, so the optional sherpa-stream-en / sherpa-nemotron-en keys (present in AVAILABLE_MODELS but not the fw set) never appeared in the GUI model dropdown. Union the platform's base set with SHERPA_MODELS so they're selectable wherever installed. Found by rendering the GUI headless. test_models_returns_only_fw_keys -> test_models_are_valid_keys (valid-keys invariant incl. the additive sherpa keys).
scripts/gui_e2e.py boots gui.server, drives headless Chrome via the DevTools Protocol, and asserts the real browser flow: model dropdown + session list populate from the API, and clicking a past session loads + renders its transcript (with a screenshot). Covers the browser path unit tests can't — only record-with-a-mic still needs hardware. websocket-client is a dev-only dep. Verified: dropdown includes the optional sherpa keys, 4 sessions, transcript renders end-to-end.
docs/streaming-asr.md: install the optional [streaming] extra, the two model keys (sherpa-stream-en / sherpa-nemotron-en), GUI/CLI usage, how it works, and the zero-regression/opt-in posture. gui/README 'Models' section now points to it + the benchmark. Makes the streaming feature discoverable + usable (upstream-ready).
…fix)
45-agent audit of this session's additions. Confirmed fixes:
- security: gate /api/* GETs with the same-origin check too (not just POST); strip the
?token= from the URL after read (history.replaceState); cap each SSE stream at 10 min so
an abandoned silent client can't hold a slot.
- desktop UX: the pairing page starts behind a 'Connecting…' loader and only reveals the
phone form when no local engine answers (held under Tauri), so the desktop app no longer
flashes the phone pairing form during engine startup.
- bug: applyStatus() guarded s.job before deref; sherpa live dedup state reset between
sessions; TUI 'g' no longer spawns duplicate engines.
- cleanup: removed dead/contradictory Pillow branch in assert_screen; bench no longer
re-decodes WAVs for the total; transcribe cleanup logs a file-close failure instead of
swallowing it; host input inputmode url; CSP connect-src tightened; backend seam fallback
simplified; Tauri externalBin/freeze documented.
99 gui tests + browser e2e + Tauri build all green. Fork only.
…live engine
From the 4-agent parity audit (verdict: GUI cleanly reuses the TUI engine; these are the
non-by-design divergences worth fixing):
- F1/F2 (export drift): delete the client-side JS formatter fork (buildMarkdown/
buildSrt/buildVtt/buildJson + helpers, ~80 lines); it had silently desynced from
export.py (the downloaded .md was missing 8 front-matter fields). New POST /api/export
renders server-side via export.py (the single formatter), rebuilding from the events log
and applying the client's speaker renames. Verified: a no-rename .md byte-matches the
on-disk -agent.md.
- F3: models() offered FASTER_WHISPER_MODELS and silently hid installed qwen3 on Linux
(short-circuit); now offers the full AVAILABLE_MODELS, matching the TUI.
- F8: the live-monitor fallback used config.DEFAULT_MODEL (the CPU-unusable qwen3); now
uses the CPU-aware gui_default_model().
- F10: the GUI live path reached into transcriber underscore-privates; added a public
surface (recognizer / reset_dedup / is_duplicate / is_hallucination) and use it.
- F4: documented the intentional TUI-scope gaps (no P2P/hivemind/system-audio, manual
rename vs the cross-session speaker DB) in gui/README.md.
103 gui+sherpa tests (incl. a new export_session test) + browser e2e + /api/export HTTP
check all green. Fork only.
… (kills the relay)
The phone transcribes locally: no pairing, no relay, no network. Builds to a green APK
here (runtime needs a device/emulator-with-mic).
- tauri-plugin-voxasr: Tauri 2 Android plugin. Kotlin VoxasrPlugin reads the mic
(AudioRecord, 16kHz mono PCM16) and streams it through a sherpa-onnx OnlineRecognizer with
endpoint detection, emitting partial/final events. The 20M int8 zipformer is bundled in
assets + staged to filesDir on first use, so it's fully offline (no first-run download).
RECORD_AUDIO only; no INTERNET added.
- Rust shim exposes start_transcribe/stop_transcribe (Android-only; desktop/iOS get a
clean 'unsupported' stub so it's a plain, CLI-discoverable dep).
- sherpa-onnx 1.13.2 Android AAR (static-link ORT, all ABIs), version-matched to the
desktop engine. AAR + model gitignored; fetch-deps.sh stages them.
- App: registers the plugin on #[cfg(mobile)].
VERIFIED: cargo tauri android build --apk (aarch64) → green APK; the Kotlin compiled
against the sherpa AAR (proves the OnlineRecognizer/AudioRecord API usage); APK contains
lib/arm64-v8a/libsherpa-onnx-jni.so + assets/voxterm-model (int8) + VoxasrPlugin in
classes4.dex + RECORD_AUDIO merged. Fork only.
…ive captions
Makes the on-device engine a usable feature: the Android app now offers "Transcribe on this device" (fully offline) with live partial/final captions, pairing kept as the browser fallback.
- mobile-pair: on-device mode (Start/Stop + captions) shown when the native plugin is present; revealMobileHome() picks on-device on the app, pairing in a plain browser.
- pair.js invokes plugin:voxasr|start_transcribe/stop and listens for the plugin's partial/final/error events via addPluginListener.
- withGlobalTauri + a CSP ipc: allowance let the vanilla webview reach the plugin; capabilities/mobile.json grants voxasr:default.
- plugin: dropped the unused StartArgs (model is bundled), so the command takes no args.
VERIFIED: cargo tauri android build --apk (aarch64) → green APK (177M); the updated frontend re-embedded in libapp_lib.so; plugin:voxasr registered (144 refs in .so); sherpa .so + offline model still bundled. Runtime (mic→captions) needs a device. Fork only.
…for device debugging
The on-device error sink was #err, which lives in the hidden #pairform, so a start_transcribe failure was silent. Added a visible #odErr in the on-device panel and a console.error (inspectable via chrome://inspect on a device). No behavior change to the happy path. aarch64 APK rebuilds green with sherpa lib + offline model bundled. Fork only.
…self-test proof
Re-architected the live path from plugin events to polling: addPluginListener needs a registerListener permission a hand-written plugin doesn't generate (the prior 'Start' bug). Now uses only permitted commands; no listener wall.
- Kotlin: AudioRecord -> sherpa OnlineRecognizer accumulates finals + partial; pollTranscript command returns/clears them. Debug self-test decodes a bundled clip on load (proves decoding).
- Rust: poll_transcript command (+ start/stop); run_mobile_plugin uses serde_json::Value so the resolve round-trips cleanly. serde_json dep added.
- JS (pair.js): on Start, poll plugin:voxasr|poll_transcript every 500ms, render finals + partial; clear on Stop. Visible #odErr for on-device errors.
- permissions/default.toml: allow-poll-transcript; build.rs COMMANDS += poll_transcript.
VERIFIED on the x86_64 emulator: self-test decoded the bundled clip to the correct text (model loads + decodes on-device, no network); tapping Start flips to Stop with AudioRecord live (green mic dot) and pollTranscript firing every 500ms, no errors. Actual mic->captions needs a real device (emulator has no mic); decoding + capture + poll wiring all proven. Fork only.
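The poll contract described above (finals accumulate, a poll returns and clears them, the partial is overwritten in place) can be sketched in Python. This is an illustration only: the real implementation is the Kotlin plugin, the class and method names here are hypothetical, and keeping the partial across polls is an assumption.

```python
import threading


class TranscriptBuffer:
    """Hypothetical stand-in for the plugin's finals + partial accumulator."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finals = []      # finalized lines not yet handed to the UI
        self._partial = ""     # current in-progress line

    def set_partial(self, text):
        with self._lock:
            self._partial = text

    def add_final(self, text):
        # A finalized line replaces whatever partial was being shown.
        with self._lock:
            self._finals.append(text)
            self._partial = ""

    def poll(self):
        # Hand over the finals exactly once; the partial stays until replaced
        # (an assumption of this sketch).
        with self._lock:
            finals, self._finals = self._finals, []
            return {"finals": finals, "partial": self._partial}
```

A caller polling on a timer appends `finals` to the transcript and redraws `partial` each tick, so no event listener (and no listener permission) is involved.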
…ening
- /api/audio serves the session WAV with HTTP Range/206 (media-src added to the CSP so the <audio> element can load; 416 routed through _hdr; do_HEAD 405; Content-Length on JSON/static). The engine hardlinks <stem>-gui.wav at transcribe time so playback maps to the exact recording.
- CPU-aware transcriber load(): explicit int8 + cpu_threads + greedy beam_size=1 + a warm dummy decode. The GUI defaults to fw-base via gui_default_model(), and the engine warms the model at server start.
- "Detect speakers" diarize flag threaded through stop_recording -> transcribe.
- start_recording tolerates a malformed device value and reverts to the OS default input when "System default" is re-selected (no sticky global).
- _session_title keeps short first utterances (>= 2 chars) so titles aren't dates.
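For readers unfamiliar with Range/206, a minimal sketch of resolving a single `bytes=` range against a file size follows. It is not the code in gui/server.py; `parse_range` and `content_range` are hypothetical names, multi-range requests are not handled, and malformed and unsatisfiable headers are lumped together for brevity.

```python
def parse_range(header, size):
    """Resolve a single 'bytes=' Range header against a file of `size` bytes.

    Returns an inclusive (start, end) pair for a 206 response, or None when
    the range cannot be satisfied (a server would typically answer 416).
    """
    if size <= 0 or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):].strip()
    if "," in spec or "-" not in spec:
        return None
    first, last = spec.split("-", 1)
    try:
        if first == "":                      # suffix form: the last N bytes
            n = int(last)
            if n <= 0:
                return None
            return (max(0, size - n), size - 1)
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start < 0 or start >= size or start > end:
        return None
    return (start, min(end, size - 1))       # clamp an over-long end


def content_range(start, end, size):
    """Value of the Content-Range header that accompanies a 206."""
    return f"bytes {start}-{end}/{size}"
```

An `<audio>` element seeking in a WAV usually sends open-ended ranges such as `bytes=900-`, which is why the end is clamped rather than rejected.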
…iew recording
- Rebuild the UI as a monochrome (no accent hue) document-style transcript with a sticky record dock, a settings popover, and an export menu. The record dot is the only color. Inline <audio> playback (click a timestamp to seek) plus a Download-WAV action.
- Recording shows a level meter + "Recording..." state and the accurate, diarized transcript appears on stop -- one model, no streaming preview to reconcile against the final result.
- Robustness: title derives from the transcript (no raw-date headings); same-speaker turns keep a clickable timestamp instead of an orphaned box; the player pauses when leaving a transcript and its probe is session-tokened; seek waits for audio metadata; the record button has a single owner; init() surfaces an unreachable server.
- a11y: real keyboard focus ring on menu items, aria-live progress, readable muted text.
PWA shell cache bumped; manifest/theme colors aligned. Docs updated.
…e2e for the redesign
- <audio preload="metadata"> so the seek bar shows the clip length immediately on load instead of a misleading 0:00/0:00 (cheap for a local same-origin WAV; the probe still defers a cold seek to loadedmetadata).
- Rewrite scripts/gui_e2e.py for the redesigned UI and add the checks unit tests can't cover: transcript-derived title (not a raw date), the recording's audio actually LOADING UNDER THE PAGE CSP (a fresh Audio() obeys media-src like the inline player), the visible player's real duration, and a record->stop cycle, with a securitypolicyviolation collector asserting zero violations.
Verified in headless Chrome: audio loadedmetadata, duration 14.66s, 0 CSP violations.
The TUI records system audio (macOS ScreenCaptureKit, Linux parec) and mixes it with the mic; the GUI was mic-only. Add an "Audio source" selector (Microphone / System audio / Mic + system) in the settings popover, threaded through /api/record/start -> Engine.start_recording(source=...). system/both reuse the engine's existing SystemCapture; "both" mixes via the same time-aligned add the TUI uses (_mix_chunks). Fails gracefully with a clear message when the platform tool is missing (e.g. parec not installed); selection persists in localStorage. Tests: gui/test_capture_source.py (mix overlap+tails+clip, source wiring with the capture classes mocked). Windows stays unavailable (no engine system-audio there).
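A simplified sketch of the mixing behaviour the tests name (overlap, tails, clip). It assumes both streams start at the same instant and hold float samples in [-1, 1]; the real `_mix_chunks` works on time-aligned chunks and is not reproduced here, and `mix_samples` is a hypothetical name.

```python
def mix_samples(mic, system):
    """Add two sample lists index by index.

    Where only one stream has data (the tail), that stream passes through;
    sums outside [-1.0, 1.0] are clipped rather than wrapped.
    """
    length = max(len(mic), len(system))
    mixed = []
    for i in range(length):
        a = mic[i] if i < len(mic) else 0.0
        b = system[i] if i < len(system) else 0.0
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed
```

For example, `mix_samples([0.5, 0.75], [0.25, 0.5, -0.25])` sums the first sample, clips the second to 1.0, and passes the system-only tail through.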
The TUI's "U" action runs a local-LLM summary (MLX on Apple Silicon, or an ollama:<model> backend anywhere); the GUI only had "Summarize for AI" (copies a prompt for an external model). Add "Summarize with local LLM": POST /api/summarize -> Engine.summarize_session() reuses the session transcript + the TUI's own summarizer.engine (get_summarizer/resolve_template), shows the result in a dismissible panel above the transcript, and surfaces a clear message (never a crash) when no backend is available. A "Summary model" settings field (persisted) lets non-Mac users point at an Ollama model. Tests: gui/test_summarize.py (ok / no-transcript / graceful no-backend / path traversal, summarizer mocked). 112 gui tests + headless e2e green.
…or guard
Extend the headless-Chrome e2e to exercise the new local-LLM summarize action (asserts it fails GRACEFULLY with no backend present: no crash, block hidden), confirm the audio-source selector offers mic/system/both, and collect window.onerror + unhandledrejection so the run fails on ANY uncaught JS error anywhere in the flow. Verified locally: summarize graceful, source options correct, 0 CSP violations, 0 uncaught JS errors.
…n cleanup)
From a code-quality deep scan (one function per purpose, no dead code, no passthrough params):
- _mix_chunks: collapsed the gui/engine.py copy and tui/app.py's staticmethod into one audio/mix.py::mix_chunks; both call it, and the TUI staticmethod is gone (not replaced with a wrapper).
- _fmt_hms: was duplicated in gui/transcribe.py (truncating) vs gui/export.py (rounding) → ±1s live-vs-export drift. One gui/_timefmt.py::fmt_hms (rounding), used by transcribe/export/engine; also dropped the non-essential _fmt_hms parameter the live loops threaded around.
- _write_wav: dead production code (recording uses _wav_header + _pcm_bytes), so it is deleted; its tests folded into the _pcm_bytes encoder test; unused `wave` import removed.
- app.js: one copyOrDownload() helper (copyForAI + summarizeForAI shared the clipboard-or-download fallback); a named PEER_COLOR const instead of a bare hex that aliased a rotating speaker slot.
Full suite 523 passed.
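The ±1s drift between a truncating and a rounding formatter is easy to reproduce. The sketch below is illustrative: the HH:MM:SS layout and the function names are assumptions, not the contents of gui/_timefmt.py.

```python
def _hms(total_seconds):
    """Render whole seconds as HH:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def fmt_hms_truncating(seconds):
    # int() drops the fractional part, so 59.6 s stays at 59.
    return _hms(int(seconds))


def fmt_hms_rounding(seconds):
    # Round half up for non-negative inputs, so 59.6 s becomes 60.
    return _hms(int(seconds + 0.5))
```

With the same event at 59.6 s, the truncating variant yields `00:00:59` and the rounding variant `00:01:00`, which is the kind of mismatch a single shared formatter removes.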
Verified by a green debug APK build (cargo tauri android build --debug --apk):
- Model staging is now atomic + complete: verify ALL 4 required files (was only tokens.txt) and stage into a .tmp dir then renameTo() the final dir, so a mid-copy process kill can't wedge a half-populated voxterm-model dir.
- Guard AudioRecord init: bail (with lastError) when getMinBufferSize() returns <= 0 (minBuf*2 would throw) and when state != STATE_INITIALIZED (mic busy).
- Never leak the native sherpa OnlineStream on a failed start: track it in a nullable and release it in finally; mark `recognizer` @Volatile (built lazily from both the mic worker and the debug self-test thread); reset `running` on every exit path.
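The stage-then-rename pattern from the first bullet, sketched in Python for illustration. The plugin itself is Kotlin; `stage_model` and the four file names below are hypothetical placeholders, not the bundled model's real file names.

```python
import os
import shutil

# Hypothetical file names standing in for the four required model files.
REQUIRED = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")


def stage_model(src_dir, final_dir, required=REQUIRED):
    """Copy the model into final_dir so it is either complete or absent."""
    def complete(path):
        return all(os.path.isfile(os.path.join(path, n)) for n in required)

    if complete(final_dir):
        return final_dir                      # already staged

    tmp_dir = final_dir + ".tmp"
    shutil.rmtree(tmp_dir, ignore_errors=True)  # leftover from a killed run
    os.makedirs(tmp_dir)
    for name in required:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copyfile(src, os.path.join(tmp_dir, name))

    if not complete(tmp_dir):
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise IOError("model assets incomplete; refusing to stage")

    shutil.rmtree(final_dir, ignore_errors=True)  # drop a stale partial dir
    os.rename(tmp_dir, final_dir)                 # the single publishing step
    return final_dir
```

Because readers only ever look at `final_dir`, a process kill during the copy leaves at most a stray `.tmp` directory that the next run discards.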
Swap the bundled offline model from the 20M zipformer (2023-02-17) to the 70M streaming zipformer2 (2023-06-26). On the bundled test clip the 20M dropped the opening clause and garbled "brothels"; the 70M transcribes it in full and correct. On a real phone it decodes at xRT 0.09 (0.62s for 7.13s of audio, ~11x real-time), so the accuracy gain costs no latency. APK grows ~26 MB (encoder int8 67 MB vs 40 MB). The 70M model is model_type=zipformer2, which has no `attention_dims` metadata, so the hardcoded modelType="zipformer" failed to init the encoder. Set modelType="" to auto-detect the architecture from the model's own ONNX metadata, so fetch-deps.sh is the single source of truth for the bundled model and no architecture string has to stay in sync here. Also log a measured xRT in the debug self-test, so on-device latency is a real number rather than an assumption.
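The xRT figure is decode time divided by audio duration; the arithmetic for the numbers quoted above can be checked with a few lines of Python. The function names are illustrative, not taken from the self-test.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """xRT: below 1.0 means decoding is faster than the audio plays."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


def speedup(decode_seconds, audio_seconds):
    """Inverse of xRT: how many seconds of audio are decoded per second."""
    return audio_seconds / decode_seconds
```

For the measurement above, `real_time_factor(0.62, 7.13)` rounds to 0.09 and `speedup(0.62, 7.13)` to 11.5, in line with the quoted "~11x real-time".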
Select the bundled offline model with VOXASR_MODEL:
zipformer-70m (default) streaming zipformer2, ~68 MB assets / ~232 MB APK
fast (xRT 0.09 on a real phone), ALL-CAPS, no punctuation
nemotron-0.6b NeMo FastConformer-RNNT, ~632 MB assets / ~621 MB APK
accurate, native casing + punctuation, xRT 0.29 on the same phone
The default stays the lightweight zipformer so a plain build is small and
installs anywhere; nemotron is opt-in for builds that want transcript-grade
output and can afford the size. The Kotlin plugin already auto-detects the
architecture and feature dim from each model's ONNX metadata (modelType="",
metadata-driven feat_dim), so the tier swap needs no code change.
Also replace the fragile hardcoded epoch-specific cp filenames with a glob
that matches both naming schemes (zipformer's
`encoder-epoch-…-chunk-16-left-128.int8.onnx` and nemotron's plain
`encoder.int8.onnx`), mirroring the desktop loader's _pick(); add a guard
for an unknown VOXASR_MODEL. shellcheck-clean.
Both tiers verified end-to-end on a real device: bundled-clip self-test
decodes correctly and the live start/poll/stop pipeline runs without error.
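The glob idea (one pattern that accepts both the long zipformer file names and nemotron's plain ones) can be illustrated in Python. The real change is in the shell script fetch-deps.sh, and the desktop loader's `_pick()` is not shown in this PR; `pick` and the example file names below are hypothetical.

```python
from fnmatch import fnmatch


def pick(names, role):
    """Return the single int8 ONNX file for a role such as 'encoder'.

    '<role>*.int8.onnx' matches both a decorated name and the plain
    '<role>.int8.onnx', because '*' may match an empty string.
    """
    matches = sorted(n for n in names if fnmatch(n, f"{role}*.int8.onnx"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one {role} file, found {matches}")
    return matches[0]
```

Requiring exactly one match keeps a stale file from a previous tier from being picked up silently.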
…fecycle
start_transcribe used to reject with "microphone permission not granted" when the runtime permission was absent, so a fresh install's first Start hard-failed with no in-app recovery. The plugin now owns the mic: it declares the RECORD_AUDIO "microphone" alias and requests it on first Start, resuming in a @PermissionCallback once granted (verified on a device: fresh install -> system prompt -> grant -> records).
Lifecycle hardening while here:
- ensureRecognizer(): one @Synchronized lazy builder shared by the mic worker and the debug self-test, closing a check-then-act race that could build and leak two native recognizers, and removing the duplicated idiom.
- a per-session generation token so a worker that outlives stop's 2s join can neither run the mic alongside nor reset the running flag of a newer session.
- stop_transcribe clears the trailing partial so poll_transcript stops returning a never-finalized line after recording ends.
- the webview clears the transcript on each Start (no cross-session concat / unbounded DOM growth).
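A sketch of the generation-token guard from the second bullet, written in Python for illustration. The class and method names are hypothetical; the plugin's Kotlin code is not reproduced here.

```python
import threading


class RecordingSession:
    """Each start gets a fresh token; stale workers cannot touch newer state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = 0
        self.running = False

    def start(self):
        # Bump the generation so any worker from an earlier start is stale.
        with self._lock:
            self._generation += 1
            self.running = True
            return self._generation

    def is_current(self, token):
        # A worker checks this in its loop and exits once it returns False.
        with self._lock:
            return self.running and token == self._generation

    def finish(self, token):
        # Only the worker of the current generation may clear `running`.
        with self._lock:
            if token != self._generation:
                return False
            self.running = False
            return True
```

If a worker outlives the stop-side join and a new session starts, the old worker's `is_current` check fails and its `finish` call is ignored, so it can neither keep recording nor flip the new session's flag.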
…nused dep
The plugin's android/.tauri/tauri-api/ tree is the Tauri-CLI-generated mirror of the tauri-android framework (Apache/MIT "Tauri Programme", ~2150 LOC incl. 2+2 scaffold tests): vendored upstream code, not part of this contribution. The build resolves :tauri-android from the gen settings path and never uses this copy (verified: a clean build with the directory removed still produces the APK), and the sibling src-tauri/gen/android already gitignores its own /.tauri. Gitignore android/.tauri/ and untrack the 29 files so the diff is the plugin. Also drop the unused direct `serde` dependency (the crate uses only serde_json::Value).
…self-heal deps
- add tauri-plugin-voxasr/README.md (purpose, the start/stop/poll command surface, fetch-deps.sh + VOXASR_MODEL tiers, RECORD_AUDIO/no-INTERNET stance, build via scripts/android-dev.sh) plus a short subsection + CHANGELOG entry in the main docs.
- fix the lib.rs crate docstring: it described a voxasr://partial/final event contract that does not exist; the plugin is poll-only (poll_transcript).
- update capabilities/mobile.json's description: it still said "window/webview/ event only ... pairs to a desktop" though it now grants voxasr:default and on-device is the primary mode.
- android-dev.sh runs fetch-deps.sh when the AAR/model are missing, so the advertised one-command build works on a fresh checkout (honors VOXASR_MODEL).
- fix a stale revealForm() reference in an index.html comment (revealMobileHome).
…e-at-stop)
Replace the streaming zipformer/nemotron path with offline Whisper: the mic is
buffered while recording and, at stop, the whole clip is decoded by a sherpa-onnx
OfflineRecognizer — full context, native punctuation + casing, no rough live
output. This is the same model family the desktop's faster-whisper uses, so the
phone gets transcript-grade results.
- fetch-deps.sh: VOXASR_MODEL tiers are now whisper-tiny/base/small.en (base.en
default ~154 MB); Whisper has no joiner, so stage encoder/decoder/tokens only,
and wipe the model dir first so a tier switch leaves no stale files.
- VoxasrPlugin.kt: OfflineRecognizer (modelType="whisper", en/transcribe); record
to a PCM buffer; at stop, split into <=30 s windows (cut at the quietest point
near the boundary so words aren't sliced) and join. poll_transcript now reports
{ phase, elapsed, level, durationSec, segments[], error? }. Keeps the runtime
RECORD_AUDIO request, generation guard, and @Synchronized recognizer build; the
stop path snapshots the take's buffer and joins a prior worker before reopening
the single-owner mic.
- measured on a real phone: base.en self-test xRT ~0.2 (~5x real-time), correct
punctuated transcript.
The phone now runs the SAME web GUI as the desktop instead of a separate stripped page. gui/static is staged into the mobile bundle (mobile-pair/app/) and a LocalBackend drives the native voxasr plugin + localStorage instead of the desktop's Python HTTP engine: same look, same record→transcribe→view→export flow.
- gui/static/backend-local.js: implements the window.VOX_BACKEND seam (getJSON/events/authUrl) against the plugin; synthesizes app.js's recording→transcribing→done state machine from poll_transcript; persists sessions + renders client-side md/json/srt/vtt export. Sets the `on-device` flag so app.js/CSS hide Python-only features (model/source/mic/diarize/summary, language, local-LLM summary, WAV download, speaker rename), so there are no dead buttons.
- scripts/stage-mobile.sh (+ tauri beforeBuildCommand/beforeDevCommand): copies gui/static into mobile-pair/app/ with backend-local.js swapped in and the PWA shell dropped; mobile-pair/app/ is gitignored (gui/static stays the source).
- mobile-pair: the Android app redirects to the on-device GUI; the pairing form is now browser-only (dead loopback probe removed).
- AndroidManifest strips INTERNET (tools:node=remove) → the APK is provably offline; CSP trimmed to match (no remote/blob tokens).
- app.js/style.css: two small on-device guards; empty-state copy fixed (both platforms transcribe at stop, not live).
Verified e2e on a real phone: GUI loads, degrade applied, two record→transcribe takes complete cleanly, zero console errors, only RECORD_AUDIO granted.
…ngine
Update the plugin README (offline Whisper, the phase-based poll contract, whisper model tiers, the unified-GUI architecture), the main README's Android section, and the CHANGELOG entry; the previous text described the superseded streaming model.
sherpa-onnx Whisper truncates anything ≥30 s ("process only the first 30 s and
discard the remaining"), so an exactly-30 s window risks a boundary warning. Cap
the windows at 29 s — comfortably under the limit, no data discarded, plus a
margin for the silence-aware cut. Verified the chunked decode of a synthetic
>30 s clip joins into coherent text with the cut landing in a pause.
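A sketch of windowing with a silence-aware cut, in Python for illustration. The plugin does this in Kotlin; the search span, frame size, and energy measure below are assumptions, and `split_windows` is a hypothetical name.

```python
def split_windows(samples, rate=16000, max_s=29, search_s=2, frame=160):
    """Split samples into windows of at most max_s seconds.

    Each cut lands in the quietest frame found in the last search_s seconds
    before the hard limit, so a word is less likely to be sliced in half.
    """
    limit = max_s * rate
    search = search_s * rate
    windows, start = [], 0
    while len(samples) - start > limit:
        hard_end = start + limit
        best_cut, best_energy = hard_end, None
        # Scan whole frames that end at or before the hard limit.
        for pos in range(hard_end - search, hard_end - frame + 1, frame):
            energy = sum(abs(x) for x in samples[pos:pos + frame])
            if best_energy is None or energy < best_energy:
                best_cut, best_energy = pos + frame // 2, energy
        windows.append(samples[start:best_cut])
        start = best_cut
    windows.append(samples[start:])   # remainder is already short enough
    return windows
```

Since every cut falls before the hard limit, no window exceeds `max_s`, and concatenating the windows reproduces the input exactly, so no audio is discarded.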
…g it
stagedModelDir() copied the bundled assets into a temp dir and swapped it in without checking the required model files actually landed; a build shipping incomplete assets would surface as a cryptic native recognizer crash later rather than a clear error. Verify all required files are present before the atomic rename (clear IOException otherwise), check renameTo's result, and mark it @Synchronized so the debug self-test can't race a first record into staging. Verified on a physical device (debug APK): a cold re-stage after `pm clear` stages all files and the offline self-test decodes test.wav correctly (xRT 0.18, full casing + punctuation).
The comment advertised 'zipformer-70m default | nemotron-0.6b', but fetch-deps.sh
only accepts whisper-{tiny,base,small}.en (default whisper-base.en) and exit 1s on
anything else — so copying the old hint sent users straight into a script abort.
…PK CI
- fix(android): keep com.k2fsa.sherpa.onnx classes/members under R8. Minified
release builds stripped config fields read only via JNI (decodingMethod, ...),
so the recognizer crashed at stop with "failed to get field id for
decodingMethod". Adds keep rules to the app proguard-rules.pro.
- feat(android): live transcription preview. The voxasr plugin now decodes the
growing buffer during recording (finalized <=29s windows + a volatile partial)
and exposes it via pollTranscript; backend-local.js maps it into app.js's
existing live view. The authoritative full pass still runs once at stop.
- fix(android): brace ${MODEL} in fetch-deps.sh so the multibyte ellipsis after
it doesn't trip bash's set -u under a UTF-8 locale (broke local builds and
would break CI).
- ci(android): add .github/workflows/android-release.yml — build + sign an arm64
APK on mobile-path changes (or manual dispatch) and publish it to a rolling
android-latest release.
- docs(android): add docs/android-install.md + a README install pointer.
A self-contained static page (docs/index.html, served via Pages from /docs) with a Download APK button (-> the android-latest release asset), sideload steps, an Obtainium auto-update section, and requirements. Adds .nojekyll so the static page is served as-is.
Ships the on-device Android app to `main`, working and installable.

New in this branch (on top of #175)
- Release-crash fix: R8 stripped `com.k2fsa.sherpa.onnx` config fields read only via JNI (decodingMethod, …), so the recognizer died at stop with "failed to get field id for decodingMethod." Added ProGuard keep rules. (PR #175 alone still has this bug.)
- Live transcription preview: the plugin decodes the growing buffer during recording (finalized windows + a volatile partial) and exposes it via `pollTranscript`; `backend-local.js` feeds app.js's existing live view. The authoritative full pass still runs once at stop.
- `fetch-deps.sh`: brace `${MODEL}` so the trailing multibyte ellipsis doesn't trip `set -u` under UTF-8 (broke local + would break CI).
- `android-release.yml`: build + sign an arm64 APK on mobile-path changes / manual dispatch, publish to a rolling `android-latest` release.
- `docs/android-install.md` install guide + a GitHub Pages landing page (`docs/index.html`) + README pointer.

After merge
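- The `android-release.yml` workflow publishes the first `android-latest` APK. Until the `ANDROID_*` signing secrets are set it's debug-signed (installable, not update-stable); see `docs/android-install.md`.
- GitHub Pages needs to serve from `main` `/docs` for the download page to be reachable.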