blokzdev · Jun 2, 2026
diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
@@ -8,6 +8,9 @@
 _(nothing active — pick the next batch from below)_
 
 ## Deferred / future refinements
+- [ ] **OCR — non-Latin scripts.** P13b-1 ships the **bundled Latin** ML Kit recognizer (no Google Play
+      Services, offline). Chinese/Japanese/Korean/Devanagari need their own ML Kit script models (extra APK
+      size or a download). Add a script choice if users want non-Latin OCR. *(From P13b-1.)*
 - [ ] **Auto-summarize — queue-decoupled background run.** P13a-2 generates the auto-summary **inline** in
       `_persistCompleted` before the next download pumps (gated on "model present" so it can't stall on a
       fetch), exactly like `autoTranscribe`. Generation is heavier than whisper-tiny, so a fuller design

diff --git a/docs/SPEC.md b/docs/SPEC.md
@@ -37,6 +37,7 @@ Implementation-level detail. Versions are targets to confirm at scaffold time
 | **On-device AI (P10b-2+):** `flutter_gemma` (MediaPipe/LiteRT-LM — embeddings + LLM + RAG; added P10b-2 embedder-only), a whisper.cpp pkg (`whisper_ggml_plus`/`whisper_kit`), ML Kit (OCR/translate) | On-device/edge AI. See `docs/AI-SPEC.md`. |
 | **On-device AI (P12c):** `unorm_dart` (pure-Dart Unicode NFKC); `onnxruntime_v2` (P12c-2; 16KB-page + GPU) | NFKC for the hand-rolled XLM-R tokenizer (the multilingual embedder); ONNX runtime for MiniLM. |
 | **On-device AI (P12e):** `whisper_ggml_plus` (whisper.cpp FFI, MIT) | Speech transcription. Chosen over `whisper_kit` for Windows/v2 parity; app-managed ggml model fed as `modelPath`, 16 kHz WAV made with our existing ffmpeg (no ffmpeg companion). |
+| **On-device AI (P13b-1):** `google_mlkit_text_recognition` (MIT plugin; ML Kit binaries free/proprietary) | OCR for image items. **Bundled Latin model — no Google Play Services, fully offline** (fits the sideloaded/de-Googled posture); behind the `OcrEngine` seam, graceful off-Android. Non-Latin scripts deferred (BACKLOG). |
 | **Graph viz (P10):** `graphview` | Interactive relationship explorer. |
 | **Charts (P10d-2):** `fl_chart` | On-device Dashboard storage donuts + library-activity bars. Pure-Dart (CustomPainter), no native deps, no telemetry. |
 | ~~**v3:** `supabase_flutter`, Stripe/PayPal SDKs~~ | **Dropped** (no cloud/credits). |

diff --git a/docs/VERIFICATION.md b/docs/VERIFICATION.md
@@ -936,6 +936,15 @@ entries, or verify after P11c lands.)*
 - [ ] **Default off** (and when generation is disabled): downloads produce **no** auto-summary and no AI
       nudge; the queue still drains normally; the on-demand "Summarize with AI" still works.
 
+### P13b-1 — OCR (on-demand)  *(install `app-arm64-v8a-debug.apk`; image item)*
+- [ ] Open an **image** item that contains legible text → a **"Text in image"** section shows a **Scan
+      text** button; tap it → the recognized text appears, **fully offline** (airplane mode), and persists
+      across an app restart; **Rescan** re-runs it.
+- [ ] After scanning, **search** the library for a word that appears **only in the image text** → the image
+      is returned (OCR feeds full-text search). With semantic search on, "related" items and search also benefit.
+- [ ] An image with no readable text shows a "No readable text found" note (no crash). A **video/audio**
+      item shows **no** OCR section.
+
 ### P13 (later subphases)
 - [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
       results back to the item.

diff --git a/docs/design/P13-PLAN.md b/docs/design/P13-PLAN.md
@@ -113,22 +113,41 @@ background, **opt-in (default off)**, mirroring the `autoTranscribe` precedent (
   spot-check.** Shares the queue-decoupled-background-AI deferral (inline-before-next-pump like
   `autoTranscribe`) and the LLM+HNSW RAM co-residency check (P13d) — both in `BACKLOG.md`.
 
-### `[ ]` P13b — Translation & OCR (ML Kit) *(native; new deps; APK; split into 2 PRs)*
+### P13b — OCR & Translation (ML Kit) *(native; new deps; APK)*
 On-device text intelligence that is **device-universal-ish** — gated on ML Kit + opt-in, not the RAM tier.
-Adds `google_mlkit_translation` / `google_mlkit_text_recognition`; measure APK-size impact in the first build.
+**Reordered (maintainer call): OCR leads** — it uses the bundled Latin model (no Google Play Services, fully
+offline) so it de-risks the ML Kit dependency before the more complex translation (language-pack downloads +
+target-language UX + GMS nuance). Measure APK-size impact in the first ML Kit build.
 
-#### `[ ]` P13b-1 — Translation *(native; APK)*
-- Translate an item's **description / transcript / summary** into the user's chosen language on-device, with
-  on-demand ML Kit language-pack download (opt-in, progress, integrity managed by ML Kit). Surface on item
-  detail next to the original text.
+#### `[~]` P13b-1 — OCR (on-demand) *(native; APK)*
+- Extract text from **image** items via ML Kit text recognition (bundled Latin); store it so it is
+  **searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail.
+- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists
+  and feeds search/related; gating is graceful where ML Kit is unavailable.
+- **Status:** implemented (CI-green) — `google_mlkit_text_recognition` (bundled Latin, no GMS, offline);
+  `OcrEngine` interface + `MlKitOcrEngine`/`UnavailableOcrEngine` + factory/provider (mirrors the transcription
+  engine seam); `MediaMetadata.ocrText` (**schema v11→v12**) + `media_fts` gains an `ocr` column (table +
+  triggers + backfill rebuilt in the v12 migration) so search covers image text; `ocrText` added (capped) to
+  the embed doc; `MetadataRepository.updateOcrText`; a `_OcrSection` "Scan text"/"Rescan" action on image
+  detail. No opt-in toggle (OCR is free + offline). Tests: OCR FTS search, `updateOcrText` round-trip, v11→v12
+  migration (incl. FTS `ocr`), embed-doc inclusion, engine availability. **Pending APK spot-check** (scan a
+  real image → text appears + becomes searchable, offline), which also covers the widget + native ML Kit call.
+
+#### `[ ]` P13b-2 — Translation *(native; new dep; APK)*
+- Translate an item's **description / transcript / summary** into the app's language (default) with a
+  target-language picker (reuse the curated `_captionLanguages` list), via `google_mlkit_translation` +
+  language-id; on-demand language-pack download (managed, Wi-Fi-aware). Ephemeral (no cache/schema). Surface
+  on item detail next to the original text. **Note the GMS nuance** (translation downloads models from a
+  Google endpoint) and document it.
 - **Exit / review:** translate a non-English item's text offline after the pack downloads; no pack ⇒ a clear
   one-time setup prompt; nothing leaves the device.
 
-#### `[ ]` P13b-2 — OCR *(native; APK)*
-- Extract text from **image downloads** via ML Kit text recognition; store it (one schema bump if needed) so
-  it is **searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail.
-- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists
-  and feeds search/related; gating is graceful where ML Kit is unavailable.
+#### `[ ]` P13b-3 — Auto-OCR on download *(follow-up; native; APK)*
+- Opt-in (default off) auto-scan of **image** downloads, mirroring P13a-2 auto-summarize: a settings toggle +
+  a gated block in `queue_controller._persistCompleted` (runs inline; OCR is cheap + offline) → `updateOcrText`
+  → an Activity Inbox entry. Grows search coverage automatically.
+- **Exit / review:** with auto-OCR on, a finished image download is scanned + becomes searchable offline;
+  default-off does nothing; the queue still drains.
 
 ### `[ ]` P13c — Smart auto-tagging *(generation; APK)*
 LLM-suggested tags feeding the **existing** tag system — builds directly on the P13a generation patterns.

diff --git a/lib/core/ai/inference_error.dart b/lib/core/ai/inference_error.dart
@@ -11,6 +11,7 @@ enum InferenceErrorCode {
   embedFailed,
   generateFailed,
   transcribeFailed,
+  ocrFailed,
 
   /// A capability seam exists but has no working implementation on this build —
   /// e.g. `generateStructured` (P12f forward seam): the method is defined and

diff --git a/lib/core/ai/ml_kit_ocr_engine.dart b/lib/core/ai/ml_kit_ocr_engine.dart
@@ -0,0 +1,35 @@
+import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';
+import 'package:grabbit/core/ai/inference_error.dart';
+import 'package:grabbit/core/ai/ocr_engine.dart';
+
+/// Android [OcrEngine] backed by ML Kit on-device text recognition (P13b-1).
+/// Uses the **bundled Latin-script model** — no Google Play Services, no
+/// network, no model download — so it runs offline and fits the sideloaded
+/// posture. A recognizer is created per call and closed in `finally` (OCR is
+/// on-demand and infrequent, so there's no persistent native handle to leak).
+class MlKitOcrEngine implements OcrEngine {
+  @override
+  bool get isAvailable => true;
+
+  @override
+  Future<String> recognizeText(String imagePath) async {
+    final recognizer = TextRecognizer(script: TextRecognitionScript.latin);
+    try {
+      final result = await recognizer.processImage(
+        InputImage.fromFilePath(imagePath),
+      );
+      return result.text;
+    } on Exception catch (e) {
+      throw InferenceException(
+        InferenceErrorCode.ocrFailed,
+        'Text recognition failed',
+        cause: e,
+      );
+    } finally {
+      await recognizer.close();
+    }
+  }
+
+  @override
+  Future<void> close() async {}
+}
diff --git a/lib/core/ai/ocr_engine.dart b/lib/core/ai/ocr_engine.dart
@@ -0,0 +1,18 @@
+/// On-device **text recognition (OCR)** abstraction (P13b-1) — a sibling of the
+/// other per-capability AI engines. Extracts text from an image file fully
+/// on-device; unlike the model-based engines there's no download or device-tier
+/// gating (the bundled Latin model needs no Google Play Services and runs
+/// offline). Implementations are capability-gated: a host that can't run it gets
+/// the graceful [UnavailableOcrEngine] no-op (never a crash — AI-SPEC §1).
+abstract interface class OcrEngine {
+  /// Whether text recognition can run on this host (Android + bundled model).
+  bool get isAvailable;
+
+  /// Recognizes text in the image at [imagePath] and returns the extracted text
+  /// (possibly empty — no readable text). Throws an [InferenceException]
+  /// (`unavailable`/`ocrFailed`) on failure.
+  Future<String> recognizeText(String imagePath);
+
+  /// Releases any native resources held by the recognizer.
+  Future<void> close();
+}
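A caller-side view of the `OcrEngine` seam may help when reviewing the contract above. The following is only a sketch: `FakeOcrEngine` and `scanLabel` are hypothetical names that do not appear in this PR, and the `InferenceException` / `InferenceErrorCode` types are trimmed stand-ins so the snippet runs without the app's packages. It shows the three outcomes the interface describes: unavailable host, empty result, and a thrown `InferenceException`.

```dart
// Stand-ins for the app's types (assumption: reduced to what the sketch needs).
enum InferenceErrorCode { unavailable, ocrFailed }

class InferenceException implements Exception {
  const InferenceException(this.code, this.message);
  final InferenceErrorCode code;
  final String message;
}

abstract interface class OcrEngine {
  bool get isAvailable;
  Future<String> recognizeText(String imagePath);
  Future<void> close();
}

/// Hypothetical fake: returns canned text, or fails when [fail] is set.
class FakeOcrEngine implements OcrEngine {
  FakeOcrEngine({this.text = '', this.fail = false});
  final String text;
  final bool fail;

  @override
  bool get isAvailable => true;

  @override
  Future<String> recognizeText(String imagePath) async {
    if (fail) {
      throw const InferenceException(
        InferenceErrorCode.ocrFailed,
        'Text recognition failed',
      );
    }
    return text;
  }

  @override
  Future<void> close() async {}
}

/// Hypothetical caller: maps each outcome to what a detail section could show.
Future<String> scanLabel(OcrEngine engine, String imagePath) async {
  if (!engine.isAvailable) return 'OCR unavailable';
  try {
    final text = (await engine.recognizeText(imagePath)).trim();
    return text.isEmpty ? 'No readable text found' : text;
  } on InferenceException catch (e) {
    return 'Scan failed: ${e.message}';
  }
}

Future<void> main() async {
  print(await scanLabel(FakeOcrEngine(text: 'EXIT 12'), 'a.png')); // EXIT 12
  print(await scanLabel(FakeOcrEngine(), 'b.png')); // No readable text found
  print(await scanLabel(FakeOcrEngine(fail: true), 'c.png')); // Scan failed: Text recognition failed
}
```

Checking `isAvailable` first keeps the unavailable path free of exceptions, so the `catch` only ever sees genuine recognition failures.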
diff --git a/lib/core/ai/ocr_engine_factory.dart b/lib/core/ai/ocr_engine_factory.dart
@@ -0,0 +1,12 @@
+import 'dart:io';
+
+import 'package:grabbit/core/ai/ml_kit_ocr_engine.dart';
+import 'package:grabbit/core/ai/ocr_engine.dart';
+import 'package:grabbit/core/ai/unavailable_ocr_engine.dart';
+
+/// Maps the host platform to its [OcrEngine] — the runtime "registry" seam
+/// (mirrors `transcriptionEngineFor`). Android gets the real ML Kit engine;
+/// every other host gets the graceful [UnavailableOcrEngine] (OCR stays off,
+/// never crashes).
+OcrEngine ocrEngineFor() =>
+    Platform.isAndroid ? MlKitOcrEngine() : const UnavailableOcrEngine();
diff --git a/lib/core/ai/ocr_provider.dart b/lib/core/ai/ocr_provider.dart
@@ -0,0 +1,12 @@
+import 'package:grabbit/core/ai/ocr_engine.dart';
+import 'package:grabbit/core/ai/ocr_engine_factory.dart';
+import 'package:riverpod_annotation/riverpod_annotation.dart';
+
+part 'ocr_provider.g.dart';
+
+/// The [OcrEngine] for this host (P13b-1). Routes via `ocrEngineFor`; an
+/// unsupported platform yields the graceful [UnavailableOcrEngine] — OCR simply
+/// stays off, never crashes. No model download or device-tier gating (the
+/// bundled Latin model runs offline on any Android device).
+@Riverpod(keepAlive: true)
+OcrEngine ocrEngine(Ref ref) => ocrEngineFor();
diff --git a/lib/core/ai/unavailable_ocr_engine.dart b/lib/core/ai/unavailable_ocr_engine.dart
@@ -0,0 +1,24 @@
+import 'package:grabbit/core/ai/inference_error.dart';
+import 'package:grabbit/core/ai/ocr_engine.dart';
+
+/// Graceful no-op [OcrEngine] for hosts that can't run ML Kit text recognition
+/// (non-Android until P15, CI). Never crashes — OCR simply stays unavailable
+/// (AI-SPEC §1); the "Scan text" affordance is hidden when `isAvailable` is
+/// false.
+class UnavailableOcrEngine implements OcrEngine {
+  const UnavailableOcrEngine();
+
+  static const _ex = InferenceException(
+    InferenceErrorCode.unavailable,
+    'On-device text recognition is not available on this device',
+  );
+
+  @override
+  bool get isAvailable => false;
+
+  @override
+  Future<String> recognizeText(String imagePath) async => throw _ex;
+
+  @override
+  Future<void> close() async {}
+}
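One Dart detail is relevant to `recognizeText` here: a `Future`-returning method delivers its error through the returned future only when the body is `async` (or returns `Future.error`). A plain `=> throw` throws at the call site, before any future exists. The sketch below contrasts the two; `syncThrow` and `asyncThrow` are hypothetical names used only for illustration.

```dart
// Throws while the call expression is being evaluated; no Future is returned.
Future<String> syncThrow() => throw StateError('sync');

// The async body wraps the throw, so the error arrives via the Future.
Future<String> asyncThrow() async => throw StateError('async');

Future<void> main() async {
  try {
    // catchError is never attached: the throw happens before it can run.
    syncThrow().catchError((Object _) => 'handled');
    print('not reached');
  } on StateError catch (e) {
    print('caught at call site: ${e.message}'); // caught at call site: sync
  }

  // Here the error flows through the Future, so catchError handles it.
  final result = await asyncThrow().catchError((Object _) => 'handled');
  print(result); // handled
}
```

Callers that `await` inside a `try` block see the same behavior either way; the difference only matters for code that attaches handlers such as `catchError` to the returned future.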
diff --git a/lib/core/db/database.dart b/lib/core/db/database.dart
@@ -75,6 +75,9 @@ class MediaMetadata extends Table {
   // model produced it (attribution + a "Regenerate" prompt when it changes).
   TextColumn get aiSummary => text().nullable()();
   TextColumn get aiSummaryModelId => text().nullable()();
+  // P13b-1: on-device OCR text extracted from an image download. Null until the
+  // user scans the image; feeds full-text search (media_fts) + the embed doc.
+  TextColumn get ocrText => text().nullable()();
 
   @override
   Set<Column<Object>> get primaryKey => {itemId};
@@ -214,7 +217,7 @@ class AppDatabase extends _$AppDatabase {
     : super(executor ?? driftDatabase(name: 'grabbit'));
 
   @override
-  int get schemaVersion => 11;
+  int get schemaVersion => 12;
 
   @override
   MigrationStrategy get migration => MigrationStrategy(
@@ -266,6 +269,23 @@ class AppDatabase extends _$AppDatabase {
         await m.addColumn(mediaMetadata, mediaMetadata.aiSummary);
         await m.addColumn(mediaMetadata, mediaMetadata.aiSummaryModelId);
       }
+      if (from < 12) {
+        // P13b-1: OCR text column + add it to the FTS index. FTS5 can't
+        // ALTER ADD COLUMN, so drop the table + triggers and let
+        // `_createFtsObjects` rebuild them (now with `ocr`) and backfill.
+        await m.addColumn(mediaMetadata, mediaMetadata.ocrText);
+        for (final t in const [
+          'media_fts_ai_items',
+          'media_fts_au_items',
+          'media_fts_ad_items',
+          'media_fts_ai_meta',
+          'media_fts_au_meta',
+          'media_fts_ad_meta',
+        ]) {
+          await customStatement('DROP TRIGGER IF EXISTS $t');
+        }
+        await customStatement('DROP TABLE IF EXISTS media_fts');
+      }
       await _createIndices();
       await _createFtsObjects();
     },
@@ -348,64 +368,66 @@ class AppDatabase extends _$AppDatabase {
   Future<void> _createFtsObjects() async {
     await customStatement(
       'CREATE VIRTUAL TABLE IF NOT EXISTS media_fts USING fts5('
-      'item_id UNINDEXED, title, description, transcript, '
+      'item_id UNINDEXED, title, description, transcript, ocr, '
       "tokenize = 'unicode61 remove_diacritics 2')",
     );
-    // media_items → fts (title is here; description/transcript joined in).
+    // media_items → fts (title is here; description/transcript/ocr joined in).
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_ai_items '
       'AFTER INSERT ON media_items BEGIN '
       'DELETE FROM media_fts WHERE item_id = new.id; '
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
       'SELECT new.id, new.title, '
       '(SELECT description FROM media_metadata WHERE item_id = new.id), '
-      '(SELECT transcript FROM media_metadata WHERE item_id = new.id); END',
+      '(SELECT transcript FROM media_metadata WHERE item_id = new.id), '
+      '(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END',
     );
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_au_items '
       'AFTER UPDATE OF title ON media_items BEGIN '
       'DELETE FROM media_fts WHERE item_id = new.id; '
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
       'SELECT new.id, new.title, '
       '(SELECT description FROM media_metadata WHERE item_id = new.id), '
-      '(SELECT transcript FROM media_metadata WHERE item_id = new.id); END',
+      '(SELECT transcript FROM media_metadata WHERE item_id = new.id), '
+      '(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END',
     );
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_ad_items '
       'AFTER DELETE ON media_items BEGIN '
       'DELETE FROM media_fts WHERE item_id = old.id; END',
     );
-    // media_metadata → fts (description/transcript here; title joined in).
+    // media_metadata → fts (description/transcript/ocr here; title joined in).
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_ai_meta '
       'AFTER INSERT ON media_metadata BEGIN '
       'DELETE FROM media_fts WHERE item_id = new.item_id; '
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
       'SELECT new.item_id, '
       '(SELECT title FROM media_items WHERE id = new.item_id), '
-      'new.description, new.transcript; END',
+      'new.description, new.transcript, new.ocr_text; END',
     );
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_au_meta '
-      'AFTER UPDATE OF description, transcript ON media_metadata BEGIN '
+      'AFTER UPDATE OF description, transcript, ocr_text ON media_metadata BEGIN '
       'DELETE FROM media_fts WHERE item_id = new.item_id; '
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
       'SELECT new.item_id, '
       '(SELECT title FROM media_items WHERE id = new.item_id), '
-      'new.description, new.transcript; END',
+      'new.description, new.transcript, new.ocr_text; END',
     );
     await customStatement(
       'CREATE TRIGGER IF NOT EXISTS media_fts_ad_meta '
       'AFTER DELETE ON media_metadata BEGIN '
       'DELETE FROM media_fts WHERE item_id = old.item_id; '
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
-      'SELECT old.item_id, title, NULL, NULL FROM media_items '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
+      'SELECT old.item_id, title, NULL, NULL, NULL FROM media_items '
       'WHERE id = old.item_id; END',
     );
     // One-time backfill of pre-existing rows (no-op on a fresh, empty DB).
     await customStatement(
-      'INSERT INTO media_fts(item_id, title, description, transcript) '
-      'SELECT mi.id, mi.title, mm.description, mm.transcript '
+      'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
+      'SELECT mi.id, mi.title, mm.description, mm.transcript, mm.ocr_text '
       'FROM media_items mi '
       'LEFT JOIN media_metadata mm ON mm.item_id = mi.id '
       'WHERE mi.id NOT IN (SELECT item_id FROM media_fts)',

diff --git a/lib/core/graph/embedding_doc.dart b/lib/core/graph/embedding_doc.dart
diff --git a/lib/features/library/data/metadata_repository.dart b/lib/features/library/data/metadata_repository.dart
@@ -674,6 +674,18 @@ class MetadataRepository {
         );
   }
 
+  /// Stores the on-device OCR [text] extracted from an image (P13b-1). Upserts
+  /// (only this column) so it works for items whose metadata row predates the
+  /// column; passing `null` clears it. The `media_fts` triggers reindex it for
+  /// full-text search automatically.
+  Future<void> updateOcrText(String itemId, String? text) async {
+    await _db
+        .into(_db.mediaMetadata)
+        .insertOnConflictUpdate(
+          MediaMetadataCompanion.insert(itemId: itemId, ocrText: Value(text)),
+        );
+  }
+
   // --- Tags ---
 
   Stream<List<Tag>> watchTagsForItem(String itemId) {