3 changes: 3 additions & 0 deletions docs/BACKLOG.md
@@ -8,6 +8,9 @@
_(nothing active — pick the next batch from below)_

## Deferred / future refinements
- [ ] **OCR — non-Latin scripts.** P13b-1 ships the **bundled Latin** ML Kit recognizer (no Google Play
Services, offline). Chinese/Japanese/Korean/Devanagari need their own ML Kit script models (extra APK
size or a download). Add a script choice if users want non-Latin OCR. *(From P13b-1.)*
- [ ] **Auto-summarize — queue-decoupled background run.** P13a-2 generates the auto-summary **inline** in
`_persistCompleted` before the next download pumps (gated on "model present" so it can't stall on a
fetch), exactly like `autoTranscribe`. Generation is heavier than whisper-tiny, so a fuller design
1 change: 1 addition & 0 deletions docs/SPEC.md
@@ -37,6 +37,7 @@ Implementation-level detail. Versions are targets to confirm at scaffold time
| **On-device AI (P10b-2+):** `flutter_gemma` (MediaPipe/LiteRT-LM — embeddings + LLM + RAG; added P10b-2 embedder-only), a whisper.cpp pkg (`whisper_ggml_plus`/`whisper_kit`), ML Kit (OCR/translate) | On-device/edge AI. See `docs/AI-SPEC.md`. |
| **On-device AI (P12c):** `unorm_dart` (pure-Dart Unicode NFKC); `onnxruntime_v2` (P12c-2; 16KB-page + GPU) | NFKC for the hand-rolled XLM-R tokenizer (the multilingual embedder); ONNX runtime for MiniLM. |
| **On-device AI (P12e):** `whisper_ggml_plus` (whisper.cpp FFI, MIT) | Speech transcription. Chosen over `whisper_kit` for Windows/v2 parity; app-managed ggml model fed as `modelPath`, 16 kHz WAV made with our existing ffmpeg (no ffmpeg companion). |
| **On-device AI (P13b-1):** `google_mlkit_text_recognition` (MIT plugin; ML Kit binaries free/proprietary) | OCR for image items. **Bundled Latin model — no Google Play Services, fully offline** (fits the sideloaded/de-Googled posture); behind the `OcrEngine` seam, graceful off-Android. Non-Latin scripts deferred (BACKLOG). |
| **Graph viz (P10):** `graphview` | Interactive relationship explorer. |
| **Charts (P10d-2):** `fl_chart` | On-device Dashboard storage donuts + library-activity bars. Pure-Dart (CustomPainter), no native deps, no telemetry. |
| ~~**v3:** `supabase_flutter`, Stripe/PayPal SDKs~~ | **Dropped** (no cloud/credits). |
9 changes: 9 additions & 0 deletions docs/VERIFICATION.md
@@ -936,6 +936,15 @@ entries, or verify after P11c lands.)*
- [ ] **Default off** (and when generation is disabled): downloads produce **no** auto-summary and no AI
nudge; the queue still drains normally; the on-demand "Summarize with AI" still works.

### P13b-1 — OCR (on-demand) *(install `app-arm64-v8a-debug.apk`; image item)*
- [ ] Open an **image** item that contains legible text → a **"Text in image"** section shows a **Scan
text** button; tap it → the recognized text appears, **fully offline** (airplane mode), and persists
across an app restart; **Rescan** re-runs it.
- [ ] After scanning, **search** the library for a word that appears **only in the image text** → the image
is returned (OCR feeds full-text search). With semantic search on, "related"/search also benefit.
- [ ] An image with no readable text shows a "No readable text found" note (no crash). A **video/audio**
item shows **no** OCR section.

### P13 (later subphases)
- [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
results back to the item.
41 changes: 30 additions & 11 deletions docs/design/P13-PLAN.md
@@ -113,22 +113,41 @@ background, **opt-in (default off)**, mirroring the `autoTranscribe` precedent (
spot-check.** Shares the queue-decoupled-background-AI deferral (inline-before-next-pump like
`autoTranscribe`) and the LLM+HNSW RAM co-residency check (P13d) — both in `BACKLOG.md`.

### `[ ]` P13b — Translation & OCR (ML Kit) *(native; new deps; APK; split into 2 PRs)*
### P13b — OCR & Translation (ML Kit) *(native; new deps; APK)*
On-device text intelligence that is **device-universal-ish** — gated on ML Kit + opt-in, not the RAM tier.
Adds `google_mlkit_translation` / `google_mlkit_text_recognition`; measure APK-size impact in the first build.
**Reordered (maintainer call): OCR leads** — it uses the bundled Latin model (no Google Play Services, fully
offline) so it de-risks the ML Kit dependency before the more complex translation (language-pack downloads +
target-language UX + GMS nuance). Measure APK-size impact in the first ML Kit build.

#### `[ ]` P13b-1 — Translation *(native; APK)*
- Translate an item's **description / transcript / summary** into the user's chosen language on-device, with
on-demand ML Kit language-pack download (opt-in, progress, integrity managed by ML Kit). Surface on item
detail next to the original text.
#### `[~]` P13b-1 — OCR (on-demand) *(native; APK)*
- Extract text from **image** items via ML Kit text recognition (bundled Latin); store it so it is
**searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail.
- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists
and feeds search/related; gating is graceful where ML Kit is unavailable.
- **Status:** implemented (CI-green) — `google_mlkit_text_recognition` (bundled Latin, no GMS, offline);
`OcrEngine` interface + `MlKitOcrEngine`/`UnavailableOcrEngine` + factory/provider (mirrors the transcription
engine seam); `MediaMetadata.ocrText` (**schema v11→v12**) + `media_fts` gains an `ocr` column (table +
triggers + backfill rebuilt in the v12 migration) so search covers image text; `ocrText` added (capped) to
the embed doc; `MetadataRepository.updateOcrText`; a `_OcrSection` "Scan text"/"Rescan" action on image
detail. No opt-in toggle (OCR is free + offline). Tests: OCR FTS search, `updateOcrText` round-trip, v11→v12
migration (incl. FTS `ocr`), embed-doc inclusion, engine availability. **Pending APK spot-check** (scan a
real image → text appears + becomes searchable, offline). The widget + native ML Kit call are verified on the APK only (not in CI).
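
The on-demand flow described in the status note (check availability, recognize, persist) can be sketched in plain Dart. This is a minimal illustration, not the shipped widget code: `FakeOcrEngine`, `InMemoryOcrStore`, and `scanAndStore` are hypothetical names, the `OcrEngine` interface is re-declared (without `close`) so the file compiles on its own, and skipping the write for empty text is an assumption about how the "No readable text found" case might be handled.

```dart
/// Re-declared from `ocr_engine.dart` (minus `close`) so the sketch compiles alone.
abstract interface class OcrEngine {
  bool get isAvailable;
  Future<String> recognizeText(String imagePath);
}

/// Hypothetical test double: returns canned text instead of calling ML Kit.
class FakeOcrEngine implements OcrEngine {
  const FakeOcrEngine(this.cannedText, {this.isAvailable = true});

  final String cannedText;

  @override
  final bool isAvailable;

  @override
  Future<String> recognizeText(String imagePath) async => cannedText;
}

/// Hypothetical in-memory stand-in for `MetadataRepository.updateOcrText`.
class InMemoryOcrStore {
  final Map<String, String?> ocrTextByItem = {};

  Future<void> updateOcrText(String itemId, String? text) async {
    ocrTextByItem[itemId] = text;
  }
}

/// Hypothetical helper: scans one image and stores the text if any was found.
/// Returns true only when non-empty text was persisted.
Future<bool> scanAndStore(
  OcrEngine engine,
  InMemoryOcrStore store, {
  required String itemId,
  required String imagePath,
}) async {
  // Unsupported host: do nothing rather than throw.
  if (!engine.isAvailable) return false;
  final text = (await engine.recognizeText(imagePath)).trim();
  // Assumption: an empty result is surfaced as a note and not stored.
  if (text.isEmpty) return false;
  await store.updateOcrText(itemId, text);
  return true;
}

Future<void> main() async {
  final store = InMemoryOcrStore();
  final stored = await scanAndStore(
    const FakeOcrEngine('  EXIT 12  '),
    store,
    itemId: 'item-1',
    imagePath: '/tmp/sign.jpg',
  );
  print('stored=$stored text=${store.ocrTextByItem['item-1']}');
  // prints: stored=true text=EXIT 12
}
```

Keeping the engine behind an interface is what lets a fake like this stand in for ML Kit in tests that run off-device.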

#### `[ ]` P13b-2 — Translation *(native; new dep; APK)*
- Translate an item's **description / transcript / summary** into the app's language (default) with a
target-language picker (reuse the curated `_captionLanguages` list), via `google_mlkit_translation` +
language-id; on-demand language-pack download (managed, Wi-Fi-aware). Ephemeral (no cache/schema). Surface
on item detail next to the original text. **Note the GMS nuance** (translation downloads models from a
Google endpoint) and document it.
- **Exit / review:** translate a non-English item's text offline after the pack downloads; no pack ⇒ a clear
one-time setup prompt; nothing leaves the device.

#### `[ ]` P13b-2 — OCR *(native; APK)*
- Extract text from **image downloads** via ML Kit text recognition; store it (one schema bump if needed) so
it is **searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail.
- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists
and feeds search/related; gating is graceful where ML Kit is unavailable.
#### `[ ]` P13b-3 — Auto-OCR on download *(follow-up; native; APK)*
- Opt-in (default off) auto-scan of **image** downloads, mirroring P13a-2 auto-summarize: a settings toggle +
a gated block in `queue_controller._persistCompleted` (runs inline; OCR is cheap + offline) → `updateOcrText`
→ an Activity Inbox entry. Grows search coverage automatically.
- **Exit / review:** with auto-OCR on, a finished image download is scanned + becomes searchable offline;
default-off does nothing; the queue still drains.
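
One possible shape for the gated block planned above, as a self-contained sketch. Everything here is hypothetical (`MediaKind`, `CompletedDownload`, `maybeAutoOcr`, and the callback typedefs are illustration names, not identifiers from `queue_controller`); the real block would call the `OcrEngine` and `updateOcrText` directly and also write the Activity Inbox entry, which is left out.

```dart
enum MediaKind { image, video, audio }

/// Hypothetical summary of a finished download.
class CompletedDownload {
  const CompletedDownload(this.itemId, this.kind, this.filePath);

  final String itemId;
  final MediaKind kind;
  final String filePath;
}

typedef Recognize = Future<String> Function(String imagePath);
typedef StoreOcr = Future<void> Function(String itemId, String text);

/// Hypothetical gate: runs OCR inline only when the toggle is on, the engine
/// is available, and the item is an image. Returns true when text was stored.
Future<bool> maybeAutoOcr(
  CompletedDownload download, {
  required bool autoOcrEnabled,
  required bool engineAvailable,
  required Recognize recognize,
  required StoreOcr store,
}) async {
  if (!autoOcrEnabled || !engineAvailable) return false;
  if (download.kind != MediaKind.image) return false;
  try {
    final text = (await recognize(download.filePath)).trim();
    if (text.isEmpty) return false;
    await store(download.itemId, text);
    return true;
  } on Exception {
    // A failed scan is swallowed so the download queue keeps draining.
    return false;
  }
}

Future<void> main() async {
  final stored = <String, String>{};
  const image = CompletedDownload('img-1', MediaKind.image, '/tmp/poster.png');

  Future<String> recognize(String path) async => 'Grand Opening';
  Future<void> store(String id, String text) async {
    stored[id] = text;
  }

  final off = await maybeAutoOcr(
    image,
    autoOcrEnabled: false,
    engineAvailable: true,
    recognize: recognize,
    store: store,
  );
  final on = await maybeAutoOcr(
    image,
    autoOcrEnabled: true,
    engineAvailable: true,
    recognize: recognize,
    store: store,
  );
  print('off=$off on=$on');
  // prints: off=false on=true
}
```

Returning a boolean instead of throwing keeps the "default-off does nothing; the queue still drains" exit criterion easy to assert in a unit test.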

### `[ ]` P13c — Smart auto-tagging *(generation; APK)*
LLM-suggested tags feeding the **existing** tag system — builds directly on the P13a generation patterns.
1 change: 1 addition & 0 deletions lib/core/ai/inference_error.dart
@@ -11,6 +11,7 @@ enum InferenceErrorCode {
embedFailed,
generateFailed,
transcribeFailed,
ocrFailed,

/// A capability seam exists but has no working implementation on this build —
/// e.g. `generateStructured` (P12f forward seam): the method is defined and
35 changes: 35 additions & 0 deletions lib/core/ai/ml_kit_ocr_engine.dart
@@ -0,0 +1,35 @@
import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';
import 'package:grabbit/core/ai/inference_error.dart';
import 'package:grabbit/core/ai/ocr_engine.dart';

/// Android [OcrEngine] backed by ML Kit on-device text recognition (P13b-1).
/// Uses the **bundled Latin-script model** — no Google Play Services, no
/// network, no model download — so it runs offline and fits the sideloaded
/// posture. A recognizer is created per call and closed in `finally` (OCR is
/// on-demand and infrequent, so there's no persistent native handle to leak).
class MlKitOcrEngine implements OcrEngine {
  @override
  bool get isAvailable => true;

  @override
  Future<String> recognizeText(String imagePath) async {
    final recognizer = TextRecognizer(script: TextRecognitionScript.latin);
    try {
      final result = await recognizer.processImage(
        InputImage.fromFilePath(imagePath),
      );
      return result.text;
    } on Exception catch (e) {
      throw InferenceException(
        InferenceErrorCode.ocrFailed,
        'Text recognition failed',
        cause: e,
      );
    } finally {
      await recognizer.close();
    }
  }

  @override
  Future<void> close() async {}
}
18 changes: 18 additions & 0 deletions lib/core/ai/ocr_engine.dart
@@ -0,0 +1,18 @@
/// On-device **text recognition (OCR)** abstraction (P13b-1) — a sibling of the
/// other per-capability AI engines. Extracts text from an image file fully
/// on-device; unlike the model-based engines there's no download or device-tier
/// gating (the bundled Latin model needs no Google Play Services and runs
/// offline). Implementations are capability-gated: a host that can't run it gets
/// the graceful [UnavailableOcrEngine] no-op (never a crash — AI-SPEC §1).
abstract interface class OcrEngine {
  /// Whether text recognition can run on this host (Android + bundled model).
  bool get isAvailable;

  /// Recognizes text in the image at [imagePath] and returns the extracted text
  /// (possibly empty — no readable text). Throws an [InferenceException]
  /// (`unavailable`/`ocrFailed`) on failure.
  Future<String> recognizeText(String imagePath);

  /// Releases any native resources held by the recognizer.
  Future<void> close();
}
12 changes: 12 additions & 0 deletions lib/core/ai/ocr_engine_factory.dart
@@ -0,0 +1,12 @@
import 'dart:io';

import 'package:grabbit/core/ai/ml_kit_ocr_engine.dart';
import 'package:grabbit/core/ai/ocr_engine.dart';
import 'package:grabbit/core/ai/unavailable_ocr_engine.dart';

/// Maps the host platform to its [OcrEngine] — the runtime "registry" seam
/// (mirrors `transcriptionEngineFor`). Android gets the real ML Kit engine;
/// every other host gets the graceful [UnavailableOcrEngine] (OCR stays off,
/// never crashes).
OcrEngine ocrEngineFor() =>
    Platform.isAndroid ? MlKitOcrEngine() : const UnavailableOcrEngine();
12 changes: 12 additions & 0 deletions lib/core/ai/ocr_provider.dart
@@ -0,0 +1,12 @@
import 'package:grabbit/core/ai/ocr_engine.dart';
import 'package:grabbit/core/ai/ocr_engine_factory.dart';
import 'package:riverpod_annotation/riverpod_annotation.dart';

part 'ocr_provider.g.dart';

/// The [OcrEngine] for this host (P13b-1). Routes via `ocrEngineFor`; an
/// unsupported platform yields the graceful [UnavailableOcrEngine] — OCR simply
/// stays off, never crashes. No model download or device-tier gating (the
/// bundled Latin model runs offline on any Android device).
@Riverpod(keepAlive: true)
OcrEngine ocrEngine(Ref ref) => ocrEngineFor();
24 changes: 24 additions & 0 deletions lib/core/ai/unavailable_ocr_engine.dart
@@ -0,0 +1,24 @@
import 'package:grabbit/core/ai/inference_error.dart';
import 'package:grabbit/core/ai/ocr_engine.dart';

/// Graceful no-op [OcrEngine] for hosts that can't run ML Kit text recognition
/// (non-Android until P15, CI). Never crashes — OCR simply stays unavailable
/// (AI-SPEC §1); the "Scan text" affordance is hidden when `isAvailable` is
/// false.
class UnavailableOcrEngine implements OcrEngine {
  const UnavailableOcrEngine();

  static const _ex = InferenceException(
    InferenceErrorCode.unavailable,
    'On-device text recognition is not available on this device',
  );

  // `async` so the error surfaces through the returned Future (not as a
  // synchronous throw), matching how `MlKitOcrEngine` reports failures.
  @override
  bool get isAvailable => false;

  @override
  Future<String> recognizeText(String imagePath) async => throw _ex;

  @override
  Future<void> close() async {}
}
56 changes: 39 additions & 17 deletions lib/core/db/database.dart
@@ -75,6 +75,9 @@ class MediaMetadata extends Table {
// model produced it (attribution + a "Regenerate" prompt when it changes).
TextColumn get aiSummary => text().nullable()();
TextColumn get aiSummaryModelId => text().nullable()();
// P13b-1: on-device OCR text extracted from an image download. Null until the
// user scans the image; feeds full-text search (media_fts) + the embed doc.
TextColumn get ocrText => text().nullable()();

@override
Set<Column<Object>> get primaryKey => {itemId};
@@ -214,7 +217,7 @@ class AppDatabase extends _$AppDatabase {
: super(executor ?? driftDatabase(name: 'grabbit'));

@override
int get schemaVersion => 11;
int get schemaVersion => 12;

@override
MigrationStrategy get migration => MigrationStrategy(
@@ -266,6 +269,23 @@ class AppDatabase extends _$AppDatabase {
await m.addColumn(mediaMetadata, mediaMetadata.aiSummary);
await m.addColumn(mediaMetadata, mediaMetadata.aiSummaryModelId);
}
if (from < 12) {
// P13b-1: OCR text column + add it to the FTS index. FTS5 can't
// ALTER ADD COLUMN, so drop the table + triggers and let
// `_createFtsObjects` rebuild them (now with `ocr`) and backfill.
await m.addColumn(mediaMetadata, mediaMetadata.ocrText);
for (final t in const [
'media_fts_ai_items',
'media_fts_au_items',
'media_fts_ad_items',
'media_fts_ai_meta',
'media_fts_au_meta',
'media_fts_ad_meta',
]) {
await customStatement('DROP TRIGGER IF EXISTS $t');
}
await customStatement('DROP TABLE IF EXISTS media_fts');
}
await _createIndices();
await _createFtsObjects();
},
@@ -348,64 +368,66 @@ class AppDatabase extends _$AppDatabase {
Future<void> _createFtsObjects() async {
await customStatement(
'CREATE VIRTUAL TABLE IF NOT EXISTS media_fts USING fts5('
'item_id UNINDEXED, title, description, transcript, '
'item_id UNINDEXED, title, description, transcript, ocr, '
"tokenize = 'unicode61 remove_diacritics 2')",
);
// media_items → fts (title is here; description/transcript joined in).
// media_items → fts (title is here; description/transcript/ocr joined in).
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_ai_items '
'AFTER INSERT ON media_items BEGIN '
'DELETE FROM media_fts WHERE item_id = new.id; '
'INSERT INTO media_fts(item_id, title, description, transcript) '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT new.id, new.title, '
'(SELECT description FROM media_metadata WHERE item_id = new.id), '
'(SELECT transcript FROM media_metadata WHERE item_id = new.id); END',
'(SELECT transcript FROM media_metadata WHERE item_id = new.id), '
'(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END',
);
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_au_items '
'AFTER UPDATE OF title ON media_items BEGIN '
'DELETE FROM media_fts WHERE item_id = new.id; '
'INSERT INTO media_fts(item_id, title, description, transcript) '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT new.id, new.title, '
'(SELECT description FROM media_metadata WHERE item_id = new.id), '
'(SELECT transcript FROM media_metadata WHERE item_id = new.id); END',
'(SELECT transcript FROM media_metadata WHERE item_id = new.id), '
'(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END',
);
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_ad_items '
'AFTER DELETE ON media_items BEGIN '
'DELETE FROM media_fts WHERE item_id = old.id; END',
);
// media_metadata → fts (description/transcript here; title joined in).
// media_metadata → fts (description/transcript/ocr here; title joined in).
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_ai_meta '
'AFTER INSERT ON media_metadata BEGIN '
'DELETE FROM media_fts WHERE item_id = new.item_id; '
'INSERT INTO media_fts(item_id, title, description, transcript) '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT new.item_id, '
'(SELECT title FROM media_items WHERE id = new.item_id), '
'new.description, new.transcript; END',
'new.description, new.transcript, new.ocr_text; END',
);
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_au_meta '
'AFTER UPDATE OF description, transcript ON media_metadata BEGIN '
'AFTER UPDATE OF description, transcript, ocr_text ON media_metadata BEGIN '
'DELETE FROM media_fts WHERE item_id = new.item_id; '
'INSERT INTO media_fts(item_id, title, description, transcript) '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT new.item_id, '
'(SELECT title FROM media_items WHERE id = new.item_id), '
'new.description, new.transcript; END',
'new.description, new.transcript, new.ocr_text; END',
);
await customStatement(
'CREATE TRIGGER IF NOT EXISTS media_fts_ad_meta '
'AFTER DELETE ON media_metadata BEGIN '
'DELETE FROM media_fts WHERE item_id = old.item_id; '
'INSERT INTO media_fts(item_id, title, description, transcript) '
'SELECT old.item_id, title, NULL, NULL FROM media_items '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT old.item_id, title, NULL, NULL, NULL FROM media_items '
'WHERE id = old.item_id; END',
);
// One-time backfill of pre-existing rows (no-op on a fresh, empty DB).
await customStatement(
'INSERT INTO media_fts(item_id, title, description, transcript) '
'SELECT mi.id, mi.title, mm.description, mm.transcript '
'INSERT INTO media_fts(item_id, title, description, transcript, ocr) '
'SELECT mi.id, mi.title, mm.description, mm.transcript, mm.ocr_text '
'FROM media_items mi '
'LEFT JOIN media_metadata mm ON mm.item_id = mi.id '
'WHERE mi.id NOT IN (SELECT item_id FROM media_fts)',
Binary file modified lib/core/graph/embedding_doc.dart
Binary file not shown.
12 changes: 12 additions & 0 deletions lib/features/library/data/metadata_repository.dart
@@ -674,6 +674,18 @@ class MetadataRepository {
);
}

/// Stores the on-device OCR [text] extracted from an image (P13b-1). Upserts
/// (only this column) so it works for items whose metadata row predates the
/// column; passing `null` clears it. The `media_fts` triggers reindex it for
/// full-text search automatically.
  Future<void> updateOcrText(String itemId, String? text) async {
    await _db
        .into(_db.mediaMetadata)
        .insertOnConflictUpdate(
          MediaMetadataCompanion.insert(itemId: itemId, ocrText: Value(text)),
        );
  }

// --- Tags ---

Stream<List<Tag>> watchTagsForItem(String itemId) {