diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index dfd1cf9..85f2448 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -8,6 +8,9 @@ _(nothing active — pick the next batch from below)_ ## Deferred / future refinements +- [ ] **OCR — non-Latin scripts.** P13b-1 ships the **bundled Latin** ML Kit recognizer (no Google Play + Services, offline). Chinese/Japanese/Korean/Devanagari need their own ML Kit script models (extra APK + size or a download). Add a script choice if users want non-Latin OCR. *(From P13b-1.)* - [ ] **Auto-summarize — queue-decoupled background run.** P13a-2 generates the auto-summary **inline** in `_persistCompleted` before the next download pumps (gated on "model present" so it can't stall on a fetch), exactly like `autoTranscribe`. Generation is heavier than whisper-tiny, so a fuller design diff --git a/docs/SPEC.md b/docs/SPEC.md index bfca546..0637269 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -37,6 +37,7 @@ Implementation-level detail. Versions are targets to confirm at scaffold time | **On-device AI (P10b-2+):** `flutter_gemma` (MediaPipe/LiteRT-LM — embeddings + LLM + RAG; added P10b-2 embedder-only), a whisper.cpp pkg (`whisper_ggml_plus`/`whisper_kit`), ML Kit (OCR/translate) | On-device/edge AI. See `docs/AI-SPEC.md`. | | **On-device AI (P12c):** `unorm_dart` (pure-Dart Unicode NFKC); `onnxruntime_v2` (P12c-2; 16KB-page + GPU) | NFKC for the hand-rolled XLM-R tokenizer (the multilingual embedder); ONNX runtime for MiniLM. | | **On-device AI (P12e):** `whisper_ggml_plus` (whisper.cpp FFI, MIT) | Speech transcription. Chosen over `whisper_kit` for Windows/v2 parity; app-managed ggml model fed as `modelPath`, 16 kHz WAV made with our existing ffmpeg (no ffmpeg companion). | +| **On-device AI (P13b-1):** `google_mlkit_text_recognition` (MIT plugin; ML Kit binaries free/proprietary) | OCR for image items. 
**Bundled Latin model — no Google Play Services, fully offline** (fits the sideloaded/de-Googled posture); behind the `OcrEngine` seam, graceful off-Android. Non-Latin scripts deferred (BACKLOG). | | **Graph viz (P10):** `graphview` | Interactive relationship explorer. | | **Charts (P10d-2):** `fl_chart` | On-device Dashboard storage donuts + library-activity bars. Pure-Dart (CustomPainter), no native deps, no telemetry. | | ~~**v3:** `supabase_flutter`, Stripe/PayPal SDKs~~ | **Dropped** (no cloud/credits). | diff --git a/docs/VERIFICATION.md b/docs/VERIFICATION.md index d443c75..1956a7a 100644 --- a/docs/VERIFICATION.md +++ b/docs/VERIFICATION.md @@ -936,6 +936,15 @@ entries, or verify after P11c lands.)* - [ ] **Default off** (and when generation is disabled): downloads produce **no** auto-summary and no AI nudge; the queue still drains normally; the on-demand "Summarize with AI" still works. +### P13b-1 — OCR (on-demand) *(install `app-arm64-v8a-debug.apk`; image item)* +- [ ] Open an **image** item that contains legible text → a **"Text in image"** section shows a **Scan + text** button; tap it → the recognized text appears, **fully offline** (airplane mode), and persists + across an app restart; **Rescan** re-runs it. +- [ ] After scanning, **search** the library for a word that appears **only in the image text** → the image + is returned (OCR feeds full-text search). With semantic search on, "related"/search also benefit. +- [ ] An image with no readable text shows a "No readable text found" note (no crash). A **video/audio** + item shows **no** OCR section. + ### P13 (later subphases) - [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write results back to the item. 
diff --git a/docs/design/P13-PLAN.md b/docs/design/P13-PLAN.md index edf038b..c7a1f3b 100644 --- a/docs/design/P13-PLAN.md +++ b/docs/design/P13-PLAN.md @@ -113,22 +113,41 @@ background, **opt-in (default off)**, mirroring the `autoTranscribe` precedent ( spot-check.** Shares the queue-decoupled-background-AI deferral (inline-before-next-pump like `autoTranscribe`) and the LLM+HNSW RAM co-residency check (P13d) — both in `BACKLOG.md`. -### `[ ]` P13b — Translation & OCR (ML Kit) *(native; new deps; APK; split into 2 PRs)* +### P13b — OCR & Translation (ML Kit) *(native; new deps; APK)* On-device text intelligence that is **device-universal-ish** — gated on ML Kit + opt-in, not the RAM tier. -Adds `google_mlkit_translation` / `google_mlkit_text_recognition`; measure APK-size impact in the first build. +**Reordered (maintainer call): OCR leads** — it uses the bundled Latin model (no Google Play Services, fully +offline) so it de-risks the ML Kit dependency before the more complex translation (language-pack downloads + +target-language UX + GMS nuance). Measure APK-size impact in the first ML Kit build. -#### `[ ]` P13b-1 — Translation *(native; APK)* -- Translate an item's **description / transcript / summary** into the user's chosen language on-device, with - on-demand ML Kit language-pack download (opt-in, progress, integrity managed by ML Kit). Surface on item - detail next to the original text. +#### `[~]` P13b-1 — OCR (on-demand) *(native; APK)* +- Extract text from **image** items via ML Kit text recognition (bundled Latin); store it so it is + **searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail. +- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists + and feeds search/related; gating is graceful where ML Kit is unavailable. 
+- **Status:** implemented (CI-green) — `google_mlkit_text_recognition` (bundled Latin, no GMS, offline); + `OcrEngine` interface + `MlKitOcrEngine`/`UnavailableOcrEngine` + factory/provider (mirrors the transcription + engine seam); `MediaMetadata.ocrText` (**schema v11→v12**) + `media_fts` gains an `ocr` column (table + + triggers + backfill rebuilt in the v12 migration) so search covers image text; `ocrText` added (capped) to + the embed doc; `MetadataRepository.updateOcrText`; a `_OcrSection` "Scan text"/"Rescan" action on image + detail. No opt-in toggle (OCR is free + offline). Tests: OCR FTS search, `updateOcrText` round-trip, v11→v12 + migration (incl. FTS `ocr`), embed-doc inclusion, engine availability. **Pending APK spot-check** (scan a + real image → text appears + becomes searchable, offline). The widget + native ML Kit call are APK-verified. + +#### `[ ]` P13b-2 — Translation *(native; new dep; APK)* +- Translate an item's **description / transcript / summary** into the app's language (default) with a + target-language picker (reuse the curated `_captionLanguages` list), via `google_mlkit_translation` + + language-id; on-demand language-pack download (managed, Wi-Fi-aware). Ephemeral (no cache/schema). Surface + on item detail next to the original text. **Note the GMS nuance** (translation downloads models from a + Google endpoint) and document it. - **Exit / review:** translate a non-English item's text offline after the pack downloads; no pack ⇒ a clear one-time setup prompt; nothing leaves the device. -#### `[ ]` P13b-2 — OCR *(native; APK)* -- Extract text from **image downloads** via ML Kit text recognition; store it (one schema bump if needed) so - it is **searchable** — feeding the P10h FTS5 index and the semantic embed doc — and shown on item detail. -- **Exit / review:** an image with legible text becomes findable by that text in search; the OCR text persists - and feeds search/related; gating is graceful where ML Kit is unavailable. 
+#### `[ ]` P13b-3 — Auto-OCR on download *(follow-up; native; APK)* +- Opt-in (default off) auto-scan of **image** downloads, mirroring P13a-2 auto-summarize: a settings toggle + + a gated block in `queue_controller._persistCompleted` (runs inline; OCR is cheap + offline) → `updateOcrText` + → an Activity Inbox entry. Grows search coverage automatically. +- **Exit / review:** with auto-OCR on, a finished image download is scanned + becomes searchable offline; + default-off does nothing; the queue still drains. ### `[ ]` P13c — Smart auto-tagging *(generation; APK)* LLM-suggested tags feeding the **existing** tag system — builds directly on the P13a generation patterns. diff --git a/lib/core/ai/inference_error.dart b/lib/core/ai/inference_error.dart index da123ae..a8b036f 100644 --- a/lib/core/ai/inference_error.dart +++ b/lib/core/ai/inference_error.dart @@ -11,6 +11,7 @@ enum InferenceErrorCode { embedFailed, generateFailed, transcribeFailed, + ocrFailed, /// A capability seam exists but has no working implementation on this build — /// e.g. `generateStructured` (P12f forward seam): the method is defined and diff --git a/lib/core/ai/ml_kit_ocr_engine.dart b/lib/core/ai/ml_kit_ocr_engine.dart new file mode 100644 index 0000000..caf24e7 --- /dev/null +++ b/lib/core/ai/ml_kit_ocr_engine.dart @@ -0,0 +1,35 @@ +import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart'; +import 'package:grabbit/core/ai/inference_error.dart'; +import 'package:grabbit/core/ai/ocr_engine.dart'; + +/// Android [OcrEngine] backed by ML Kit on-device text recognition (P13b-1). +/// Uses the **bundled Latin-script model** — no Google Play Services, no +/// network, no model download — so it runs offline and fits the sideloaded +/// posture. A recognizer is created per call and closed in `finally` (OCR is +/// on-demand and infrequent, so there's no persistent native handle to leak). 
+class MlKitOcrEngine implements OcrEngine { + @override + bool get isAvailable => true; + + @override + Future<String> recognizeText(String imagePath) async { + final recognizer = TextRecognizer(script: TextRecognitionScript.latin); + try { + final result = await recognizer.processImage( + InputImage.fromFilePath(imagePath), + ); + return result.text; + } on Exception catch (e) { + throw InferenceException( + InferenceErrorCode.ocrFailed, + 'Text recognition failed', + cause: e, + ); + } finally { + await recognizer.close(); + } + } + + @override + Future<void> close() async {} +} diff --git a/lib/core/ai/ocr_engine.dart b/lib/core/ai/ocr_engine.dart new file mode 100644 index 0000000..6486d68 --- /dev/null +++ b/lib/core/ai/ocr_engine.dart @@ -0,0 +1,18 @@ +/// On-device **text recognition (OCR)** abstraction (P13b-1) — a sibling of the +/// other per-capability AI engines. Extracts text from an image file fully +/// on-device; unlike the model-based engines there's no download or device-tier +/// gating (the bundled Latin model needs no Google Play Services and runs +/// offline). Implementations are capability-gated: a host that can't run it gets +/// the graceful [UnavailableOcrEngine] no-op (never a crash — AI-SPEC §1). +abstract interface class OcrEngine { + /// Whether text recognition can run on this host (Android + bundled model). + bool get isAvailable; + + /// Recognizes text in the image at [imagePath] and returns the extracted text + /// (possibly empty — no readable text). Throws an [InferenceException] + /// (`unavailable`/`ocrFailed`) on failure. + Future<String> recognizeText(String imagePath); + + /// Releases any native resources held by the recognizer. 
+ Future<void> close(); +} diff --git a/lib/core/ai/ocr_engine_factory.dart b/lib/core/ai/ocr_engine_factory.dart new file mode 100644 index 0000000..6c8f52d --- /dev/null +++ b/lib/core/ai/ocr_engine_factory.dart @@ -0,0 +1,12 @@ +import 'dart:io'; + +import 'package:grabbit/core/ai/ml_kit_ocr_engine.dart'; +import 'package:grabbit/core/ai/ocr_engine.dart'; +import 'package:grabbit/core/ai/unavailable_ocr_engine.dart'; + +/// Maps the host platform to its [OcrEngine] — the runtime "registry" seam +/// (mirrors `transcriptionEngineFor`). Android gets the real ML Kit engine; +/// every other host gets the graceful [UnavailableOcrEngine] (OCR stays off, +/// never crashes). +OcrEngine ocrEngineFor() => + Platform.isAndroid ? MlKitOcrEngine() : const UnavailableOcrEngine(); diff --git a/lib/core/ai/ocr_provider.dart b/lib/core/ai/ocr_provider.dart new file mode 100644 index 0000000..f3580e9 --- /dev/null +++ b/lib/core/ai/ocr_provider.dart @@ -0,0 +1,12 @@ +import 'package:grabbit/core/ai/ocr_engine.dart'; +import 'package:grabbit/core/ai/ocr_engine_factory.dart'; +import 'package:riverpod_annotation/riverpod_annotation.dart'; + +part 'ocr_provider.g.dart'; + +/// The [OcrEngine] for this host (P13b-1). Routes via `ocrEngineFor`; an +/// unsupported platform yields the graceful [UnavailableOcrEngine] — OCR simply +/// stays off, never crashes. No model download or device-tier gating (the +/// bundled Latin model runs offline on any Android device). +@Riverpod(keepAlive: true) +OcrEngine ocrEngine(Ref ref) => ocrEngineFor(); diff --git a/lib/core/ai/unavailable_ocr_engine.dart b/lib/core/ai/unavailable_ocr_engine.dart new file mode 100644 index 0000000..893e3e8 --- /dev/null +++ b/lib/core/ai/unavailable_ocr_engine.dart @@ -0,0 +1,24 @@ +import 'package:grabbit/core/ai/inference_error.dart'; +import 'package:grabbit/core/ai/ocr_engine.dart'; + +/// Graceful no-op [OcrEngine] for hosts that can't run ML Kit text recognition +/// (non-Android until P15, CI). 
Never crashes — OCR simply stays unavailable +/// (AI-SPEC §1); the "Scan text" affordance is hidden when `isAvailable` is +/// false. +class UnavailableOcrEngine implements OcrEngine { + const UnavailableOcrEngine(); + + static const _ex = InferenceException( + InferenceErrorCode.unavailable, + 'On-device text recognition is not available on this device', + ); + + @override + bool get isAvailable => false; + + @override + Future<String> recognizeText(String imagePath) => throw _ex; + + @override + Future<void> close() async {} +} diff --git a/lib/core/db/database.dart b/lib/core/db/database.dart index 8fd16ae..983d007 100644 --- a/lib/core/db/database.dart +++ b/lib/core/db/database.dart @@ -75,6 +75,9 @@ class MediaMetadata extends Table { // model produced it (attribution + a "Regenerate" prompt when it changes). TextColumn get aiSummary => text().nullable()(); TextColumn get aiSummaryModelId => text().nullable()(); + // P13b-1: on-device OCR text extracted from an image download. Null until the + // user scans the image; feeds full-text search (media_fts) + the embed doc. + TextColumn get ocrText => text().nullable()(); @override Set<Column<Object>> get primaryKey => {itemId}; @@ -214,7 +217,7 @@ class AppDatabase extends _$AppDatabase { : super(executor ?? driftDatabase(name: 'grabbit')); @override - int get schemaVersion => 11; + int get schemaVersion => 12; @override MigrationStrategy get migration => MigrationStrategy( @@ -266,6 +269,23 @@ class AppDatabase extends _$AppDatabase { await m.addColumn(mediaMetadata, mediaMetadata.aiSummary); await m.addColumn(mediaMetadata, mediaMetadata.aiSummaryModelId); } + if (from < 12) { + // P13b-1: OCR text column + add it to the FTS index. FTS5 can't + // ALTER ADD COLUMN, so drop the table + triggers and let + // `_createFtsObjects` rebuild them (now with `ocr`) and backfill. 
+ await m.addColumn(mediaMetadata, mediaMetadata.ocrText); + for (final t in const [ + 'media_fts_ai_items', + 'media_fts_au_items', + 'media_fts_ad_items', + 'media_fts_ai_meta', + 'media_fts_au_meta', + 'media_fts_ad_meta', + ]) { + await customStatement('DROP TRIGGER IF EXISTS $t'); + } + await customStatement('DROP TABLE IF EXISTS media_fts'); + } await _createIndices(); await _createFtsObjects(); }, @@ -348,64 +368,66 @@ class AppDatabase extends _$AppDatabase { Future<void> _createFtsObjects() async { await customStatement( 'CREATE VIRTUAL TABLE IF NOT EXISTS media_fts USING fts5(' - 'item_id UNINDEXED, title, description, transcript, ' + 'item_id UNINDEXED, title, description, transcript, ocr, ' "tokenize = 'unicode61 remove_diacritics 2')", ); - // media_items → fts (title is here; description/transcript joined in). + // media_items → fts (title is here; description/transcript/ocr joined in). await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_ai_items ' 'AFTER INSERT ON media_items BEGIN ' 'DELETE FROM media_fts WHERE item_id = new.id; ' - 'INSERT INTO media_fts(item_id, title, description, transcript) ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' 'SELECT new.id, new.title, ' '(SELECT description FROM media_metadata WHERE item_id = new.id), ' - '(SELECT transcript FROM media_metadata WHERE item_id = new.id); END', + '(SELECT transcript FROM media_metadata WHERE item_id = new.id), ' + '(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END', ); await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_au_items ' 'AFTER UPDATE OF title ON media_items BEGIN ' 'DELETE FROM media_fts WHERE item_id = new.id; ' - 'INSERT INTO media_fts(item_id, title, description, transcript) ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' 'SELECT new.id, new.title, ' '(SELECT description FROM media_metadata WHERE item_id = new.id), ' - '(SELECT transcript FROM media_metadata WHERE item_id = new.id); 
END', + '(SELECT transcript FROM media_metadata WHERE item_id = new.id), ' + '(SELECT ocr_text FROM media_metadata WHERE item_id = new.id); END', ); await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_ad_items ' 'AFTER DELETE ON media_items BEGIN ' 'DELETE FROM media_fts WHERE item_id = old.id; END', ); - // media_metadata → fts (description/transcript here; title joined in). + // media_metadata → fts (description/transcript/ocr here; title joined in). await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_ai_meta ' 'AFTER INSERT ON media_metadata BEGIN ' 'DELETE FROM media_fts WHERE item_id = new.item_id; ' - 'INSERT INTO media_fts(item_id, title, description, transcript) ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' 'SELECT new.item_id, ' '(SELECT title FROM media_items WHERE id = new.item_id), ' - 'new.description, new.transcript; END', + 'new.description, new.transcript, new.ocr_text; END', ); await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_au_meta ' - 'AFTER UPDATE OF description, transcript ON media_metadata BEGIN ' + 'AFTER UPDATE OF description, transcript, ocr_text ON media_metadata BEGIN ' 'DELETE FROM media_fts WHERE item_id = new.item_id; ' - 'INSERT INTO media_fts(item_id, title, description, transcript) ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' 'SELECT new.item_id, ' '(SELECT title FROM media_items WHERE id = new.item_id), ' - 'new.description, new.transcript; END', + 'new.description, new.transcript, new.ocr_text; END', ); await customStatement( 'CREATE TRIGGER IF NOT EXISTS media_fts_ad_meta ' 'AFTER DELETE ON media_metadata BEGIN ' 'DELETE FROM media_fts WHERE item_id = old.item_id; ' - 'INSERT INTO media_fts(item_id, title, description, transcript) ' - 'SELECT old.item_id, title, NULL, NULL FROM media_items ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' + 'SELECT old.item_id, title, NULL, NULL, NULL FROM media_items ' 'WHERE 
id = old.item_id; END', ); // One-time backfill of pre-existing rows (no-op on a fresh, empty DB). await customStatement( - 'INSERT INTO media_fts(item_id, title, description, transcript) ' - 'SELECT mi.id, mi.title, mm.description, mm.transcript ' + 'INSERT INTO media_fts(item_id, title, description, transcript, ocr) ' + 'SELECT mi.id, mi.title, mm.description, mm.transcript, mm.ocr_text ' 'FROM media_items mi ' 'LEFT JOIN media_metadata mm ON mm.item_id = mi.id ' 'WHERE mi.id NOT IN (SELECT item_id FROM media_fts)', diff --git a/lib/core/graph/embedding_doc.dart b/lib/core/graph/embedding_doc.dart index b584715..5a82567 100644 Binary files a/lib/core/graph/embedding_doc.dart and b/lib/core/graph/embedding_doc.dart differ diff --git a/lib/features/library/data/metadata_repository.dart b/lib/features/library/data/metadata_repository.dart index fc1ea5f..5a03801 100644 --- a/lib/features/library/data/metadata_repository.dart +++ b/lib/features/library/data/metadata_repository.dart @@ -674,6 +674,18 @@ class MetadataRepository { ); } + /// Stores the on-device OCR [text] extracted from an image (P13b-1). Upserts + /// (only this column) so it works for items whose metadata row predates the + /// column; passing `null` clears it. The `media_fts` triggers reindex it for + /// full-text search automatically. + Future<void> updateOcrText(String itemId, String? 
text) async { + await _db + .into(_db.mediaMetadata) + .insertOnConflictUpdate( + MediaMetadataCompanion.insert(itemId: itemId, ocrText: Value(text)), + ); + } + // --- Tags --- Stream> watchTagsForItem(String itemId) { diff --git a/lib/features/library/presentation/item_detail_screen.dart b/lib/features/library/presentation/item_detail_screen.dart index 8a91678..cf6e8e0 100644 --- a/lib/features/library/presentation/item_detail_screen.dart +++ b/lib/features/library/presentation/item_detail_screen.dart @@ -6,6 +6,7 @@ import 'package:flutter_riverpod/flutter_riverpod.dart'; import 'package:go_router/go_router.dart'; import 'package:grabbit/core/ai/generation_provider.dart'; import 'package:grabbit/core/ai/inference_error.dart'; +import 'package:grabbit/core/ai/ocr_provider.dart'; import 'package:grabbit/core/ai/transcription_engine.dart'; import 'package:grabbit/core/ai/transcription_model.dart'; import 'package:grabbit/core/ai/transcription_provider.dart'; @@ -274,6 +275,7 @@ class _ItemBodyState extends State<_ItemBody> { _DetailChips(item: item), _AiSummarySection(itemId: item.id), _SummarySection(itemId: item.id), + if (item.type == 'image') _OcrSection(item: item), _MetadataSection(itemId: item.id), _TranscriptSection( itemId: item.id, @@ -688,6 +690,122 @@ class _AiSummarySectionState extends ConsumerState<_AiSummarySection> { } } +/// On-device OCR (P13b-1) for image items: a "Scan text" action that extracts +/// text from the image with ML Kit (bundled Latin model, fully offline), caches +/// it (`MediaMetadata.ocrText`), shows it, and feeds full-text search. Hidden +/// when the engine can't run on this host (non-Android). +class _OcrSection extends ConsumerStatefulWidget { + const _OcrSection({required this.item}); + final MediaItem item; + + @override + ConsumerState<_OcrSection> createState() => _OcrSectionState(); +} + +class _OcrSectionState extends ConsumerState<_OcrSection> { + bool _busy = false; + String? 
_error; + bool _noText = false; // the last scan found nothing readable + + Future<void> _scan() async { + final engine = ref.read(ocrEngineProvider); + final repo = ref.read(metadataRepositoryProvider); + setState(() { + _busy = true; + _error = null; + _noText = false; + }); + try { + final text = (await engine.recognizeText(widget.item.filePath)).trim(); + if (text.isEmpty) { + if (mounted) setState(() => _noText = true); + } else { + await repo.updateOcrText(widget.item.id, text); + } + } on InferenceException catch (e) { + if (mounted) setState(() => _error = "Couldn't scan text — ${e.message}"); + } finally { + if (mounted) setState(() => _busy = false); + } + } + + @override + Widget build(BuildContext context) { + final theme = Theme.of(context); + final tokens = GrabBitTokens.of(context); + // Engine can't run here (non-Android) → nothing to offer. + if (!ref.watch(ocrEngineProvider).isAvailable) { + return const SizedBox.shrink(); + } + final meta = ref + .watch(metadataForItemProvider(widget.item.id)) + .asData + ?.value; + final cached = meta?.ocrText; + final hasText = cached != null && cached.trim().isNotEmpty; + + return Padding( + padding: EdgeInsets.only(top: tokens.spaceMd), + child: Column( + crossAxisAlignment: CrossAxisAlignment.start, + children: [ + Row( + children: [ + Icon( + Icons.document_scanner_outlined, + size: 18, + color: theme.colorScheme.primary, + ), + SizedBox(width: tokens.spaceXs), + Text('Text in image', style: theme.textTheme.titleSmall), + const Spacer(), + if (_busy) + const SizedBox( + width: 16, + height: 16, + child: CircularProgressIndicator(strokeWidth: 2), + ) + else + TextButton( + onPressed: _scan, + child: Text(hasText ? 
'Rescan' : 'Scan text'), + ), + ], + ), + if (hasText) ...[ + SizedBox(height: tokens.spaceXs), + Text(cached, style: theme.textTheme.bodyMedium), + Text( + 'Recognized on-device', + style: theme.textTheme.bodySmall?.copyWith( + color: theme.colorScheme.onSurfaceVariant, + ), + ), + ] else if (!_busy) + Text( + _noText + ? 'No readable text found in this image.' + : 'Find and search text inside this image, on-device.', + style: theme.textTheme.bodySmall?.copyWith( + color: theme.colorScheme.onSurfaceVariant, + ), + ), + if (_error != null) + Padding( + padding: EdgeInsets.only(top: tokens.spaceXs), + child: Text( + _error!, + style: theme.textTheme.bodySmall?.copyWith( + color: theme.colorScheme.error, + ), + ), + ), + ], + ), + ); + } +} + /// Curated caption languages offered by the on-demand fetch (P10f-2). The /// in-app language is always shown (prepended if missing) and pre-selected. const List<({String code, String name})> _captionLanguages = [ diff --git a/pubspec.lock b/pubspec.lock index bb2fe4b..084d28f 100644 --- a/pubspec.lock +++ b/pubspec.lock @@ -600,6 +600,22 @@ packages: url: "https://pub.dev" source: hosted version: "17.2.3" + google_mlkit_commons: + dependency: transitive + description: + name: google_mlkit_commons + sha256: "3e69fea4211727732cc385104e675ad1e40b29f12edd492ee52fa108423a6124" + url: "https://pub.dev" + source: hosted + version: "0.11.1" + google_mlkit_text_recognition: + dependency: "direct main" + description: + name: google_mlkit_text_recognition + sha256: "10a8d1a64fa21efda89514f6df88bbb7885be3ae1bfb7a6d229132d802a20d22" + url: "https://pub.dev" + source: hosted + version: "0.15.1" graphs: dependency: transitive description: diff --git a/pubspec.yaml b/pubspec.yaml index 8085f64..81fbf45 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -75,6 +75,11 @@ dependencies: # companion is intentionally omitted (it would clash with ffmpeg_kit_flutter_new). 
whisper_ggml_plus: ^1.5.2 + # On-device AI (P13b-1): ML Kit on-device text recognition (OCR) for image + # downloads. MIT plugin; the bundled Latin model needs no Google Play Services + # and runs fully offline — fits the sideloaded/de-Googled posture. + google_mlkit_text_recognition: ^0.15.1 + dev_dependencies: flutter_test: sdk: flutter diff --git a/test/core/ai/ocr_engine_test.dart b/test/core/ai/ocr_engine_test.dart new file mode 100644 index 0000000..5615823 --- /dev/null +++ b/test/core/ai/ocr_engine_test.dart @@ -0,0 +1,34 @@ +import 'package:flutter_test/flutter_test.dart'; +import 'package:grabbit/core/ai/inference_error.dart'; +import 'package:grabbit/core/ai/ocr_engine_factory.dart'; +import 'package:grabbit/core/ai/unavailable_ocr_engine.dart'; + +void main() { + group('OCR engine (P13b-1)', () { + test('factory returns the graceful no-op off Android (the test host)', () { + // ocrEngineFor() picks the platform engine; on the CI/test host (not + // Android) that's the UnavailableOcrEngine. 
+ final engine = ocrEngineFor(); + expect(engine.isAvailable, isFalse); + }); + + test( + 'UnavailableOcrEngine reports unavailable and throws on use', + () async { + const engine = UnavailableOcrEngine(); + expect(engine.isAvailable, isFalse); + expect( + () => engine.recognizeText('/some/image.jpg'), + throwsA( + isA<InferenceException>().having( + (e) => e.code, + 'code', + InferenceErrorCode.unavailable, + ), + ), + ); + await engine.close(); // no-op, must not throw + }, + ); + }); +} diff --git a/test/core/db/database_test.dart b/test/core/db/database_test.dart index d71acac..15586ca 100644 --- a/test/core/db/database_test.dart +++ b/test/core/db/database_test.dart @@ -9,8 +9,8 @@ void main() { setUp(() => db = AppDatabase(NativeDatabase.memory())); tearDown(() => db.close()); - test('opens at schema version 11 with all tables created', () async { - expect(db.schemaVersion, 11); + test('opens at schema version 12 with all tables created', () async { + expect(db.schemaVersion, 12); // Forces onCreate (createAll) + beforeOpen to run. final tableNames = db.allTables.map((t) => t.actualTableName).toSet(); @@ -899,6 +899,142 @@ void main() { }, ); + test( + 'upgrades a v11 database to v12, adding ocrText + the FTS ocr column (P13b-1)', + () async { + // Seed a v11-schema DB: media_metadata has the ai_summary columns but no + // ocr_text, and an old 4-column media_fts. Opening at v12 must add the + // column and rebuild media_fts WITH `ocr`. 
+ final upgraded = AppDatabase( + NativeDatabase.memory( + setup: (raw) { + raw.execute(''' + CREATE TABLE media_items ( + id TEXT NOT NULL PRIMARY KEY, + title TEXT NOT NULL, + source_url TEXT NOT NULL, + site TEXT NOT NULL, + file_path TEXT NOT NULL, + type TEXT NOT NULL, + duration_sec INTEGER, + size_bytes INTEGER, + width INTEGER, + height INTEGER, + thumb_path TEXT, + created_at INTEGER NOT NULL, + storage_state TEXT NOT NULL, + notes TEXT, + folder_id INTEGER, + is_favorite INTEGER NOT NULL DEFAULT 0, + content_hash TEXT, + last_accessed_at INTEGER + )'''); + raw.execute(''' + CREATE TABLE media_metadata ( + item_id TEXT NOT NULL PRIMARY KEY REFERENCES media_items (id), + uploader TEXT, + upload_date INTEGER, + description TEXT, + original_url TEXT, + uploader_id TEXT, + channel_id TEXT, + source_id TEXT, + playlist_id TEXT, + playlist_title TEXT, + tags TEXT, + transcript TEXT, + transcript_cues TEXT, + ai_summary TEXT, + ai_summary_model_id TEXT + )'''); + // The v11 FTS table — note: NO `ocr` column. 
+ raw.execute( + 'CREATE VIRTUAL TABLE media_fts USING fts5(' + 'item_id UNINDEXED, title, description, transcript, ' + "tokenize = 'unicode61 remove_diacritics 2')", + ); + raw.execute(''' + CREATE TABLE download_tasks ( + id TEXT NOT NULL PRIMARY KEY, + url TEXT NOT NULL, + request_json TEXT NOT NULL, + status TEXT NOT NULL, + progress REAL NOT NULL DEFAULT 0, + error_code TEXT, + retries INTEGER NOT NULL DEFAULT 0, + created_at INTEGER NOT NULL, + order_index INTEGER NOT NULL DEFAULT 0 + )'''); + raw.execute(''' + CREATE TABLE app_settings ( + id INTEGER NOT NULL PRIMARY KEY DEFAULT 0, + data TEXT NOT NULL + )'''); + raw.execute(''' + CREATE TABLE notifications ( + id TEXT NOT NULL PRIMARY KEY, + category TEXT NOT NULL, + severity TEXT NOT NULL, + title TEXT NOT NULL, + body TEXT, + target_route TEXT, + item_id TEXT, + task_id TEXT, + dedupe_key TEXT, + created_at INTEGER NOT NULL, + updated_at INTEGER NOT NULL, + read_at INTEGER, + expires_at INTEGER, + coalesce_count INTEGER NOT NULL DEFAULT 1 + )'''); + raw.execute(''' + CREATE TABLE things ( + id TEXT NOT NULL PRIMARY KEY, + type TEXT NOT NULL, + jsonld TEXT NOT NULL, + name TEXT, + url TEXT, + created_at INTEGER NOT NULL, + updated_at INTEGER NOT NULL + )'''); + raw.execute( + 'INSERT INTO media_items (id, title, source_url, site, ' + 'file_path, type, created_at, storage_state) VALUES ' + "('old1', 'Poster', 'https://x/p', 'web', '/m/old1.jpg', " + "'image', 0, 'private')", + ); + raw.execute( + 'INSERT INTO media_metadata (item_id, description) VALUES ' + "('old1', 'a description')", + ); + raw.execute('PRAGMA user_version = 11'); + }, + ), + ); + addTearDown(upgraded.close); + + // The new column exists, defaults null, and prior data survives. 
+ final meta = await upgraded.select(upgraded.mediaMetadata).getSingle(); + expect(meta.itemId, 'old1'); + expect(meta.ocrText, isNull); + expect(meta.description, 'a description'); + + // media_fts now has `ocr`: writing ocr_text makes the item findable by a + // word that appears only in the OCR text. + await (upgraded.update( + upgraded.mediaMetadata, + )..where((t) => t.itemId.equals('old1'))).write( + const MediaMetadataCompanion(ocrText: Value('GRAND OPENING saturday')), + ); + final hits = await upgraded + .customSelect( + "SELECT item_id FROM media_fts WHERE media_fts MATCH 'saturday'", + ) + .get(); + expect(hits.map((r) => r.read('item_id')), ['old1']); + }, + ); + test( 'addColumnIfMissing is idempotent and adds only absent columns', () async { diff --git a/test/core/graph/embedding_doc_test.dart b/test/core/graph/embedding_doc_test.dart index 3628f66..5747305 100644 --- a/test/core/graph/embedding_doc_test.dart +++ b/test/core/graph/embedding_doc_test.dart @@ -44,6 +44,19 @@ void main() { expect(docs.single.text, contains('How to cook pasta well')); }); + test('includes the OCR text so semantic search covers it (P13b-1)', () { + final docs = buildEmbeddingDocs( + LibrarySnapshot( + media: [item('a', title: 'Poster')], + metadata: [ + const MediaMetadataData(itemId: 'a', ocrText: 'GRAND OPENING'), + ], + ), + modelId: 'm1', + ); + expect(docs.single.text, contains('GRAND OPENING')); + }); + test('includes the transcript, capped to the window (P10g-1)', () { final long = List.filled(2000, 'word').join(' '); // ~10k chars final docs = buildEmbeddingDocs( diff --git a/test/features/library/metadata_repository_test.dart b/test/features/library/metadata_repository_test.dart index 20511aa..b9c447b 100644 --- a/test/features/library/metadata_repository_test.dart +++ b/test/features/library/metadata_repository_test.dart @@ -460,6 +460,26 @@ void main() { }, ); + test('updateOcrText upserts then clears + feeds FTS (P13b-1)', () async { + await seed('a', 'Poster', 
'image'); + await repo.updateOcrText('a', 'some recognized text'); + var meta = await (db.select( + db.mediaMetadata, + )..where((m) => m.itemId.equals('a'))).getSingle(); + expect(meta.ocrText, 'some recognized text'); + expect( + (await repo.watchFiltered(const LibraryQuery(search: 'recognized')).first) + .map((r) => r.id), + ['a'], + ); + + await repo.updateOcrText('a', null); + meta = await (db.select( + db.mediaMetadata, + )..where((m) => m.itemId.equals('a'))).getSingle(); + expect(meta.ocrText, isNull); + }); + test('findItemByUrl matches with tracking params stripped (P9b-4)', () async { await db .into(db.mediaItems) @@ -618,6 +638,17 @@ void main() { expect(rows.map((r) => r.id), ['a']); }); + test('search matches a word only in the OCR text (P13b-1)', () async { + await seed('a', 'Poster', 'image'); + await seed('b', 'Flyer', 'image'); + await repo.updateOcrText('a', 'GRAND OPENING saturday'); + await repo.updateOcrText('b', 'closed for renovation'); + final rows = await repo + .watchFiltered(const LibraryQuery(search: 'saturday')) + .first; + expect(rows.map((r) => r.id), ['a']); + }); + test('relevance ranks stronger matches above newer ones (P10h)', () async { // 'a' is newer but matches 'forest' only once (in its description); // 'b' is older but matches it repeatedly in the title.