feat: retrieval-only fallback + tier-aware history depth (P13d-3, closes P13d) #147
Merged
Closes the flagship "Ask your library". Low/ineligible tiers (no generation model) now reach an ephemeral RelevantItemsScreen (/ask/relevant) that surfaces the most relevant library items via semantic retrieval — no LLM, nothing persisted, with an on-ramp when Smart search isn't ready. The Dashboard entry shows whenever the graph is available + the library is non-empty, routing capable tiers to the chat and low tiers to the fallback. History depth is now tier-aware: historyBudgetForTier(DeviceTier) (low/mid 1000, high 3000) feeds AskController's per-turn retrieval budget — the concrete RAM lever for the LLM + Cozo HNSW co-residency carried from P12d-2 (real-hardware validation owed via APK). No schema, no deps. https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T
P13d-3 — Low-tier retrieval-only fallback + tier-aware history depth + RAM co-residency
The final P13d slice — closes the flagship "Ask your library". Until now the feature hid entirely on low/ineligible tiers (the entry hard-required a generation model), even though retrieval (embedder + Cozo HNSW) works there. No schema change; no new deps.
1. Retrieval-only fallback
- `AskEntryTile` now shows whenever `graphStore.isAvailable` + the library is non-empty, routing by tier: capable → `/ask` (chat), low/ineligible → `/ask/relevant` (subtitle adapts to "Find the most relevant items").
- `RelevantItemsScreen` (`/ask/relevant`): a query → `semanticResultsProvider` (embed → HNSW vector search → hydrate) → the most relevant items in the existing `MediaGrid` (tap → item). Fully ephemeral (no chats persisted), framed clearly ("this device shows relevant items rather than a written answer"), with an on-ramp to AI settings when Smart search isn't ready.

2. Tier-aware history depth
- `historyBudgetForTier(DeviceTier)` (`ask_chat.dart`): low/mid → 1000, high → 3000 (replaces the d-1 default 1500).
- `AskController.send` reads `activeDeviceTierProvider` and passes `historyCharBudget: historyBudgetForTier(tier)` into each turn's `retrieve(...)`, giving memory-constrained mid devices a shallower window.

3. RAM co-residency (carried from P12d-2)
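The tier-aware history budget from section 2 is the concrete RAM lever for this co-residency. The following is a minimal sketch of how such a budget might behave, not the code in `ask_chat.dart`: the `DeviceTier` enum here is a hypothetical stand-in (the real one may have other members), the 1000/3000 values come from this PR, and `trimHistory` is an assumed helper invented purely for illustration of what a character budget does to a chat history.

```dart
// Hypothetical stand-in for the app's DeviceTier; the real enum may differ
// (for example, it may carry an extra "ineligible" member).
enum DeviceTier { low, mid, high }

/// History character budget per tier, using the values stated in this PR:
/// low/mid -> 1000, high -> 3000.
int historyBudgetForTier(DeviceTier tier) {
  switch (tier) {
    case DeviceTier.low:
    case DeviceTier.mid:
      return 1000;
    case DeviceTier.high:
      return 3000;
  }
}

/// Assumed helper (not from the PR): keeps the most recent turns whose
/// combined length fits within [charBudget], in chronological order.
List<String> trimHistory(List<String> turns, int charBudget) {
  final kept = <String>[];
  var used = 0;
  // Walk from newest to oldest and stop at the first turn that overflows.
  for (final turn in turns.reversed) {
    if (used + turn.length > charBudget) break;
    kept.add(turn);
    used += turn.length;
  }
  return kept.reversed.toList();
}

void main() {
  // Five synthetic turns of 608 characters each.
  final history = List.generate(5, (i) => 'turn $i: ${'x' * 600}');
  for (final tier in DeviceTier.values) {
    final budget = historyBudgetForTier(tier);
    final kept = trimHistory(history, budget);
    print('${tier.name}: budget=$budget, turns kept=${kept.length}');
  }
}
```

Under these assumptions a mid-tier device carries a single 608-character turn into retrieval while a high-tier device carries four, which is the kind of shallower window the budget is meant to produce.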
- `k: 30` / `maxSources: 6` stay as modest secondary levers. Real-hardware co-residency (LLM + live HNSW index) is the APK spot-check in VERIFICATION. The two BACKLOG entries for this are marked resolved/verified at P13d-3.

Tests (CI; native generation/RAM is APK-verified)
- `historyBudgetForTier` truth table.
- `AskController` passes the tier-scaled budget (mid vs high): `FakeRagRetriever` captures it; tier pinned via a fake `ActiveDeviceTier`.
- `AskEntryTile`: low tier (no model) → shows + targets `/ask/relevant`; capable → `/ask`; still hides when graph unavailable / library empty.
- `RelevantItemsScreen`: results render; empty → "No matching items"; not-ready → on-ramp.
- `dart format` + `flutter analyze` clean.

Docs
- `P13-PLAN.md`: d-3 `[~]` + parent `P13d` flipped `[x]` (flagship complete).
- `VERIFICATION.md`, P13d-3: low-end retrieval-only flow; tier-aware depth; RAM co-residency on real low/mid hardware.
- `BACKLOG.md`: both RAM co-residency entries resolved; new note to decouple Ask from the `semanticSearchEnabled` opt-in (pre-existing, surfaced here).

Verification
This closes the flagship P13d. Next top-level slice is P13e (advanced graph analytics & viz), planned separately.
https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T
Generated by Claude Code