fix(propose-takes): insert sentinel rows for zero-proposal pages to avoid re-extraction every cycle #2261
Open
LittleRandom wants to merge 4 commits into
…void re-extraction every cycle (garrytan#2106)
The propose_takes idempotency cache only writes rows when the LLM extractor returns gradeable claims. Pages that return [] (the common case for infrastructure/reference/memory pages) never get a cache entry, so they are re-extracted on every dream cycle, wasting ~85-90% of LLM calls and defeating per-cycle budget caps.
Fix: when the extractor returns an empty array, insert a sentinel row with claim_text='__no_proposals__' so the composite unique index on (source_id, page_slug, content_hash, prompt_version) catches it on future cycles. If page content changes, content_hash changes → cache miss → legitimate re-extraction.
Moved from kimori/charts/gbrain/docker/ so the container build lives alongside the source it packages. The Dockerfile now uses COPY instead of git clone since the build context is the repo itself.
This reverts commit a3538ff.
These are local container build artifacts — they live on disk for local docker build but shouldn't be tracked in the source repo.
Closes #2106
Problem
propose_takes claims an idempotency contract ("an unchanged page never re-spends LLM tokens"), but this only holds for pages that produce ≥1 take. A page yielding zero proposals never gets a take_proposals row, so its idempotency key is never stored. It is re-sent to the LLM every cycle it falls inside the scan window.
In a steady brain, the majority of pages have no takeable claims (concept pages, stubs, transcripts, reference), so the phase spends ~85-90% of its tokens re-extracting pages that will never produce a take.
Fix
Insert a sentinel row with claim_text = '__no_proposals__' when the extractor returns []. The composite unique index on (source_id, page_slug, content_hash, prompt_version) catches it on future cycles.
Verification
Before fix: 89 pages scanned, 0 cached, 1060 seconds
After fix: 94 pages scanned, 93 cached, 6.8 seconds total cycle
~156x faster. Verified on a production brain with PostgreSQL engine.
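The sentinel behavior described under Fix can be illustrated with a minimal sketch. This is not the PR's actual code. It assumes an in-memory SQLite table as a stand-in for the PostgreSQL engine, a simplified take_proposals schema that stores one row per page version, and hypothetical helper names (open_cache, propose_takes, and an extractor callback standing in for the LLM). Only the sentinel value and the four key columns come from the description above.

```python
import hashlib
import json
import sqlite3

SENTINEL = "__no_proposals__"  # marker value named in the PR description


def open_cache() -> sqlite3.Connection:
    """Create an in-memory cache table (assumed, simplified schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE take_proposals (
               source_id TEXT, page_slug TEXT, content_hash TEXT,
               prompt_version TEXT, claim_text TEXT)"""
    )
    # Composite unique index on the four idempotency-key columns.
    db.execute(
        """CREATE UNIQUE INDEX take_proposals_key ON take_proposals
               (source_id, page_slug, content_hash, prompt_version)"""
    )
    return db


def propose_takes(db, extractor, source_id, page_slug, content,
                  prompt_version="v1") -> bool:
    """Return True if the extractor (the LLM stand-in) had to be called."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    key = (source_id, page_slug, content_hash, prompt_version)
    hit = db.execute(
        "SELECT 1 FROM take_proposals WHERE source_id=? AND page_slug=? "
        "AND content_hash=? AND prompt_version=?",
        key,
    ).fetchone()
    if hit:
        return False  # cache hit: unchanged page, no tokens spent

    claims = extractor(content)
    # An empty result still writes a row, so the key exists next cycle.
    claim_text = json.dumps(claims) if claims else SENTINEL
    db.execute(
        "INSERT OR IGNORE INTO take_proposals VALUES (?, ?, ?, ?, ?)",
        key + (claim_text,),
    )
    return True


if __name__ == "__main__":
    db = open_cache()
    no_claims = lambda content: []  # extractor that never finds a take
    print(propose_takes(db, no_claims, "src", "infra/readme", "static text"))
    print(propose_takes(db, no_claims, "src", "infra/readme", "static text"))
    print(propose_takes(db, no_claims, "src", "infra/readme", "edited text"))
```

In this sketch the second call is served from the cache because the sentinel row occupies the key, while the third call misses because the edited content produces a different content_hash. Readers of the table would need to skip rows whose claim_text equals the sentinel.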