
fix(propose-takes): insert sentinel rows for zero-proposal pages to avoid re-extraction every cycle#2261

Open
LittleRandom wants to merge 4 commits into garrytan:master from LittleRandom:fix/propose-takes-negative-cache

@LittleRandom

Closes #2106

Problem

propose_takes claims an idempotency contract ("an unchanged page never re-spends LLM tokens"), but the contract only holds for pages that produce at least one take. A page yielding zero proposals never gets a take_proposals row, so its idempotency key is never stored, and it is re-sent to the LLM on every cycle in which it falls inside the scan window.

In a steady-state brain, most pages have no takeable claims (concept pages, stubs, transcripts, reference pages), so the phase spends ~85-90% of its tokens re-extracting unchanged pages that will never produce a take.
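To make the gap concrete, here is a minimal TypeScript sketch of the pre-fix behavior. It assumes an in-memory Map stands in for the take_proposals table and a stub function stands in for the LLM extractor. The key fields follow the composite index named in this PR; `proposeTakesPreFix`, `extractTakes`, `PROMPT_VERSION`, and the sample page are hypothetical and are not gbrain's actual code.

```typescript
// Hypothetical in-memory model of the pre-fix behavior (not gbrain's real code).
// Key fields mirror (source_id, page_slug, content_hash, prompt_version).
type Page = { sourceId: string; slug: string; contentHash: string };

const PROMPT_VERSION = "v1"; // assumed value for illustration

// Stand-in for the take_proposals table: stored claims per cache key.
const takeProposals = new Map<string, string[]>();

let llmCalls = 0;

function cacheKey(p: Page, promptVersion: string): string {
  return [p.sourceId, p.slug, p.contentHash, promptVersion].join("|");
}

// Stand-in extractor: this page has no takeable claims.
function extractTakes(_p: Page): string[] {
  llmCalls++;
  return [];
}

function proposeTakesPreFix(p: Page): void {
  const key = cacheKey(p, PROMPT_VERSION);
  if (takeProposals.has(key)) return; // cache hit: skip the LLM call
  const claims = extractTakes(p);
  // Bug: rows are written only when there is at least one claim,
  // so a zero-proposal page never leaves a key behind.
  if (claims.length > 0) takeProposals.set(key, claims);
}

const stub: Page = { sourceId: "default", slug: "concepts/stub", contentHash: "abc123" };
for (let cycle = 0; cycle < 3; cycle++) proposeTakesPreFix(stub);
console.log(llmCalls); // 3: the unchanged page is re-extracted every cycle
```

Because nothing is written when the extractor returns [], the lookup misses on every cycle even though the page content never changed.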

Fix

Insert a sentinel row with claim_text = '__no_proposals__' when the extractor returns []. The composite unique index on (source_id, page_slug, content_hash, prompt_version) catches it on future cycles:

  • Page unchanged → same content_hash → SELECT finds the sentinel → cache hit → skip LLM call
  • Page edited → new content_hash → SELECT misses → re-extract (legitimate)
  • Prompt version bumped → new prompt_version → full re-scan (fresh prompt)
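The three cases above can be sketched as follows. This is a hypothetical in-memory model, not the PR's actual implementation: a Map keyed on (source_id, page_slug, content_hash, prompt_version) stands in for the table and its unique index, and `proposeTakes`, `extract`, and the sample hashes are illustrative names. Only the '__no_proposals__' value comes from the PR.

```typescript
// Hypothetical in-memory model of the sentinel fix (not the PR's real code).
// Only the '__no_proposals__' value is taken from the PR description.
type Page = { sourceId: string; slug: string; contentHash: string };

const NO_PROPOSALS = "__no_proposals__";

// Stand-in for take_proposals: each key mirrors the composite unique index
// (source_id, page_slug, content_hash, prompt_version).
const rows = new Map<string, string[]>();
let llmCalls = 0;

function key(p: Page, promptVersion: string): string {
  return [p.sourceId, p.slug, p.contentHash, promptVersion].join("|");
}

// Stand-in extractor that never finds a takeable claim.
function extract(_p: Page): string[] {
  llmCalls++;
  return [];
}

function proposeTakes(p: Page, promptVersion: string): string[] {
  const k = key(p, promptVersion);
  const cached = rows.get(k);
  if (cached) {
    // Cache hit: a sentinel row means "already scanned, nothing found".
    return cached.filter((c) => c !== NO_PROPOSALS);
  }
  const claims = extract(p);
  // Fix: store a sentinel when the extractor returns [] so the key exists.
  rows.set(k, claims.length > 0 ? claims : [NO_PROPOSALS]);
  return claims;
}

const page: Page = { sourceId: "default", slug: "concepts/stub", contentHash: "h1" };
proposeTakes(page, "v1"); // first scan: LLM call, sentinel stored
proposeTakes(page, "v1"); // page unchanged: sentinel found, no LLM call
proposeTakes({ ...page, contentHash: "h2" }, "v1"); // page edited: re-extract
proposeTakes({ ...page, contentHash: "h2" }, "v2"); // prompt bumped: re-extract
console.log(llmCalls); // 3
```

Filtering the sentinel out on a cache hit is one way to keep it from reaching callers; whether the real code needs an equivalent filter depends on how take_proposals is read elsewhere.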

Verification

Before fix: 89 pages scanned, 0 cached, 1060 seconds
After fix: 94 pages scanned, 93 cached, 6.8 seconds total cycle

~156x faster. Verified on a production brain running the PostgreSQL engine.
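As a quick check of the figures above: the speedup is the ratio of the two cycle times, and 93 of the 94 scanned pages were served from the cache. This assumes both timings measure a full cycle, which the "before" line does not state explicitly.

```math
\text{speedup} = \frac{1060\ \text{s}}{6.8\ \text{s}} \approx 155.9 \approx 156\times,
\qquad
\text{cache hit rate} = \frac{93}{94} \approx 98.9\%
```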

…void re-extraction every cycle (garrytan#2106)

The propose_takes idempotency cache only writes rows when the LLM
extractor returns gradeable claims. Pages that return [] (the common
case for infrastructure/reference/memory pages) never get a cache entry,
so they are re-extracted on every dream cycle — wasting ~85-90% of LLM
calls and defeating per-cycle budget caps.

Fix: when the extractor returns an empty array, insert a sentinel row
with claim_text='__no_proposals__' so the composite unique index on
(source_id, page_slug, content_hash, prompt_version) catches it on
future cycles. If page content changes, content_hash changes → cache
miss → legitimate re-extraction.

Moved from kimori/charts/gbrain/docker/ so the container build lives
alongside the source it packages. The Dockerfile now uses COPY instead
of git clone since the build context is the repo itself.

These are local container build artifacts: they live on disk for local
docker builds but shouldn't be tracked in the source repo.

Development

Successfully merging this pull request may close these issues.

propose_takes re-extracts every zero-take page on every cycle — no negative-result cache
