Status: Parked. Resume after #3337 (refactor(automations): use DBOS for scheduling and firing) merges.
That PR replaces the in-process cron worker, JetStream job queue, and semaphore with DBOS workflows. The Observer in this design is an async worker — riding DBOS rather than building parallel async infra is the right path.
The task-list / TodoWrite-style feature has been designed and shipped independently of this (see the matching todo_write spec/PR).
Goals (ranked)
A. Threads never hit the model context ceiling. Hard requirement.
D. Multi-day thread continuity. A user returning on day 14 finds the agent coherent.
Observational log — append-only markdown observations written by an async Observer (cheap model, e.g. Gemini Flash or Haiku). Triggered when unobserved-tokens > 30k. Sync fallback at 1.2× threshold to prevent overflow. When the log itself grows past ~40k tokens, a reflection pass compacts older observations into denser ones (two-level hierarchy).
Typed ProjectState sidecar — small Zod-schemaed snapshot extracted from observations during reflection. Approximate shape:
ProjectState is queryable and human-inspectable in the UI. It is derived from observations, not authored independently — single source of truth.
Task list — decoupled from this design, model-managed via the todo_write tool (separate PR), per-thread, ephemeral, chat-UI-rendered.
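The trigger logic described above can be sketched as a pure decision function. This is a minimal illustration, not decopilot's implementation: the constant and function names are hypothetical, and only the numbers (30k, 1.2×, ~40k) come from the design text.

```typescript
// Thresholds quoted in the design; every identifier here is illustrative.
const OBSERVE_THRESHOLD_TOKENS = 30000; // async Observer trigger
const SYNC_FALLBACK_FACTOR = 1.2; // sync fallback at 1.2x the trigger
const REFLECT_THRESHOLD_TOKENS = 40000; // approximate reflection trigger

type ObserverAction = "none" | "observe-async" | "observe-sync";

interface ThreadTokenCounts {
  // Tokens in raw turns not yet covered by any observation.
  unobservedTokens: number;
  // Current size of the observational log itself.
  observationLogTokens: number;
}

// Decide whether the Observer should run, and whether it must block the turn.
function decideObserverAction(counts: ThreadTokenCounts): ObserverAction {
  const syncLimit = OBSERVE_THRESHOLD_TOKENS * SYNC_FALLBACK_FACTOR;
  if (counts.unobservedTokens > syncLimit) return "observe-sync";
  if (counts.unobservedTokens > OBSERVE_THRESHOLD_TOKENS) return "observe-async";
  return "none";
}

// Reflection compacts older observations once the log itself grows too large.
function needsReflection(counts: ThreadTokenCounts): boolean {
  return counts.observationLogTokens > REFLECT_THRESHOLD_TOKENS;
}
```

Keeping the decision pure means the thresholds can be unit-tested and tuned without touching the worker that actually calls the cheap model.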
Per-turn prompt assembly
Ordered stable → mutable for prompt-cache reuse:
1. system prompts (cached)
2. ProjectState (cached until next reflection)
3. Observations + reflections (cached, append-only)
4. Open task list (small mutable)
5. Last K raw turns (K ≈ 10–20; was 50)
6. New user message
Slots 1–3 benefit directly from Anthropic prompt caching. Slots 4–6 are small, so re-sending them uncached on every turn is cheap.
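As a rough sketch of the ordering above, the assembly step could return the six slots in a fixed stable-to-mutable sequence and mark which ones form the cacheable prefix. All type and function names below are hypothetical, and the default of K = 10 is simply the low end of the range given in the list.

```typescript
type SegmentKind =
  | "system"
  | "project-state"
  | "observations"
  | "task-list"
  | "recent-turns"
  | "user-message";

interface PromptSegment {
  kind: SegmentKind;
  content: string;
  // True for the stable prefix (slots 1-3) that prompt caching can reuse.
  cacheable: boolean;
}

interface TurnInputs {
  systemPrompt: string;
  projectStateJson: string; // serialized ProjectState snapshot
  observationLog: string; // observations + reflections, append-only
  openTasks: string[];
  rawTurns: string[]; // full raw history, oldest first
  newUserMessage: string;
}

// Assemble one turn's prompt, ordered stable -> mutable.
function assemblePrompt(inputs: TurnInputs, k: number = 10): PromptSegment[] {
  // slice(-0) would return the whole array, so guard k <= 0 explicitly.
  const recentTurns = k > 0 ? inputs.rawTurns.slice(-k) : [];
  return [
    { kind: "system", content: inputs.systemPrompt, cacheable: true },
    { kind: "project-state", content: inputs.projectStateJson, cacheable: true },
    { kind: "observations", content: inputs.observationLog, cacheable: true },
    { kind: "task-list", content: inputs.openTasks.join("\n"), cacheable: false },
    { kind: "recent-turns", content: recentTurns.join("\n"), cacheable: false },
    { kind: "user-message", content: inputs.newUserMessage, cacheable: false },
  ];
}
```

Because everything mutable sits after the third slot, a change to the task list or the raw window never invalidates the cached prefix.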
Why hybrid over pure 4
A pure observational log is prose. Goal D ("user returns on day 14") is partly a UX problem: the user wants to see what the agent remembers. A typed ProjectState panel makes that legible.
Multi-provider via AI SDK means a typed sidecar survives model swaps mid-thread; a prose summary's style does not.
Extracting ProjectState only during reflection (not every observation) keeps the extractor cost amortized.
Why async observer matters
No latency tax on user turns. Compaction is invisible.
Burst handling via sync fallback at 1.2× threshold preserves goal A.
New DBOS workflow: observation-worker subscribed to message-saved events, debounced per thread, threshold-gated.
New cheap-model provider configuration: which model the Observer uses (default Gemini 2.5 Flash, configurable per org).
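The "debounced per thread" behaviour of the observation worker can be illustrated without any DBOS API. The sketch below is a hypothetical in-memory helper with an injected clock; in the real design this bookkeeping would live inside a DBOS workflow, and the threshold gate would run on each thread the helper reports as due.

```typescript
// Hypothetical helper: tracks the latest message-saved event per thread and
// reports a thread as due only after it has been quiet for `quietMs`.
class PerThreadDebouncer {
  private readonly quietMs: number;
  private readonly lastEventAt: Map<string, number> = new Map();

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Record an event; a newer event pushes the thread's deadline back.
  onMessageSaved(threadId: string, nowMs: number): void {
    this.lastEventAt.set(threadId, nowMs);
  }

  // Return threads whose quiet period has elapsed, and forget them.
  takeDue(nowMs: number): string[] {
    const due: string[] = [];
    this.lastEventAt.forEach((lastAt, threadId) => {
      if (nowMs - lastAt >= this.quietMs) due.push(threadId);
    });
    for (const threadId of due) this.lastEventAt.delete(threadId);
    return due;
  }
}
```

Passing the clock in as an argument keeps the helper deterministic, so burst behaviour can be tested without real timers.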
Open questions to resolve on resume
Exact thresholds (30k / 40k are Mastra defaults — validate against decopilot's typical thread shape).
Tool-call output handling: SWE-agent-style elision of stale tool_result blocks (replace with <elided n_tokens=...> placeholders + path-keyed cache for file reads). How aggressive?
Cross-thread memory: out of scope here, but worth a brief note on whether ProjectState should escape thread boundary later (e.g. per-project facts).
Cost ceiling for the Observer (per-org rate limit?).
Failure modes: Observer crashes → fall back to raw window + hard ceiling guard.
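To make the tool-output elision question concrete, here is one possible shape for it: keep the newest few tool_result blocks verbatim and replace older ones with the placeholder named above. The block types, the `keepLast` knob, and the 4-characters-per-token estimate are all assumptions for illustration, and the path-keyed cache for file reads is not covered.

```typescript
interface ToolResultBlock {
  type: "tool_result";
  toolCallId: string;
  content: string;
}

interface TextBlock {
  type: "text";
  content: string;
}

type Block = ToolResultBlock | TextBlock;

// Crude estimate (about 4 characters per token); an assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest `keepLast` tool results; elide the content of older ones.
function elideStaleToolResults(blocks: Block[], keepLast: number): Block[] {
  const toolIndexes: number[] = [];
  blocks.forEach((block, index) => {
    if (block.type === "tool_result") toolIndexes.push(index);
  });
  const staleCount = Math.max(0, toolIndexes.length - keepLast);
  const stale = new Set<number>(toolIndexes.slice(0, staleCount));
  return blocks.map((block, index) =>
    stale.has(index)
      ? { ...block, content: `<elided n_tokens=${estimateTokens(block.content)}>` }
      : block,
  );
}
```

A smaller `keepLast` is the more aggressive setting; text blocks and the retained tool results pass through unchanged.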
Current state of decopilot
(apps/mesh/src/api/routes/decopilot/memory.ts, DEFAULT_WINDOW_SIZE).
convertToModelMessages → streamText (apps/mesh/src/api/routes/decopilot/stream-core.ts).
Docs (apps/docs/.../decopilot/context.mdx) sketch a 6-slot context layout and a 40/80 rule, marked as intended-not-implemented.
Options surveyed
Lineage: Claude Code /compact, Aider repo-map, SWE-agent observation elision, Anthropic memory tool + context editing, Letta/MemGPT (Packer 2023), CoALA (Sumers 2024), Reflexion (Shinn 2023), Mastra Observational Memory (2025), "Lost in the middle" (Liu 2024), recursive summarization (Wu 2021).
Selected design: hybrid 4 + 3
Replaced/changed surface
apps/mesh/src/api/routes/decopilot/memory.ts: loadHistory(50) → loadRecentMessages(K) + new loadObservations(threadId) + loadProjectState(threadId).
apps/mesh/src/api/routes/decopilot/stream-core.ts: prompt assembly updated; emit "message-saved" events for the Observer.
thread_observations(id, thread_id, kind 'observation'|'reflection', content, tokens_covered, created_at)
thread_project_state(thread_id PK, state_jsonb, updated_at)
Out of scope
ProjectState (Observer is authoritative).
todo_write feature (separate PR).
Resume checklist
When #3337 lands:
SELECT thread_id, SUM(tokens) FROM thread_messages GROUP BY thread_id).