feat: chat A2A inner loop, council routing, compaction authority (3/3) #200
mdear wants to merge 18 commits into
- Docker Compose local stack with PostgreSQL, Redis, MinIO, sandbox
- Local sandbox entrypoint, VNC, browser automation services
- Stack control scripts (stack_control.sh, local/*)
- Backend Dockerfile + entrypoint for local development
- Configuration: .stack.env.local, settings.yaml, model_configs
- SQLAlchemy model fixes (UUID consistency, TimestampColumn)
- Agent tool/runtime improvements (reasoning_content, field renames)
- Credit billing_enabled toggle + usage handler refactor
- E2B sandbox management, VNC URL support
- 246 tests (unit, integration, smoke, E2E)
- Documentation: architecture, getting-started, local-docker-sandbox
- GitHub Copilot instructions and prompt templates
- A2A protocol: adapter server, backends (Copilot, Claude Code, Codex)
- Agent inner loop: strategy pattern, tool bridge, routing
- A2A billing: backend-aware credit calculation, provider-reported strategies
- Circuit breaker, event stream adapter, multimodal support
- Agent factory: inner loop strategy builder, converter
- Health endpoint: A2A mode fields
- CreditUsageHandler: A2A billing strategies
- Config: A2A agent settings (inner_loop_mode, a2a_backend, billing)
- 26 A2A agent tests + 10 billing strategy tests
- 17 A2A design/implementation/runtime docs
I am continuing testing on this branch feature/a2a-chat-inner-loop_3_of_3.
A2A/Copilot flow:
- validate chat and agent model steering end-to-end through the Copilot runtime
- harden adapter/session error handling, council fallback, and post-turn event draining
Frontend and local UX:
- keep multiline composer input visible on mobile and tighten settings state handling
- refine local stack/build helpers and sandbox port configuration for faster iteration
Quality and docs:
- expand unit and E2E coverage, refresh the test plan, and capture implementation notes
This PR is 3 of a series of 3 PRs:
- #198: Docker sandbox runtime, local deploy stack, session lifecycle, frontend, test overhaul (389 files)
- #199: A2A inner loop strategy, backend registry, billing strategies, adapter server (74 incremental files)
- #200: Chat A2A turn loop, council A2A routing, cross-authority compaction (16 incremental files)
… storage proxy fix
Chat A2A image retention:
- Add extract_historical_image_parts() to rehydrate prior-turn images
- Integrate into adapter_server event source for multi-turn continuity
- 83 multimodal unit tests, 54 adapter server tests, 60 turn loop tests
Sandbox lifecycle hardening:
- 6-phase orphan cleanup pipeline (soft-delete, orphan kill, stale pause, zombie removal, volume cleanup, timeout enforcement)
- Per-sandbox DB isolation (R2), conditional state marking (R1), persistent timeout_at column (R6)
- Alembic migration for sandbox timeout_at and FK constraints
- Design docs: lifecycle assessment, accumulation root-cause analysis
Storage proxy fix:
- Add Content-Length header to proxy_download (was chunked-only)
Frontend polish:
- Mobile composer scroll-into-view, model tag theming, settings typing
Test results:
- Unit: 5758 passed, 0 failed, 0 errors, 0 skipped
- E2E: 39 passed, 0 failed, 0 errors, 0 skipped
- Add Chat A2A adapter sidecar topology (sandbox-independent)
- Add claude-opus-4-7 as system model (pricing, context, frontend)
- Add A2A backend-specific timeout configuration
- Add A2AAdapterUnavailableError (HTTP 503) exception
- Harden sandbox orphan cleanup (R4 zombie, volume, timeout)
- Enrich /health endpoint with A2A inner-loop diagnostics
- Improve stack_control.sh build vs rebuild help clarity
- Add startup validation for chat A2A strict mode
- Add retry classification, thinking-temperature, inner-loop parity tests
- Add design docs: sidecar deployment, URL resolution, billing
… compaction lock fix
- Plumb agent_kind through IIAgent -> sandbox metadata -> DockerSandbox._a2a_adapter_env
with AgentType enum validation (new _agent_kind_from_name helper)
- Add AgentSettings.a2a_adapter_timeout_long_horizon (3600s) and
a2a_adapter_long_horizon_agent_kinds ({deep_research}) overrides
- Opus 4.7 adaptive thinking: drop manual thinking block (Anthropic rejects it
with HTTP 400 on Opus 4.7); detect via _is_opus_4_7_or_later
- Fix compaction lock leak in inner_loop: acquire + yield moved inside try so
consumer aclose() cannot bypass release
- stack_control.sh: add verify subcommand + sha256 manifest
- CODEMAPS: refresh architecture.md and dependencies.md
- Tests: 24 parametrized tests for _agent_kind_from_name, extend
TestA2AAdapterEnv with long-horizon override cases, adapter/orphan/R4 tweaks
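The compaction lock fix above can be illustrated with a minimal sketch. It is not the project's `inner_loop` code: the generator name and the event strings are hypothetical, and only the shape (acquire and first yield both inside `try`, release in `finally`) follows the commit message.

```python
import asyncio
from typing import AsyncIterator


async def compaction_turn(lock: asyncio.Lock) -> AsyncIterator[str]:
    """Yield events while holding `lock`; release it on every exit path."""
    acquired = False
    try:
        await lock.acquire()
        acquired = True
        # If the consumer calls aclose() while the generator is suspended
        # here, GeneratorExit is raised inside the try block, so the
        # finally clause below still runs and the lock is released.
        yield "compaction_authority_acquired"  # hypothetical event name
        yield "turn_complete"                  # hypothetical event name
    finally:
        if acquired:
            lock.release()


async def demo() -> bool:
    lock = asyncio.Lock()
    gen = compaction_turn(lock)
    await gen.__anext__()   # consumer reads the first event
    assert lock.locked()    # lock is held while the turn is in flight
    await gen.aclose()      # consumer abandons the generator early
    return lock.locked()


lock_still_held = asyncio.run(demo())
```

With the acquire outside the `try`, an early `aclose()` at the first yield would skip the release; in this arrangement the lock is free again after `aclose()`.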
- Reorder factory priority so deferred-sandbox path always wins over static a2a_agent_url for agent sessions (prevents deep_research from hitting the 900s sidecar instead of its 3600s per-sandbox adapter)
- Add regression test for factory inner loop priority
- Add unit tests for _is_opus_4_7_or_later model-id detection
…er_loop_mode
- Per-sandbox A2A adapter now starts only when inner_loop_mode=a2a; native-mode sandboxes save 1 host port + adapter process resources
- start-services.sh requires SANDBOX_ADAPTER_ENABLED=true and an explicit SANDBOX_ADAPTER_BACKEND (no more 'simulate' fallback in production)
- Chat A2A sidecar hardened to adapter-only: entrypoint:[], read_only, minimal tmpfs; no Xvfb/VNC/code-server/MCP overhead
- DockerSandbox.create() skips _a2a_adapter_env() entirely in native mode so backend auth tokens (GITHUB_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY) do not leak into native sandbox containers
- DEFAULT_EXPOSED_PORTS is now the honest base set (6 ports); adapter port is added conditionally
- Add 4 new unit tests: native gating, a2a port+env+token forwarding, port-count requirement difference, native-mode token-leak prevention
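The gating described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the actual `DockerSandbox.create()` logic: the port numbers, `ADAPTER_PORT`, and the function name `sandbox_ports_and_env` are hypothetical; only the token variable names and the "6 base ports, adapter port added conditionally" rule come from the commit message.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical port numbers; the source only states the base set has 6 ports.
DEFAULT_EXPOSED_PORTS: Tuple[int, ...] = (3000, 5900, 6080, 8080, 9000, 9222)
ADAPTER_PORT = 18100  # hypothetical adapter port
BACKEND_TOKEN_VARS = ("GITHUB_TOKEN", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def sandbox_ports_and_env(
    inner_loop_mode: str, host_env: Mapping[str, str]
) -> Tuple[Tuple[int, ...], Dict[str, str]]:
    """Return the ports to expose and the adapter env to forward."""
    if inner_loop_mode != "a2a":
        # Native mode: no adapter port, and no backend tokens forwarded.
        return DEFAULT_EXPOSED_PORTS, {}
    adapter_env = {k: host_env[k] for k in BACKEND_TOKEN_VARS if k in host_env}
    return DEFAULT_EXPOSED_PORTS + (ADAPTER_PORT,), adapter_env


host = {"GITHUB_TOKEN": "ghp_example", "HOME": "/root"}
native_ports, native_env = sandbox_ports_and_env("native", host)
a2a_ports, a2a_env = sandbox_ports_and_env("a2a", host)
```

Returning an empty env in native mode, rather than filtering later, keeps the token-leak guarantee in one place.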
…alth
Pool & lifecycle
- Add SandboxPoolManager with 2 standby slots, claim/replenish/retire state machine, retire-on-age, dedupe, slot validation, and reap_stuck_initializing for crash recovery (pool.py + migration 20260422_000006_sandbox_prewarm_pool).
- Wire reap_stuck_initializing into the orphan_cleanup loop unconditionally (pure DB UPDATE, must run even when host monitor is WARN/CRIT; ensure_full skips on WARN, so stuck rows accumulated indefinitely otherwise).
- Fix QueuePool self-deadlock between caller's open TX and set_timeout's separate session: commit before set_timeout in service.init_sandbox; wrap docker._persist_deadline in asyncio.wait_for(timeout=10s); set idle_in_transaction_session_timeout=60000 in core/db/base.
- R1/R2 hardening in orphan_cleanup: only mark sandbox DELETED after Docker container removal is confirmed; per-sandbox DB session so one failure doesn't roll back the batch.
- R6 persistent timeout via AgentSandbox.timeout_at; cleanup loop enforces deadline as fallback.
- Distributed Redis advisory lock (sandbox:cleanup:lock) for cleanup loop in multi-worker deployments.
Host monitor & circuit breaker
- New host_monitor.py: /proc/buddyinfo, pagetypeinfo, vmstat, meminfo parsers + percentile-baseline evaluator (BOOTSTRAP/OK/WATCH/WARN/CRIT) backed by 48h ring buffer.
- New breaker.py: Docker-call circuit breaker.
- New executor.py: bounded thread pool for Docker SDK calls.
- New /health/host and /health/sandbox-pool endpoints.
Platform health & ops
- scripts/stack_control.sh: add status --json, status --all, and modular platform_checks_*.sh (common/wsl/ubuntu/backend/pool).
- /etc/sysctl.d drop-in (scripts/99-ii-agent.conf) for WSL2 host tuning.
- Docker container hardening: read_only=True + tmpfs, cap_drop=ALL, no-new-privileges, mem_limit=3GB, pids_limit=512.
Tests
- 4 new POOL e2e tests + 2 HOST e2e tests in scripts/local/test_e2e.py.
- Fix SBOX-06: replace removed AppConfig import with Settings, harden parser with regex against log interleaving.
- New unit tests: sandbox_pool, sandbox_breaker, sandbox_create_semaphore, host_monitor + integration, health_host_endpoint, health_sandbox_pool_endpoint, plus pool e2e suite.
Docs
- Design docs: sandbox-prewarm-pool, sandbox-pool-claim-self-deadlock, sandbox-shared-bridge-network, stack-control-platform-health, sandbox-lifecycle-assessment update, a2a-copilot-vision-support-briefing.
- Runtime docs: docker-wsl2-recovery, host-resource-monitoring, wsl2-host-configuration, sandbox-networking-design, post-reboot-followups.
- Impl tracker: sandbox-robustness-impl-tracker.
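For readers unfamiliar with the pattern behind breaker.py, a generic circuit breaker looks roughly like the sketch below. The class, its parameters, and the thresholds are hypothetical and are not taken from the project's implementation; the commit message only says that Docker calls are wrapped in a circuit breaker.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling through while the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("breaker open; skipping call")
            # Cool-down elapsed: half-open, allow one trial call through.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the breaker fully
        return result
```

A caller would wrap each Docker SDK call in `breaker.call(lambda: ...)` and treat `CircuitOpenError` as "Docker is unhealthy, do not retry yet".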
…t, and lazy MCP retry
- Add `mcp_configured` flag to AgentSandbox model + migration (20260425_000007)
- Sandbox service: MCP handoff waits for mcp_configured=True before releasing slot
- Pool: expose /health/sandbox-pool endpoint with slot occupancy + MCP readiness
- noVNC URL decoration for register_port tools (sandbox + dev variants)
- New novnc.py helper for URL decoration logic
- MCP factory: lazy retry wrapper (lazy_retry.py) for transient MCP connection failures
- Docker shell framing fixes; docker_shell.py correctness improvements
- orphan_cleanup: per-item DB isolation (R2), conditional state marking (R1)
- Claude model: extended thinking + vision support improvements
- A2A turn loop: Copilot backend fixes; fallback billing event
- health.py: /health/ready endpoint; exception handler improvements
- lifespan: startup validation for A2A chat strict mode
- scripts: stack_control.sh enhancements; test_e2e.py expanded coverage; Windows port-forward script
- docs: sandbox-pool-claim-mcp-handoff audit, postgres recovery mode runbook
- tests: 10 new unit test files covering sandbox, noVNC, MCP handoff, health, storage, middleware
- .gitignore: exclude .e2e_last_results.json; remove tracked copy
Skill copy_skill_to_sandbox previously ran mkdir/unzip/chown/chmod as user="root". /workspace is owned by user:user 755 (uid 1001), so root escalation was unnecessary AND harmful: root-owned files broke subsequent user-mode cleanup with Permission denied on retries.
Code changes:
- agents/skills/storage.py: drop user="root" from mkdir/unzip/chmod; remove the now-redundant chown -R; switch zip cleanup to rm -f for retry safety. Add inline notes documenting the ownership invariant.
- settings/skills/storage.py: delete the duplicated copy_skill_to_sandbox + resolve_storage_uri + create_skill_zip_from_dir helpers; this module now owns only the GCS half of the pipeline. The agents/skills/storage copy is the canonical implementation.
- agents/sandboxes/base.py: document the user= parameter contract (Docker honours via exec_run; E2B best-effort) and explicitly warn callers against using it as a security boundary.
- agents/sandboxes/e2b.py: forward user= through to the E2B SDK when set.
Docs:
- docs/design-docs/sandbox-filesystem-design.md: new authoritative spec for /workspace ownership, write paths (put_archive cannot target /tmp on read_only=True containers per moby/moby#42333), and skill deployment.
- AGENTS.md / CLAUDE.md: link the new spec; add the three governing rules (workspace-only uploads, never user=root under /workspace, root reserved for system commands).
Unrelated drive-bys captured in the same commit:
- docs/design-docs/session-lifecycle-and-data-custody.md: proposal v3.1 for review.
- docs/runtime-docs/crossnote-pdf-export-tmpdir.md: WSL/Ubuntu snap Chromium ERR_FILE_NOT_FOUND fix for MPE PDF export.
Implements §4.1 (three-phase purge driver) and §4.6 (storage reaper) from
docs/design-docs/session-lifecycle-and-data-custody.md, behind dual feature
flags SESSIONS_PURGE_ENABLED and SESSIONS_STORAGE_REAPER_ENABLED. Both
default to false; the cleanup-loop stage is wired but dormant in production.
Schema (migration 20260427_000008):
- sessions.purge_after / purge_attempts / purge_started_at + partial idx
- users.is_purging gate
- purge_dead_letter table for operator-facing leaked-resource records
Runtime (src/ii_agent/sessions/purge/):
- claim/pii_strip/commit phases, single arbitration point in session_purge
- provider hook registry (empty until real DELETEs are wired)
- storage_reaper for orphan UserAsset blobs
- orm_guards exporting register_purge_guards (not yet called from lifespan)
Cleanup-loop integration:
- cleanup_loop_stage_purge_sessions + cleanup_loop_stage_storage_reaper
slot between _pause_stale_sandboxes and _cleanup_docker_zombies
Tests (src/tests/unit/sessions/purge/):
- 22 contract tests passing; 32 PR-E/F/G behavioural tests skipped pending
bodies (mypy --strict + ruff clean across the package)
Doc:
- §0.0 rollout gate added: review-request matrix, 10-item pre-flip
checklist, sign-off block. Flag MUST NOT flip without core-team approval
- §0 PR-E row notes register_purge_guards exported but not yet wired
- §5 step 6 cross-references §0.0 as the irreversible boundary
Bug fixes vs initial draft:
B1+B2 commit.py — vanished-row case returns ALREADY_PURGED (I19);
single canonical _AUDIT_EVENT_TYPE='session.purge_committed'
B3 ExhaustedRetriesError carries dead_letter_count; session_purge
propagates it into PurgeResult
B4 sessions/__init__.py imports purge.db_models so PurgeDeadLetter
registers with Base.metadata at startup
Implements GDPR Art. 17 SAR purge + grace-window cleanup with flag-gated three-phase commit (strip → orphan-purge → session DELETE).
Mutation gating (I3/I8 §16):
- NotPurgingDep applied to 12 mutating endpoints across sessions/, pin/, and wishlist/ routers; closes the PATCH/fork/legacy-restore hole that could race the purge driver.
Invariant runner:
- New check_runner + scripts/local/check_purge_invariants.py exercising 19 DB-checkable invariants (I1–I19); structural-only checks marked SKIP rather than failing.
- Runner now rolls back AsyncSession after per-invariant exceptions so one bad query no longer cascades into 7+ ERRORs.
- I11 rewritten from a content-key denylist to the real strip discriminator (user_id NULL + orphaned session_id + non-allowlist content key). Eliminates ~1,236 false positives.
Audit trail:
- PURGE_COMMITTED_EVENT_TYPE constant centralised in purge/types.py; consumed by commit.py and session_purge.py.
- application_events.session_id intentionally retains no FK so audit rows survive session DELETE as forensic breadcrumbs (migration 20260428_000010).
Other:
- Storage reaper, OpenAI provider hooks, ORM guards, canary e2e test, PITR-restore runbook, and implementation tracker.
- Design doc drift fixes (§14.4, §16) in session-lifecycle-and-data-custody.
Verification: mypy --strict clean across changed files; runner reports 11 PASS / 0 FAIL / 0 ERROR / 8 structural-skip; 24/24 unit tests pass.
Session-purge hardening: pre-flip blockers cleared
Pushed as
What landed
Verification
Still open (tracked, not blockers for flag-default-off)
Promote runtime purge invariants from prose into a self-validating contract with mechanical artefact checks.
Schema (migration 20260429_000011):
- CHECK constraints for I1 / I1b (state machine)
- partial UNIQUE index for I19 (provider/resource_id)
- BEFORE DELETE trigger on users (I14)
- discriminator columns: users.is_purging_set_at, application_events.stripped_at
- partial covering indexes for invariant probes
Code:
- invariants.py: rewrite into three disjoint tiers: SCHEMA_ENFORCED (4), DB_CHECKABLE (9), STRUCTURAL_TEST_ENFORCED (6); fixes I2 docstring and I17 catalogue entry
- reconcile_providers.py: I9 OpenAI Files audit job (correct column names, idempotent insert via WHERE NOT EXISTS, scoped to provider 'openai')
- check_runner.py: assert_cleanup_uses_primary_db sentinel for I17
- lifespan.py: I17 startup gate at step 4a-bis
- pii_strip.py / user_purge.py: write discriminator timestamps
- workers/cron/tasks.py: daily run_purge_invariants_check job
Tests:
- test_purge_structural_invariants.py: Tier 3 pinning + strong-form parity test that resolves cited test artefacts
- test_reconcile_providers.py: 5 unit tests pinning the audit-job fixes
- test_doc_stub_parity.py: validates tier union
Docs:
- session-lifecycle-and-data-custody.md §2.3: rewritten with Tier column, in sync with invariants.py catalogue
368 sessions+realtime+app unit tests pass; lint clean; migration auto-applied on backend startup verified locally.
APScheduler uses CLOCK_MONOTONIC for wake-ups. On hypervisor guests (WSL2, KVM laptops, etc.) the host can suspend the VM's vCPU, freezing that clock; when it thaws, every fire scheduled during the gap is reported 'missed by N min' and silently dropped (default grace = 1s). This was causing the new daily lifecycle-invariants probe to risk skipping a full day per missed window, and was already dropping the 40-min cleanup jobs in development. Detect host class (env override, /proc/version for WSL2, hypervisor flag in /proc/cpuinfo) and apply tighter grace on bare metal (60s) or generous grace on VMs (1h default; 6h for the 24h invariants probe), with coalesce=True everywhere so a backlog collapses to one catch-up run. Detection result is logged at scheduler start. Verified end-to-end on this WSL2 host: backend rebuilt, scheduler launched 3 jobs as host_class=vm, both 40-min cleanup jobs fired and completed at 04:05 UTC with zero misfire warnings. 13 unit tests cover detection edge cases and per-job grace assignment.
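The host-class detection and grace selection described above could be sketched as two pure functions. This is a sketch under assumptions: the function names and the `SCHEDULER_HOST_CLASS` override variable are hypothetical, while the grace values (60s, 1h, 6h) and the three detection signals come from the commit message.

```python
from typing import Mapping


def detect_host_class(env: Mapping[str, str], proc_version: str, cpuinfo: str) -> str:
    """Classify the host as 'vm' or 'bare_metal'."""
    override = env.get("SCHEDULER_HOST_CLASS")  # hypothetical variable name
    if override in ("vm", "bare_metal"):
        return override
    if "microsoft" in proc_version.lower():  # WSL2 kernels identify this way
        return "vm"
    for line in cpuinfo.splitlines():
        # x86 guests expose a 'hypervisor' CPU flag.
        if line.startswith("flags") and "hypervisor" in line.split():
            return "vm"
    return "bare_metal"


def misfire_grace_seconds(host_class: str, interval_seconds: int) -> int:
    """Pick the grace window for a job based on host class and job period."""
    if host_class == "bare_metal":
        return 60
    # VMs can be suspended for long stretches; the 24h probe gets 6h.
    return 6 * 3600 if interval_seconds >= 24 * 3600 else 3600


wsl = detect_host_class({}, "Linux version 5.15.0-microsoft-standard-WSL2", "")
kvm = detect_host_class({}, "Linux version 6.1.0", "flags\t\t: fpu vme hypervisor")
metal = detect_host_class({}, "Linux version 6.1.0", "flags\t\t: fpu vme sse2")
```

The resulting number would then be passed as `misfire_grace_time` together with `coalesce=True` when registering each job with APScheduler's `add_job`.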
Follow-up commit:
Local multi-user dev login
- core/config/settings.py: new DevUserConfig + Settings.dev_users
(JSON env DEV_USERS); validates username charset and PIN length.
- auth/router.py: GET /auth/dev/users chooser endpoint;
POST /auth/dev/login now takes {username, pin}, looks up the named
user, validates PIN with constant-time compare, per-username rate
limit + sleep-throttle on failures, generic error message.
Each named user maps to dev+<username>@localhost for full
session/credit isolation between household members.
- frontend/login.tsx: chooser dropdown + PIN input replaces the single
"Dev login" button; only shown when /auth/dev/users reports enabled.
- frontend/utils.ts: getFirstCharacters resilient to punctuation /
unicode / empty tokens so avatar initials don't break for dev names.
- docker/.stack.env.local.example: DEV_USERS placeholder + docs.
Agent runtime resilience
- agents/models/anthropic/claude.py: stop synthesizing
redacted_thinking from plaintext reasoning_content — Anthropic
rejects non-issued blobs with a non-retriable 400 that bricks
replay. Drop the block with a warning; text/tool_use survive.
(Triage: session 9785de09, 2026-05-11.)
- agents/inner_loop.py: detect empty A2A turn (no text, reasoning,
tool call, or session.error) and raise ModelProviderError instead
of silently completing. Surfaces quota-exhausted Copilot CLI
failures to the user / fallback path.
- Tests pinned for both fixes.
Housekeeping
- scripts/stack_control.sh: suppress spurious "[timed out]" annotation
on AVAILABLE pool slots whose lifetime is governed by retire_at.
- .github/copilot-instructions.md: drop stale --local hint; document
`stack_control.sh verify`.
Lint clean on changed Python files.
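The dev-login PIN check described in this commit can be illustrated with a small sketch. The helper names and the shape of the user table are hypothetical; only the constant-time comparison, the generic failure result, and the `dev+<username>@localhost` mapping come from the commit message. Rate limiting and sleep-throttling are left out.

```python
import hmac
from typing import Mapping, Optional


def pin_matches(expected_pin: str, supplied_pin: str) -> bool:
    """Compare PINs without leaking how many leading characters matched."""
    return hmac.compare_digest(
        expected_pin.encode("utf-8"), supplied_pin.encode("utf-8")
    )


def dev_email(username: str) -> str:
    """Map a dev username to its isolated account address."""
    return f"dev+{username}@localhost"


def authenticate(dev_users: Mapping[str, str], username: str, pin: str) -> Optional[str]:
    """Return the account email on success, None on any failure."""
    expected = dev_users.get(username)
    # Always run the comparison, even for unknown users, so both failure
    # paths do similar work and return the same generic result.
    ok = pin_matches(expected if expected is not None else "", pin)
    if expected is None or not ok:
        return None
    return dev_email(username)


users = {"alice": "4821"}  # hypothetical DEV_USERS content
```

Returning `None` for both "unknown user" and "wrong PIN" is what lets the endpoint respond with a single generic error message.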
…imers
Long deep_research turns were tripping the single fixed per-turn
wall-clock cap and falling back to the native (billed) Anthropic
provider mid-task even though the Copilot backend was still
productively streaming events.
Splits the watchdog in copilot_backend.py into two independent timers:
- absolute: hard wall-clock safety net (defaults 300s -> 1800s; long
horizon 3600s -> 7200s)
- activity: max idle time with no SDK events; resets on every
non-heartbeat event (defaults 600s; long horizon 900s)
Surface area:
- core/config/agent.py: new a2a_adapter_activity_timeout_long_horizon
setting; tightened docstring on the absolute long-horizon setting
- integrations/a2a/adapter_server.py: read A2A_*_ACTIVITY_TIMEOUT env
vars and pass through to CopilotConfig
- agents/sandboxes/docker.py: forward A2A_*_ACTIVITY_TIMEOUT vars to
sandbox containers, honouring the long-horizon override for
research-class agents
- docker/docker-compose.local.yaml: expose the new env vars on the
a2a-adapter sidecar with matching defaults
- tests: cover the new env wiring on both the docker sandbox and
copilot backend layers
Also in this commit:
- e2b.Dockerfile: bump GH_CLI_VERSION 2.91.0 -> 2.92.0 (2.91.0 was
rolled out of the apt repo, breaking sandbox rebuilds)
- .gitignore: ignore build-manifest-*.json (generated per-build by
scripts/stack_control.sh and COPY'd into each image; was being
flagged as untracked after every build)
Follow-up commit:
| Timer | Purpose | Default | Long-horizon |
|---|---|---|---|
| absolute | hard wall-clock safety net | 300s → 1800s | 3600s → 7200s |
| activity | max idle with no SDK events; resets on every non-heartbeat event | — → 600s | — → 900s |
The activity timer is the real "is the backend stuck?" signal; the absolute timer is now just a forgiving safety net so productive long turns never get killed.
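The two-timer watchdog in the table can be sketched as follows. This is a minimal illustration, not the code in copilot_backend.py: `watch_turn`, `TurnTimeout`, and the string events are hypothetical, and the real defaults (1800s / 600s) are replaced by parameters.

```python
import asyncio
from typing import AsyncIterator, List


class TurnTimeout(Exception):
    """Raised when a turn exceeds one of its two budgets."""

    def __init__(self, kind: str) -> None:
        super().__init__(f"{kind} timeout")
        self.kind = kind  # "absolute" or "activity"


async def watch_turn(
    events: AsyncIterator[str], absolute_s: float, activity_s: float
) -> List[str]:
    """Collect events until the stream ends or either timer expires."""
    loop = asyncio.get_running_loop()
    hard_deadline = loop.time() + absolute_s  # never moves
    idle_deadline = loop.time() + activity_s  # pushed back by real events
    seen: List[str] = []
    stream = events.__aiter__()
    while True:
        kind = "absolute" if hard_deadline <= idle_deadline else "activity"
        remaining = max(min(hard_deadline, idle_deadline) - loop.time(), 0.0)
        try:
            event = await asyncio.wait_for(stream.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return seen
        except asyncio.TimeoutError:
            raise TurnTimeout(kind) from None
        if event != "heartbeat":
            # Only non-heartbeat events count as progress.
            idle_deadline = loop.time() + activity_s
            seen.append(event)
```

A stream that keeps producing real events only ever hits the absolute deadline, while a stream that emits nothing but heartbeats trips the activity timer first.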
Surface area
- `core/config/agent.py`: new `a2a_adapter_activity_timeout_long_horizon` setting; tightened docstring on the absolute long-horizon setting.
- `integrations/a2a/adapter_server.py`: read `A2A_*_ACTIVITY_TIMEOUT` env vars and pass through to `CopilotConfig`.
- `agents/sandboxes/docker.py`: forward `A2A_*_ACTIVITY_TIMEOUT` vars into sandbox containers, honouring the long-horizon override for research-class agents.
- `docker/docker-compose.local.yaml`: expose the new env vars on the `a2a-adapter` sidecar with matching defaults.
- Tests cover the new env wiring on both the Docker sandbox and Copilot backend layers.
Native loop check: confirmed the native inner loop has no analogous per-turn watchdog — timeouts there are per-HTTP-request only (Anthropic 300s, Google 600s) and there is no turn-level wrapper that could prematurely abort a productive turn. No native-side timing changes required.
Also in this commit
- `e2b.Dockerfile`: bump `GH_CLI_VERSION` 2.91.0 → 2.92.0. 2.91.0 was rolled out of the apt repo, breaking sandbox rebuilds.
- `.gitignore`: ignore `build-manifest-*.json` (generated per-build by `scripts/stack_control.sh` and COPY'd into each backend/frontend/sandbox image; was showing as untracked after every build).
Chat A2A Inner Loop, Council Routing & Compaction Authority (3/3)
This final slice lands the chat-mode A2A path and the follow-up hardening needed to validate model steering, stabilize Copilot session handling, and polish the composer UX.
Core chat A2A delivery
Chat A2A image retention
- Add `extract_historical_image_parts()` to rehydrate prior-turn images into current A2A requests
- `chat-a2a-image-rehydrate-design.md` (superseded by simpler as-built approach)
Model steering and runtime validation
Sandbox lifecycle hardening
- Persistent `timeout_at` column (R6)
- Alembic migration for `sandbox_timeout_at` and FK constraints
- Design docs: `sandbox-lifecycle-assessment.md`, `sandbox-accumulation-root-cause-analysis.md`
Copilot and sandbox hardening
Storage proxy fix
- Add `Content-Length` header to `proxy_download()`; it previously used `transfer-encoding: chunked`, which broke PDF/media rendering in some clients
Frontend and input UX
Test coverage and docs
Verified test totals
Diff stats
Update 2026-04-19 (commit 52f2682)
Follow-on polish on top of the above:
- `AgentSettings.a2a_adapter_timeout_long_horizon` (default 3600s) and `a2a_adapter_long_horizon_agent_kinds` (default `{deep_research}`) override the standard adapter timeout for long-running agent kinds. `agent_kind` is now plumbed `IIAgent → sandbox metadata → DockerSandbox._a2a_adapter_env` with `AgentType` enum validation via a new `_agent_kind_from_name` helper so unknown/tool-owned agent names never trigger the override.
- Anthropic rejects manual `thinking={type:enabled,budget_tokens:...}` blocks on Opus 4.7 with HTTP 400. We now detect Opus 4.7+ via `_is_opus_4_7_or_later` and drop the manual thinking block, letting the model manage thinking adaptively.
- In `inner_loop.py` the `_lock.acquire()` + `yield CompactionAuthorityEvent(...)` pair moved inside the `try` block. Previously a consumer `aclose()` could raise `GeneratorExit` between `acquire` and `yield` and bypass the release path.
- `stack_control.sh`: `verify` subcommand + sha256 manifest for build artefact attestation.
- CODEMAPS: refreshed `architecture.md` and `dependencies.md`.
- Tests: 24 parametrized tests for `_agent_kind_from_name`; `TestA2AAdapterEnv` extended with long-horizon override cases; adapter/orphan/R4 test tweaks.
Diff vs prior PR tip: 15 files changed, +770 / −287.
Update 2026-04-24 (commit 8a360bb)
Sandbox prewarm pool, host monitoring, and platform-health hardening on top of the lifecycle work above.
Pool & lifecycle
- `SandboxPoolManager` (new `agents/sandboxes/pool.py` + migration `20260422_000006_sandbox_prewarm_pool.py`): 2 standby slots with claim / replenish / retire state machine, retire-on-age, dedupe, slot validation, and `reap_stuck_initializing()` for crash recovery.
- `reap_stuck_initializing` was only invoked from `bootstrap()` and `ensure_full()`, and `ensure_full()` short-circuits on host_state ≥ WARN, so under any sustained host pressure stuck `INITIALIZING` rows accumulated indefinitely. It is now wired unconditionally into `orphan_cleanup` (pure DB UPDATE, safe regardless of host state). POOL-04 reap latency dropped from 180s timeout → ~34s.
- QueuePool self-deadlock fix: `service.init_sandbox` commits the caller's TX before calling `set_timeout` (which opens its own session); `docker._persist_deadline` wrapped in `asyncio.wait_for(timeout=10s)`; `idle_in_transaction_session_timeout=60000` set in `core/db/base`.
- R1/R2/R6 hardening: a sandbox is only marked `DELETED` after Docker container removal is confirmed; per-sandbox DB session so one failure doesn't roll back the batch; persistent `timeout_at` enforced as fallback by the cleanup loop.
- Distributed Redis advisory lock `sandbox:cleanup:lock` (5-min TTL, `SET NX EX`) so only one backend instance runs cleanup at a time in multi-worker deployments.
Host monitor & circuit breaker
- `host_monitor.py`: parsers for `/proc/buddyinfo`, `pagetypeinfo`, `vmstat`, `meminfo` + percentile-baseline evaluator (BOOTSTRAP / OK / WATCH / WARN / CRIT) backed by a 48h ring buffer.
- `breaker.py`: circuit breaker around Docker SDK calls.
- `executor.py`: bounded thread pool for Docker SDK calls (prevents thread-pool exhaustion under load).
- New endpoints: `/health/host` and `/health/sandbox-pool`.
Platform health & ops
- `scripts/stack_control.sh`: `status --json`, `status --all`, plus modular `scripts/local/lib/platform_checks_*.sh` (common / wsl / ubuntu / backend / pool).
- `scripts/99-ii-agent.conf`: `/etc/sysctl.d` drop-in for WSL2 host tuning.
- Docker container hardening: `read_only=True` + tmpfs (`/tmp`, `/var/tmp`, `/run`, `/home/user`), `cap_drop=ALL`, selective `cap_add`, `no-new-privileges`, `mem_limit=3GB`, `pids_limit=512`. Docker socket auto-detection probes `/var/run/docker.sock`, Colima, OrbStack, Podman.
Tests
- 4 new POOL e2e tests + 2 HOST e2e tests in `scripts/local/test_e2e.py`.
- Fix SBOX-06: replaced the removed `AppConfig` import with `Settings`; hardened parser with regex against log interleaving.
- New unit tests: `test_sandbox_pool`, `test_sandbox_breaker`, `test_sandbox_create_semaphore`, `test_host_monitor` (+ integration), `test_health_host_endpoint`, `test_health_sandbox_pool_endpoint`, plus a pool e2e suite under `src/tests/e2e/`.
Docs
- Design docs: `sandbox-prewarm-pool.md`, `sandbox-pool-claim-self-deadlock.md`, `sandbox-shared-bridge-network.md`, `stack-control-platform-health.md`, `a2a-copilot-vision-support-briefing.md`; refresh of `sandbox-lifecycle-assessment.md`.
- Runtime docs: `docker-wsl2-recovery.md`, `host-resource-monitoring.md`, `wsl2-host-configuration.md`, `sandbox-networking-design.md`, `post-reboot-followups.md`.
- Impl tracker: `sandbox-robustness-impl-tracker.md`.
Diff vs prior PR tip: 73 files changed, +12337 / −390.
Update 2026-04-25 (commit 590988f)
Sandbox file-ownership correctness fix and authoritative spec.
Skill deployment under `/workspace` no longer escalates to root
- `agents/skills/storage.py::copy_skill_to_sandbox`: dropped `user="root"` from `mkdir`/`unzip`/`chmod`; removed the now-redundant `chown -R user:user`; switched zip cleanup to `rm -f` for retry safety.
- Root-owned files under `/workspace` were breaking subsequent user-mode cleanup with `Permission denied` on retry. `/workspace` is owned by `user:user 755` (uid 1001), so the root escalation was never necessary.
- Deleted the duplicated `copy_skill_to_sandbox` / `resolve_storage_uri` / `create_skill_zip_from_dir` from `settings/skills/storage.py`; that module now owns only the GCS half. The `agents/skills/storage.py` copy is canonical.
Sandbox base API contract
- `agents/sandboxes/base.py`: documented the `user=` parameter contract: Docker honours it via `exec_run`, E2B forwards best-effort, and it is not a security boundary.
- `agents/sandboxes/e2b.py`: forward `user=` through to the E2B SDK when set.
Authoritative filesystem spec
- `docs/design-docs/sandbox-filesystem-design.md`: the spec for `/workspace` ownership, write-path rules (`put_archive` cannot target `/tmp` on `read_only=True` containers per moby/moby#42333), and skill deployment ownership invariants.
- `AGENTS.md` / `CLAUDE.md`: link the new spec; encode the three governing rules (workspace-only host-mediated uploads, never `user="root"` under `/workspace`, root reserved for system-level commands).
Drive-bys (unrelated, low-risk docs)
- `docs/design-docs/session-lifecycle-and-data-custody.md`: proposal v3.1, ready for core-design review.
- `docs/runtime-docs/crossnote-pdf-export-tmpdir.md`: runtime note for the WSL/Ubuntu snap-Chromium `ERR_FILE_NOT_FOUND` issue with MPE PDF export.
Diff vs prior PR tip: 9 files changed, +1236 / −300.
Update 2026-04-27 (commit
94fb301)Session-lifecycle purge subsystem — design doc + flag-gated implementation. Driver SHIPS DARK; do not flip the kill switch without core-team sign-off (see §0.0 of the design doc).
Why this is in the PR
The local stack revealed 1,970 of 2,033
sessionsrows soft-deleted (97 %), oldest from 2026-04-13 —sessions.is_deletedandsessions.delete_afterwere already on the schema but no purger existed anywhere in the codebase. The closest precedent (_purge_stale_deleted_rows) only sweptagent_sandboxes. This commit lands the deferred purger and the design contract that goes with it.Schema delta —
migrations/versions/20260427_000008_session_purge_v34.pysessions.purge_after / purge_attempts / purge_started_at+ two partial indexes (is_deleted=truecandidate queue,purge_started_at IS NOT NULLclaim watchdog).users.is_purginggate column for the user-account purge path (PR-G).purge_dead_letter— operator-facing leaked-resource records (provider, resource_kind, resource_id, error_message, resolved_at/by/note). Indexed on(created_at) WHERE resolved_at IS NULL.Runtime —
src/ii_agent/sessions/purge/(15 modules, ~2 200 LOC, mypy--strictclean)claim.py→pii_strip.py+commit.pyglued bysession_purge.pyas the single arbitration entry point.FOR UPDATErow-lock spans claim through commit; phase (b) is lock-free across I/O.providers.pyexposesregister_cleanup_hook. Registry is empty. Phase (b) is a no-op until concrete provider DELETEs (E2B sandboxes, OpenAI vector stores, GCS slide assets, Composio profiles, Stripe customers) are wired.storage_reaper.pyreaps orphanUserAssetblobs (noSessionAssetlink, not public, older thanSESSIONS_STORAGE_REAPER_MIN_AGE_SECONDS).orm_guards.py::register_purge_guards()defines abefore_insertlistener that enforces I3 (is_purginggate) at the ORM layer. Exported but not yet wired intoapp/lifespan.py— listed in §0.0 pre-flip checklist.cleanup_loop_stage_purge_sessions()andcleanup_loop_stage_storage_reaper()slot between_pause_stale_sandboxesand_cleanup_docker_zombiesinagents/sandboxes/orphan_cleanup.py.core/config/sessions.py::SessionsSettings(env prefixSESSIONS_). Defaults:purge_enabled=False✋ ships darkstorage_reaper_enabled=False✋ ships darkprovider_cleanup_enabled=Truepurge_grace_period_seconds=2_592_000(30 d),ephemeral_purge_grace_period_seconds=3_600purge_max_seconds_per_loop=30,purge_max_attempts=5purge_claim_timeout_seconds=600,heartbeat_interval_seconds=120Tests —
src/tests/unit/sessions/purge/(22 passed, 32 skipped)test_purge_contracts.py— 22 contract tests passing today (types, exceptions, invariant identity, SARRequest validators). 32 PR-E/F/G behavioural skips (claim arbitration, dead-letter retention, ALREADY_PURGED idempotency, phase-(c) re-check, SAR intake, restore-during-SAR, etc.) tracked for follow-up.test_doc_stub_parity.py— every public symbol inpurge/__init__.py::__all__is referenced by name in the design doc; doc-named symbols exist in the package.Bug fixes vs initial draft
- PurgeOutcome.ALREADY_PURGED (I19), not SKIPPED_RESTORED. Specific-id invocations precheck application_events for the canonical event type. Files: commit.py, session_purge.py
- _AUDIT_EVENT_TYPE = "session.purge_committed"; replaces a legacy mapping that emitted session.purged_by_user / session.purged_by_grace and broke the documented contract. Files: commit.py
- ExhaustedRetriesError(message, *, dead_letter_count=0) carries the count; providers.py populates it; session_purge.py propagates it into PurgeResult so logs/metrics reflect reality. Files: exceptions.py, providers.py, session_purge.py
- sessions/__init__.py now imports purge.db_models so PurgeDeadLetter registers with Base.metadata at import time; this was missing and would have made the table invisible to autogenerate. Files: sessions/__init__.py

Design doc: docs/design-docs/session-lifecycle-and-data-custody.md (v3.11)

register_purge_guards wired, PR-E behavioural tests unblocked, canary cycle, PITR drill, observability, backup retention), reversibility envelope, three-signature sign-off block per environment with named rollback owner.

Verified runtime evidence (live local DB after rebuild)
- alembic_version = 20260427_000008 ✅
- purge_dead_letter table present ✅
- cleanup_loop_stage_purge_sessions() reachable from the orphan-cleanup loop ✅
- purge_enabled=False; no SESSIONS_PURGE_ENABLED in docker/.stack.env.local ✅
- application_events.event_type='session.purge_committed' count = 0; purge_dead_letter count = 0 ✅ (driver dormant by design)

Quality gates
- mypy --strict src/ii_agent/sessions/purge/ src/ii_agent/core/config/sessions.py → Success: no issues found in 15 source files
- ruff check + ruff format --check on all touched paths: clean
- pytest src/tests/unit/sessions/purge/ -q → 22 passed, 32 skipped (PR-E/F/G behavioural)

What still needs to happen before the flag flips
Recorded in the §0.0 pre-flip checklist; key items:
- purge/ package register_cleanup_hook so phase (b) is not a permanent no-op
- register_purge_guards() wired into app/lifespan.py

Diff vs prior PR tip: 26 files changed, +868 / −182 (modified) + 19 new files.
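
Why an empty registry makes phase (b) a no-op can be seen in a minimal sketch of a hook registry. This is not the code in providers.py: the hook signature (session id in, count of deleted resources out), the run_provider_cleanup helper, and the "e2b" provider key are assumptions made for illustration. Only the name register_cleanup_hook comes from this PR.

```python
from typing import Callable, Dict

# Hypothetical hook shape: receives a session id and returns how many
# remote resources it deleted. The real signature may differ.
CleanupHook = Callable[[str], int]

_HOOKS: Dict[str, CleanupHook] = {}


def register_cleanup_hook(provider: str, hook: CleanupHook) -> None:
    """Register one cleanup hook per provider; reject silent overwrites."""
    if provider in _HOOKS:
        raise ValueError(f"cleanup hook already registered for {provider!r}")
    _HOOKS[provider] = hook


def run_provider_cleanup(session_id: str) -> Dict[str, int]:
    """Phase (b) stand-in: call every registered hook for the session.

    With an empty registry there is nothing to iterate over, so the
    phase completes without contacting any provider.
    """
    return {provider: hook(session_id) for provider, hook in _HOOKS.items()}


# Empty registry: phase (b) does nothing.
before = run_provider_cleanup("session-1")

# After wiring a (fake) provider hook, the same call performs work.
register_cleanup_hook("e2b", lambda session_id: 2)
after = run_provider_cleanup("session-1")
```

Under these assumptions, flipping purge_enabled without registering hooks would purge local rows while leaving provider-side resources in place, which is presumably why the item sits on the pre-flip checklist.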
Latest commit: 9ba1240 (sessions/purge: harden invariant subsystem with three-tier enforcement)

Promotes the runtime purge invariants from prose into a self-validating contract.
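
One way to read "self-validating" is that the catalogue checks its own shape: the three tiers must be pairwise disjoint and together cover every invariant. The sketch below assumes the invariants are named I1 through I19 and reuses the 4 / 9 / 6 tier sizes reported for invariants.py. Which invariant sits in which tier is placeholder data, and validate_tiers is a hypothetical helper, not the function in the package.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def validate_tiers(tiers: Dict[str, FrozenSet[str]], expected: Set[str]) -> None:
    """Raise ValueError if tiers overlap or their union differs from expected."""
    for (name_a, ids_a), (name_b, ids_b) in combinations(tiers.items(), 2):
        overlap = ids_a & ids_b
        if overlap:
            raise ValueError(f"{name_a} and {name_b} overlap: {sorted(overlap)}")
    union = set().union(*tiers.values())
    if union != expected:
        missing = sorted(expected - union)
        unknown = sorted(union - expected)
        raise ValueError(f"tier union mismatch: missing={missing} unknown={unknown}")


# Assumed naming scheme: I1 .. I19.
IDS = [f"I{n}" for n in range(1, 20)]
ALL_INVARIANTS = set(IDS)

# Placeholder membership: only the tier names and sizes come from the PR.
TIERS = {
    "SCHEMA_ENFORCED": frozenset(IDS[:4]),
    "DB_CHECKABLE": frozenset(IDS[4:13]),
    "STRUCTURAL_TEST_ENFORCED": frozenset(IDS[13:]),
}

validate_tiers(TIERS, ALL_INVARIANTS)  # passes silently when the catalogue is consistent
```

A check of this kind fails as soon as an invariant is added to the catalogue without a tier, or listed in two tiers at once.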
Schema (migration 20260429_000011_invariant_hardening):

- (provider, resource_id)
- users (I14)
- users.is_purging_set_at, application_events.stripped_at

Code:
- invariants.py: rewritten into three disjoint tiers, SCHEMA_ENFORCED (4) / DB_CHECKABLE (9) / STRUCTURAL_TEST_ENFORCED (6); fixes I2 docstring and I17 catalogue entry
- reconcile_providers.py (new): I9 OpenAI Files audit job, with correct column names, idempotent insert via WHERE NOT EXISTS, scoped to provider = 'openai'
- check_runner.py: assert_cleanup_uses_primary_db sentinel for I17 (replica-engine guard)
- app/lifespan.py: I17 startup gate wired in at step 4a-bis
- pii_strip.py / user_purge.py: write discriminator timestamps
- workers/cron/tasks.py: daily run_purge_invariants_check APScheduler job

Tests:
- test_purge_structural_invariants.py (new): Tier 3 pinning + strong-form parity test that resolves cited test artefacts (catches catalogue drift mechanically)
- test_reconcile_providers.py (new): 5 unit tests pinning the audit-job correctness fixes
- test_doc_stub_parity.py: extended to validate the tier union

Docs:
- session-lifecycle-and-data-custody.md §2.3: rewritten with Tier column, kept in sync with invariants.py catalogue

Convergence: Three review passes complete (4 → 3 → 0 defects, strictly-decreasing severity: runtime → catalogue-drift → none). 368 sessions+realtime+app unit tests pass; lint clean; migration auto-applied on backend startup verified locally against the running stack.
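
The idempotent insert mentioned for the I9 audit job can be illustrated against an in-memory SQLite database. The table name provider_resources, the record_resource helper, and the resource id are hypothetical, and the real job runs against the project's own database and schema. The sketch only shows how INSERT ... SELECT ... WHERE NOT EXISTS turns a repeated run into a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE provider_resources ("
    " provider TEXT NOT NULL,"
    " resource_id TEXT NOT NULL)"
)

# Insert the row only when no matching 'openai' row exists yet.
INSERT_IF_ABSENT = """
INSERT INTO provider_resources (provider, resource_id)
SELECT 'openai', :resource_id
WHERE NOT EXISTS (
    SELECT 1 FROM provider_resources
    WHERE provider = 'openai' AND resource_id = :resource_id
)
"""


def record_resource(resource_id: str) -> None:
    """Record an audited resource; calling it again for the same id changes nothing."""
    conn.execute(INSERT_IF_ABSENT, {"resource_id": resource_id})
    conn.commit()


record_resource("file-abc")
record_resource("file-abc")  # second run of the audit job: no duplicate row

total = conn.execute("SELECT COUNT(*) FROM provider_resources").fetchone()[0]
```

WHERE NOT EXISTS on its own does not protect against two writers racing each other; a uniqueness constraint on the same columns would still be needed for that.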
Latest commit: fa26339 (local-dev usability + two silent-failure agent fixes)

Local multi-user dev login
- core/config/settings.py: new DevUserConfig + Settings.dev_users (JSON env DEV_USERS); validates username charset and PIN length.
- auth/router.py: GET /auth/dev/users chooser endpoint; POST /auth/dev/login now takes {username, pin}, looks up the named user, validates PIN with constant-time compare, per-username rate limit + sleep-throttle on failures, generic error either way. Each named user maps to dev+<username>@localhost for full session/credit isolation between household members.
- frontend/login.tsx: dropdown chooser + PIN input replaces the single "Dev login" button; only rendered when /auth/dev/users reports enabled: true.
- frontend/utils.ts: getFirstCharacters made resilient to punctuation / unicode / empty tokens so avatar initials don't break for dev display names.
- docker/.stack.env.local.example: DEV_USERS placeholder + documentation.

Agent runtime resilience
- agents/models/anthropic/claude.py: stop synthesising redacted_thinking from plaintext reasoning_content. Anthropic validates redacted_thinking.data as an opaque ciphertext they issued, so plaintext triggers a non-retriable 400 (Invalid data in redacted_thinking block) that permanently bricks replay of the conversation. The block is now dropped with a WARNING; regular text/tool_use content of the assistant message survives. (Triage: session 9785de09, 2026-05-11.)
- agents/inner_loop.py: detect empty A2A turn (ASSISTANT_TURN_START → ASSISTANT_TURN_END with no content, reasoning, tool call, or session.error, e.g. Copilot CLI when quota-exhausted) and raise ModelProviderError instead of completing the run with an empty response. Outer fallback path can now retry on native or surface the failure to run status.
- test_inner_loop.py, test_v1_models_anthropic_claude.py

Housekeeping
- scripts/stack_control.sh: suppress spurious [timed out] annotation on AVAILABLE pool slots whose lifetime is governed by retire_at, not timeout_at (the R6 reaper explicitly excludes them).
- .github/copilot-instructions.md: drop stale --local hint; document stack_control.sh verify for "which containers need rebuild" introspection.

Lint clean (ruff check + ruff format --check) on all changed Python files.
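
The empty-turn check described under Agent runtime resilience can be sketched as a scan over one turn's events. Only ASSISTANT_TURN_START, ASSISTANT_TURN_END, session.error, and ModelProviderError are names taken from this PR. The event dictionaries, the other event type strings, and ensure_turn_not_empty are hypothetical stand-ins for whatever agents/inner_loop.py actually uses.

```python
from typing import Dict, Iterable


class ModelProviderError(RuntimeError):
    """Local stand-in for the project's exception of the same name."""


# Assumed type strings for events that make a turn non-empty.
SUBSTANTIVE_TYPES = {"content", "reasoning", "tool_call", "session.error"}


def ensure_turn_not_empty(events: Iterable[Dict[str, str]]) -> None:
    """Raise if a START..END pair encloses no substantive event."""
    in_turn = False
    saw_substance = False
    for event in events:
        kind = event["type"]
        if kind == "ASSISTANT_TURN_START":
            in_turn, saw_substance = True, False
        elif kind == "ASSISTANT_TURN_END":
            if in_turn and not saw_substance:
                raise ModelProviderError("A2A backend returned an empty assistant turn")
            in_turn = False
        elif in_turn and kind in SUBSTANTIVE_TYPES:
            saw_substance = True


# A turn with content passes; a bare START/END pair would raise.
ensure_turn_not_empty(
    [
        {"type": "ASSISTANT_TURN_START"},
        {"type": "content"},
        {"type": "ASSISTANT_TURN_END"},
    ]
)
```

Raising here, rather than returning an empty response, is what lets an outer fallback path choose between retrying and surfacing the failure.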