feat: chat A2A inner loop, council routing, compaction authority (3/3) by mdear · Pull Request #200 · Intelligent-Internet/ii-agent

mdear · 2026-04-13T22:13:34Z

Chat A2A Inner Loop, Council Routing & Compaction Authority (3/3)

Merge order: This PR targets main but is intended to be merged after feature/a2a-agent-inner-loop_2_of_3 (#199), which itself follows feature/local-docker-sandbox_1_of_3 (#198).

This final slice lands the chat-mode A2A path and the follow-up hardening needed to validate model steering, stabilize Copilot session handling, and polish the composer UX.

Core chat A2A delivery

Adds the chat A2A turn loop and event translation needed for streamed chat-mode execution
Routes council members through A2A independently while preserving usage visibility
Prevents native/A2A cross-authority summary chaining during compaction

Chat A2A image retention

Adds extract_historical_image_parts() to rehydrate prior-turn images into current A2A requests
Integrates into adapter_server event source for multi-turn image continuity
Design doc: chat-a2a-image-rehydrate-design.md (superseded by simpler as-built approach)

Model steering and runtime validation

Verifies both Agent Settings and Chat Settings model selection paths end-to-end
Adds runtime diagnostics/helpers to confirm the selected model reaches the Copilot backend
Refreshes the E2E plan and related implementation notes for the steering workflow

Sandbox lifecycle hardening

6-phase orphan cleanup pipeline: soft-delete expired sessions, orphan kill, stale pause, zombie removal, volume cleanup, timeout enforcement
Per-sandbox DB isolation (R2), conditional state marking (R1), persistent timeout_at column (R6)
Alembic migration for sandbox_timeout_at and FK constraints
Design docs: sandbox-lifecycle-assessment.md, sandbox-accumulation-root-cause-analysis.md

Copilot and sandbox hardening

Drains trailing post-turn session errors instead of silently swallowing them
Improves fallback behavior for connection, timeout, and rate-limit failures
Tightens sandbox port discovery/configuration and local stack control flows

Storage proxy fix

Adds a Content-Length header to proxy_download(). The response previously relied on Transfer-Encoding: chunked, which broke PDF/media rendering in some clients

Frontend and input UX

Keeps the multiline composer tail visible while typing or pasting on mobile
Tightens settings state typing and related model-selection handling

Test coverage and docs

83 multimodal unit tests, 54 adapter server tests, 60 turn loop tests
Expands targeted unit and E2E coverage for the A2A turn loop, council, Copilot, and steering paths
Updates the supporting review, design, implementation, and test documentation

Verified test totals

Unit: 5758 passed, 0 failed, 0 errors, 0 skipped
E2E: 39 passed, 0 failed, 0 errors, 0 skipped

Diff stats

92 files changed, +6186 / −758 lines
14 new files (tests, migrations, design docs)

Update 2026-04-19 (commit `52f2682`)

Follow-on polish on top of the above:

Long-horizon adapter timeouts: AgentSettings.a2a_adapter_timeout_long_horizon (default 3600s) and a2a_adapter_long_horizon_agent_kinds (default {deep_research}) override the standard adapter timeout for long-running agent kinds. agent_kind is now plumbed IIAgent → sandbox metadata → DockerSandbox._a2a_adapter_env with AgentType enum validation via a new _agent_kind_from_name helper so unknown/tool-owned agent names never trigger the override.
Opus 4.7 adaptive thinking: Anthropic rejects manual thinking={type:enabled,budget_tokens:...} blocks on Opus 4.7 with HTTP 400. We now detect Opus 4.7+ via _is_opus_4_7_or_later and drop the manual thinking block, letting the model manage thinking adaptively.
Compaction lock leak fix: In inner_loop.py the _lock.acquire() + yield CompactionAuthorityEvent(...) pair moved inside the try block. Previously a consumer aclose() could raise GeneratorExit between acquire and yield and bypass the release path.
stack_control.sh: new verify subcommand + sha256 manifest for build artefact attestation.
CODEMAPS: refreshed architecture.md and dependencies.md.
Tests: 24 parametrized tests for _agent_kind_from_name; TestA2AAdapterEnv extended with long-horizon override cases; adapter/orphan/R4 test tweaks.

Diff vs prior PR tip: 15 files changed, +770 / −287.
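
A minimal sketch of the lock-leak pattern behind the compaction fix in this update. It assumes a plain asyncio.Lock and string events as stand-ins for the real _lock and CompactionAuthorityEvent; guarded_events and demo are hypothetical names, not code from inner_loop.py.

```python
import asyncio


async def guarded_events(lock):
    """Async generator that holds `lock` for the duration of a turn."""
    acquired = False
    try:
        # Acquire and first yield both sit inside the try block, so a
        # consumer calling aclose() at the yield still reaches `finally`.
        await lock.acquire()
        acquired = True
        yield "compaction-authority"  # stand-in for CompactionAuthorityEvent(...)
        yield "turn-output"
    finally:
        if acquired:
            lock.release()


async def demo():
    lock = asyncio.Lock()
    gen = guarded_events(lock)
    first = await gen.__anext__()  # generator is now suspended at the first yield
    await gen.aclose()             # raises GeneratorExit inside the generator
    return first, lock.locked()


first_event, still_locked = asyncio.run(demo())
```

With the acquire placed before the try block instead, an early aclose() could leave the lock held, which is the failure mode the fix addresses.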

Update 2026-04-24 (commit `8a360bb`)

Sandbox prewarm pool, host monitoring, and platform-health hardening on top of the lifecycle work above.

Pool & lifecycle

SandboxPoolManager (new agents/sandboxes/pool.py + migration 20260422_000006_sandbox_prewarm_pool.py): 2 standby slots with claim / replenish / retire state machine, retire-on-age, dedupe, slot validation, and reap_stuck_initializing() for crash recovery.
Cleanup-loop reap fix: reap_stuck_initializing was only invoked from bootstrap() and ensure_full(), and ensure_full() short-circuits on host_state ≥ WARN, so under sustained host pressure stuck INITIALIZING rows accumulated indefinitely. It is now wired unconditionally into orphan_cleanup (a pure DB UPDATE, safe regardless of host state). POOL-04 reap latency dropped from the 180s timeout to ~34s.
QueuePool self-deadlock fix: service.init_sandbox commits the caller's TX before calling set_timeout (which opens its own session); docker._persist_deadline wrapped in asyncio.wait_for(timeout=10s); idle_in_transaction_session_timeout=60000 set in core/db/base.
R1/R2/R6 hardening continued: only mark sandbox DELETED after Docker container removal is confirmed; per-sandbox DB session so one failure doesn't roll back the batch; persistent timeout_at enforced as fallback by the cleanup loop.
Distributed cleanup lock: Redis advisory lock sandbox:cleanup:lock (5-min TTL, SET NX EX) so only one backend instance runs cleanup at a time in multi-worker deployments.
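
A rough sketch of the SET NX EX pattern named in the distributed cleanup lock item. The key and TTL come from the description above; try_acquire_cleanup_lock and InMemoryClient are hypothetical, and the client argument is assumed to expose a redis-py style set(name, value, nx=..., ex=...) method.

```python
import uuid

CLEANUP_LOCK_KEY = "sandbox:cleanup:lock"
CLEANUP_LOCK_TTL_SECONDS = 300  # 5-minute TTL, per the description above


def try_acquire_cleanup_lock(client, token, ttl_seconds=CLEANUP_LOCK_TTL_SECONDS):
    """Return True if this worker now holds the cleanup lock."""
    # With redis-py, set(..., nx=True, ex=ttl) issues SET key value NX EX ttl
    # and returns None when the key already exists.
    return bool(client.set(CLEANUP_LOCK_KEY, token, nx=True, ex=ttl_seconds))


class InMemoryClient:
    """Hypothetical stand-in covering only the SET NX subset used here.

    It ignores expiry, so it is for illustration and tests only.
    """

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True


client = InMemoryClient()
worker_a = try_acquire_cleanup_lock(client, uuid.uuid4().hex)
worker_b = try_acquire_cleanup_lock(client, uuid.uuid4().hex)
```

The TTL is what keeps a crashed worker from blocking cleanup forever: the key simply expires and the next worker can claim it.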

Host monitor & circuit breaker

host_monitor.py: parsers for /proc/buddyinfo, pagetypeinfo, vmstat, meminfo + percentile-baseline evaluator (BOOTSTRAP / OK / WATCH / WARN / CRIT) backed by a 48h ring buffer.
breaker.py: circuit breaker around Docker SDK calls.
executor.py: bounded thread pool for Docker SDK calls (prevents thread-pool exhaustion under load).
New endpoints: /health/host and /health/sandbox-pool.
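
For readers unfamiliar with the pattern, here is a generic circuit-breaker sketch of the kind breaker.py presumably wraps around Docker SDK calls. The class, thresholds, and cool-down are all assumptions for illustration; the actual breaker.py may differ.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        # Threshold and cool-down values are illustrative, not the project's.
        self._failure_threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("breaker open; call rejected")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # A success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Rejecting calls while the breaker is open keeps a struggling Docker daemon from being hammered by retries, which complements the bounded executor listed above.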

Platform health & ops

scripts/stack_control.sh: status --json, status --all, plus modular scripts/local/lib/platform_checks_*.sh (common / wsl / ubuntu / backend / pool).
scripts/99-ii-agent.conf: /etc/sysctl.d drop-in for WSL2 host tuning.
Container hardening: read_only=True + tmpfs (/tmp, /var/tmp, /run, /home/user), cap_drop=ALL, selective cap_add, no-new-privileges, mem_limit=3GB, pids_limit=512. Docker socket auto-detection probes /var/run/docker.sock, Colima, OrbStack, Podman.
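
The container-hardening flags above map onto keyword arguments of the Docker SDK for Python's containers.run(). The sketch below only builds that argument dict; hardened_run_kwargs is a hypothetical helper, tmpfs mount options are left empty as an assumption, and cap_add is omitted because the selective capability set is not listed here.

```python
def hardened_run_kwargs():
    """Keyword arguments mirroring the hardening flags listed above."""
    tmpfs_paths = ("/tmp", "/var/tmp", "/run", "/home/user")
    return {
        "read_only": True,
        # docker-py accepts a dict of path -> mount options for tmpfs.
        "tmpfs": {path: "" for path in tmpfs_paths},
        "cap_drop": ["ALL"],
        "security_opt": ["no-new-privileges"],
        "mem_limit": "3g",
        "pids_limit": 512,
    }


kwargs = hardened_run_kwargs()
```

With the SDK installed, these would be passed as docker.from_env().containers.run(image, **kwargs), with the image name supplied by the caller.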

Tests

4 new POOL e2e + 2 HOST e2e tests in scripts/local/test_e2e.py.
SBOX-06 fix: replaced removed AppConfig import with Settings; hardened parser with regex against log interleaving.
New unit suites: test_sandbox_pool, test_sandbox_breaker, test_sandbox_create_semaphore, test_host_monitor (+ integration), test_health_host_endpoint, test_health_sandbox_pool_endpoint, plus a pool e2e suite under src/tests/e2e/.
E2E result: 46/46 PASS in 921.7s (zero FAIL/ERROR/SKIP) on a clean sweep after the fixes landed.

Docs

Design: sandbox-prewarm-pool.md, sandbox-pool-claim-self-deadlock.md, sandbox-shared-bridge-network.md, stack-control-platform-health.md, a2a-copilot-vision-support-briefing.md; refresh of sandbox-lifecycle-assessment.md.
Runtime: docker-wsl2-recovery.md, host-resource-monitoring.md, wsl2-host-configuration.md, sandbox-networking-design.md, post-reboot-followups.md.
Impl tracker: sandbox-robustness-impl-tracker.md.

Diff vs prior PR tip: 73 files changed, +12337 / −390.

Update 2026-04-25 (commit `590988f`)

Sandbox file-ownership correctness fix and authoritative spec.

Skill deployment under `/workspace` no longer escalates to root

agents/skills/storage.py::copy_skill_to_sandbox: dropped user="root" from mkdir/unzip/chmod; removed the now-redundant chown -R user:user; switched zip cleanup to rm -f for retry safety.
Root-owned files under /workspace were breaking subsequent user-mode cleanup with Permission denied on retry. /workspace is owned by user:user 755 (uid 1001) so the root escalation was never necessary.
Deduplicated: removed the stale copy of copy_skill_to_sandbox / resolve_storage_uri / create_skill_zip_from_dir from settings/skills/storage.py; that module now owns only the GCS half. The agents/skills/storage.py copy is canonical.

Sandbox base API contract

agents/sandboxes/base.py: documented the user= parameter contract — Docker honours it via exec_run, E2B forwards best-effort, and it is not a security boundary.
agents/sandboxes/e2b.py: forward user= through to the E2B SDK when set.

Authoritative filesystem spec

New docs/design-docs/sandbox-filesystem-design.md: the spec for /workspace ownership, write-path rules (put_archive cannot target /tmp on read_only=True containers, per moby/moby#42333), and skill deployment ownership invariants.
AGENTS.md / CLAUDE.md: link the new spec; encode the three governing rules (workspace-only host-mediated uploads, never user="root" under /workspace, root reserved for system-level commands).

Drive-bys (unrelated, low-risk docs)

docs/design-docs/session-lifecycle-and-data-custody.md — proposal v3.1, ready for core-design review.
docs/runtime-docs/crossnote-pdf-export-tmpdir.md — runtime note for the WSL/Ubuntu snap-Chromium ERR_FILE_NOT_FOUND issue with MPE PDF export.

Diff vs prior PR tip: 9 files changed, +1236 / −300.

Update 2026-04-27 (commit `94fb301`)

Session-lifecycle purge subsystem — design doc + flag-gated implementation. Driver SHIPS DARK; do not flip the kill switch without core-team sign-off (see §0.0 of the design doc).

Why this is in the PR

The local stack revealed 1,970 of 2,033 sessions rows soft-deleted (97 %), oldest from 2026-04-13 — sessions.is_deleted and sessions.delete_after were already on the schema but no purger existed anywhere in the codebase. The closest precedent (_purge_stale_deleted_rows) only swept agent_sandboxes. This commit lands the deferred purger and the design contract that goes with it.

Schema delta — `migrations/versions/20260427_000008_session_purge_v34.py`

sessions.purge_after / purge_attempts / purge_started_at + two partial indexes (is_deleted=true candidate queue, purge_started_at IS NOT NULL claim watchdog).
users.is_purging gate column for the user-account purge path (PR-G).
New table purge_dead_letter — operator-facing leaked-resource records (provider, resource_kind, resource_id, error_message, resolved_at/by/note). Indexed on (created_at) WHERE resolved_at IS NULL.

Runtime — `src/ii_agent/sessions/purge/` (15 modules, ~2 200 LOC, mypy `--strict` clean)

Three-phase driver — claim.py → pii_strip.py + commit.py glued by session_purge.py as the single arbitration entry point. FOR UPDATE row-lock spans claim through commit; phase (b) is lock-free across I/O.
Provider-cleanup hook registry — providers.py exposes register_cleanup_hook. Registry is empty. Phase (b) is a no-op until concrete provider DELETEs (E2B sandboxes, OpenAI vector stores, GCS slide assets, Composio profiles, Stripe customers) are wired.
Storage reaper — storage_reaper.py reaps orphan UserAsset blobs (no SessionAsset link, not public, older than SESSIONS_STORAGE_REAPER_MIN_AGE_SECONDS).
ORM rails — orm_guards.py::register_purge_guards() defines a before_insert listener that enforces I3 (is_purging gate) at the ORM layer. Exported but not yet wired into app/lifespan.py — listed in §0.0 pre-flip checklist.
Cleanup-loop integration — cleanup_loop_stage_purge_sessions() and cleanup_loop_stage_storage_reaper() slot between _pause_stale_sandboxes and _cleanup_docker_zombies in agents/sandboxes/orphan_cleanup.py.
Configuration — core/config/sessions.py::SessionsSettings (env prefix SESSIONS_). Defaults:
- purge_enabled=False ✋ ships dark
- storage_reaper_enabled=False ✋ ships dark
- provider_cleanup_enabled=True
- purge_grace_period_seconds=2_592_000 (30 d), ephemeral_purge_grace_period_seconds=3_600
- purge_max_seconds_per_loop=30, purge_max_attempts=5
- purge_claim_timeout_seconds=600, heartbeat_interval_seconds=120
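
To make the defaults above concrete, here is a stdlib-only sketch of a settings object with the same field names and values. SessionsSettingsSketch and _env_bool are hypothetical; the real SessionsSettings lives in core/config/sessions.py and may be built differently. Only the two ship-dark flags are read from the environment in this sketch.

```python
import os
from dataclasses import dataclass


def _env_bool(name, default):
    """Parse a boolean environment variable, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SessionsSettingsSketch:
    purge_enabled: bool = False            # ships dark
    storage_reaper_enabled: bool = False   # ships dark
    provider_cleanup_enabled: bool = True
    purge_grace_period_seconds: int = 30 * 24 * 3600  # 30 days
    ephemeral_purge_grace_period_seconds: int = 3_600
    purge_max_seconds_per_loop: int = 30
    purge_max_attempts: int = 5
    purge_claim_timeout_seconds: int = 600
    heartbeat_interval_seconds: int = 120

    @classmethod
    def from_env(cls):
        # The SESSIONS_ prefix matches the env prefix stated above.
        return cls(
            purge_enabled=_env_bool("SESSIONS_PURGE_ENABLED", False),
            storage_reaper_enabled=_env_bool("SESSIONS_STORAGE_REAPER_ENABLED", False),
        )
```

The 30-day grace period works out to 30 × 24 × 3600 = 2 592 000 seconds, matching the listed default.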

Tests — `src/tests/unit/sessions/purge/` (22 passed, 32 skipped)

test_purge_contracts.py — 22 contract tests passing today (types, exceptions, invariant identity, SARRequest validators). 32 PR-E/F/G behavioural skips (claim arbitration, dead-letter retention, ALREADY_PURGED idempotency, phase-(c) re-check, SAR intake, restore-during-SAR, etc.) tracked for follow-up.
test_doc_stub_parity.py — every public symbol in purge/__init__.py::__all__ is referenced by name in the design doc; doc-named symbols exist in the package.

Bug fixes vs initial draft

ID	Fix	Where
B1	Vanished-row case returns `PurgeOutcome.ALREADY_PURGED` (I19), not `SKIPPED_RESTORED`. Specific-id invocations precheck `application_events` for the canonical event type	`commit.py`, `session_purge.py`
B2	Single canonical `_AUDIT_EVENT_TYPE = "session.purge_committed"` — replaces a legacy mapping that emitted `session.purged_by_user` / `session.purged_by_grace` and broke the documented contract	`commit.py`
B3	`ExhaustedRetriesError(message, *, dead_letter_count=0)` carries the count; `providers.py` populates it; `session_purge.py` propagates it into `PurgeResult` so logs/metrics reflect reality	`exceptions.py`, `providers.py`, `session_purge.py`
B4	`sessions/__init__.py` now imports `purge.db_models` so `PurgeDeadLetter` registers with `Base.metadata` at import time — was missing, would have made the table invisible to autogenerate	`sessions/__init__.py`
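
A small sketch of the B3 fix, using the ExhaustedRetriesError(message, *, dead_letter_count=0) signature quoted in the table. The Exception base class, PurgeResultSketch, run_purge, and the outcome strings are assumptions for illustration, not the package's actual types.

```python
from dataclasses import dataclass


class ExhaustedRetriesError(Exception):
    """Carries how many dead-letter rows were written before giving up."""

    def __init__(self, message, *, dead_letter_count=0):
        super().__init__(message)
        self.dead_letter_count = dead_letter_count


@dataclass(frozen=True)
class PurgeResultSketch:
    outcome: str               # hypothetical outcome label
    dead_letter_count: int = 0


def run_purge(cleanup):
    """Run provider cleanup and propagate the dead-letter count into the result."""
    try:
        cleanup()
    except ExhaustedRetriesError as exc:
        return PurgeResultSketch("EXHAUSTED_RETRIES", exc.dead_letter_count)
    return PurgeResultSketch("PURGED")


def failing_cleanup():
    raise ExhaustedRetriesError("provider cleanup exhausted", dead_letter_count=2)


result = run_purge(failing_cleanup)
```

Carrying the count on the exception avoids a second query to work out how many leaked-resource records a failed purge left behind.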

Design doc — `docs/design-docs/session-lifecycle-and-data-custody.md` (v3.11)

New §0.0 Rollout gate: review-request matrix, 10-item pre-flip checklist (review approval, §8 decisions, PR-C FKs, ≥1 real cleanup hook, register_purge_guards wired, PR-E behavioural tests unblocked, canary cycle, PITR drill, observability, backup retention), reversibility envelope, three-signature sign-off block per environment with named rollback owner.
§0 banner amended: "The flag MUST NOT be flipped until §0.0 has been signed off by the core team. Wiring complete ≠ approved-to-ship."
§5 step 6 cross-references §0.0 as the irreversible boundary; steps 1–5 (schema, indexes, FKs, dead-letter table, backfill) remain zero-risk and may proceed independently.

Verified runtime evidence (live local DB after rebuild)

Migration applied: alembic_version = 20260427_000008 ✅
Schema landed: all new columns + purge_dead_letter table present ✅
Code wired: cleanup_loop_stage_purge_sessions() reachable from the orphan-cleanup loop ✅
Flag default observed: purge_enabled=False; no SESSIONS_PURGE_ENABLED in docker/.stack.env.local ✅
Audit/dead-letter activity: application_events.event_type='session.purge_committed' count = 0; purge_dead_letter count = 0 ✅ (driver dormant by design)
Backlog still in place: 1 970 soft-deleted sessions awaiting the gated flip — exactly as specified

Quality gates

mypy --strict src/ii_agent/sessions/purge/ src/ii_agent/core/config/sessions.py → Success: no issues found in 15 source files
ruff check + ruff format --check on all touched paths: clean
pytest src/tests/unit/sessions/purge/ -q → 22 passed, 32 skipped (PR-E/F/G behavioural)

What still needs to happen before the flag flips

Recorded in the §0.0 pre-flip checklist; key items:

Core-team review of design doc + purge/ package
PR-C FK constraints (otherwise §3.1 CASCADE rationale is unenforced)
≥1 real register_cleanup_hook so phase (b) is not a permanent no-op
register_purge_guards() wired into app/lifespan.py
PR-E behavioural tests unblocked against real DB fixtures
Canary cycle on a non-prod env with measurable purge_committed audit row delta
PITR drill rehearsed; backup retention ≥ 37 days

Diff vs prior PR tip: 26 files changed, +868 / −182 (modified) + 19 new files.

Latest commit — `9ba1240` `sessions/purge: harden invariant subsystem with three-tier enforcement`

Promotes the runtime purge invariants from prose into a self-validating contract.

Schema (migration 20260429_000011_invariant_hardening):

CHECK constraints for I1 / I1b (state machine)
partial UNIQUE index for I19 (provider, resource_id)
BEFORE DELETE trigger on users (I14)
discriminator columns: users.is_purging_set_at, application_events.stripped_at
partial covering indexes for invariant probes

Code:

invariants.py: rewritten into three disjoint tiers — SCHEMA_ENFORCED (4) / DB_CHECKABLE (9) / STRUCTURAL_TEST_ENFORCED (6); fixes I2 docstring and I17 catalogue entry
reconcile_providers.py (new): I9 OpenAI Files audit job — correct column names, idempotent insert via WHERE NOT EXISTS, scoped to provider = 'openai'
check_runner.py: assert_cleanup_uses_primary_db sentinel for I17 (replica-engine guard)
app/lifespan.py: I17 startup gate wired in at step 4a-bis
pii_strip.py / user_purge.py: write discriminator timestamps
workers/cron/tasks.py: daily run_purge_invariants_check APScheduler job
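
The idempotent insert via WHERE NOT EXISTS mentioned for reconcile_providers.py can be illustrated with SQLite as a stand-in for the production database. Column names follow the purge_dead_letter description earlier in this PR; the index predicate, make_db, record_leak, and the sample resource id are assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE purge_dead_letter (
            id INTEGER PRIMARY KEY,
            provider TEXT NOT NULL,
            resource_kind TEXT NOT NULL,
            resource_id TEXT NOT NULL,
            resolved_at TEXT
        )
        """
    )
    # Partial UNIQUE index as a backstop; the WHERE predicate is an assumption.
    conn.execute(
        "CREATE UNIQUE INDEX uq_open_leak ON purge_dead_letter (provider, resource_id) "
        "WHERE resolved_at IS NULL"
    )
    return conn


def record_leak(conn, provider, resource_kind, resource_id):
    """Insert an unresolved leak record unless one already exists."""
    conn.execute(
        """
        INSERT INTO purge_dead_letter (provider, resource_kind, resource_id)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM purge_dead_letter
            WHERE provider = ? AND resource_id = ? AND resolved_at IS NULL
        )
        """,
        (provider, resource_kind, resource_id, provider, resource_id),
    )


conn = make_db()
record_leak(conn, "openai", "file", "file-abc")
record_leak(conn, "openai", "file", "file-abc")  # second call inserts nothing
open_rows = conn.execute("SELECT COUNT(*) FROM purge_dead_letter").fetchone()[0]
```

Re-running the audit job is then safe: repeated sightings of the same leaked resource do not create duplicate operator-facing rows.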

Tests:

test_purge_structural_invariants.py (new): Tier 3 pinning + strong-form parity test that resolves cited test artefacts (catches catalogue drift mechanically)
test_reconcile_providers.py (new): 5 unit tests pinning the audit-job correctness fixes
test_doc_stub_parity.py: extended to validate the tier union

Docs:

session-lifecycle-and-data-custody.md §2.3: rewritten with Tier column, kept in sync with invariants.py catalogue

Convergence: Three review passes complete (4 → 3 → 0 defects, strictly-decreasing severity: runtime → catalogue-drift → none). 368 sessions+realtime+app unit tests pass; lint clean; migration auto-applied on backend startup verified locally against the running stack.

Latest commit — `fa26339` `local-dev usability + two silent-failure agent fixes`

Local multi-user dev login

core/config/settings.py: new DevUserConfig + Settings.dev_users (JSON env DEV_USERS); validates username charset and PIN length.
auth/router.py: GET /auth/dev/users chooser endpoint; POST /auth/dev/login now takes {username, pin}, looks up the named user, validates PIN with constant-time compare, per-username rate limit + sleep-throttle on failures, generic error either way. Each named user maps to dev+<username>@localhost for full session/credit isolation between household members.
frontend/login.tsx: dropdown chooser + PIN input replaces the single "Dev login" button; only rendered when /auth/dev/users reports enabled: true.
frontend/utils.ts: getFirstCharacters made resilient to punctuation / unicode / empty tokens so avatar initials don't break for dev display names.
docker/.stack.env.local.example: DEV_USERS placeholder + documentation.
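
A sketch of the PIN check described for POST /auth/dev/login: constant-time comparison and the same generic error whether the username or the PIN is wrong. check_dev_login, the dev_users mapping shape, and the dummy PIN are assumptions; rate limiting and sleep-throttling are left out.

```python
import hmac

GENERIC_ERROR = "invalid username or PIN"
_DUMMY_PIN = "0" * 8  # compared against when the username is unknown


def check_dev_login(dev_users, username, pin):
    """Return (ok, detail); detail is the mapped email on success."""
    expected = dev_users.get(username)
    # Always run the comparison so known and unknown usernames do similar work.
    candidate = expected if expected is not None else _DUMMY_PIN
    pin_ok = hmac.compare_digest(pin.encode("utf-8"), candidate.encode("utf-8"))
    if expected is None or not pin_ok:
        return False, GENERIC_ERROR
    return True, f"dev+{username}@localhost"


users = {"alice": "4821"}  # hypothetical DEV_USERS content
good = check_dev_login(users, "alice", "4821")
bad_pin = check_dev_login(users, "alice", "0000")
unknown = check_dev_login(users, "mallory", "0" * 8)
```

Returning the same message for both failure modes means the endpoint does not reveal which dev usernames exist.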

Agent runtime resilience

agents/models/anthropic/claude.py: stop synthesising redacted_thinking from plaintext reasoning_content — Anthropic validates redacted_thinking.data as an opaque ciphertext they issued, so plaintext triggers a non-retriable 400 (Invalid data in redacted_thinking block) that permanently bricks replay of the conversation. The block is now dropped with a WARNING; regular text/tool_use content of the assistant message survives. (Triage: session 9785de09, 2026-05-11.)
agents/inner_loop.py: detect empty A2A turn (ASSISTANT_TURN_START → ASSISTANT_TURN_END with no content, reasoning, tool call, or session.error — e.g. Copilot CLI when quota-exhausted) and raise ModelProviderError instead of completing the run with an empty response. Outer fallback path can now retry on native or surface the failure to run status.
Regression tests pinned for both fixes (test_inner_loop.py, test_v1_models_anthropic_claude.py).
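
The empty-turn detection can be sketched as a single pass over the turn's events. The dict-shaped events and the payload type names below are hypothetical; only the start/end markers, the four payload categories, and the ModelProviderError name come from the description above.

```python
class ModelProviderError(RuntimeError):
    """Stand-in for the project's ModelProviderError."""


TURN_START = "ASSISTANT_TURN_START"
TURN_END = "ASSISTANT_TURN_END"
# Hypothetical event type names for the four payload categories.
PAYLOAD_TYPES = {"content", "reasoning", "tool_call", "session.error"}


def ensure_turn_not_empty(events):
    """Raise if a turn starts and ends without producing any payload."""
    in_turn = False
    saw_payload = False
    for event in events:
        kind = event["type"]
        if kind == TURN_START:
            in_turn, saw_payload = True, False
        elif kind == TURN_END:
            if in_turn and not saw_payload:
                raise ModelProviderError(
                    "A2A turn ended with no content, reasoning, tool call, or session error"
                )
            in_turn = False
        elif in_turn and kind in PAYLOAD_TYPES:
            saw_payload = True
    return events
```

Raising here, instead of returning an empty response, is what lets an outer fallback path retry or surface the failure.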

Housekeeping

scripts/stack_control.sh: suppress spurious [timed out] annotation on AVAILABLE pool slots whose lifetime is governed by retire_at, not timeout_at (the R6 reaper explicitly excludes them).
.github/copilot-instructions.md: drop stale --local hint; document stack_control.sh verify for "which containers need rebuild" introspection.

Lint clean (ruff check + ruff format --check) on all changed Python files.

- Docker Compose local stack with PostgreSQL, Redis, MinIO, sandbox
- Local sandbox entrypoint, VNC, browser automation services
- Stack control scripts (stack_control.sh, local/*)
- Backend Dockerfile + entrypoint for local development
- Configuration: .stack.env.local, settings.yaml, model_configs
- SQLAlchemy model fixes (UUID consistency, TimestampColumn)
- Agent tool/runtime improvements (reasoning_content, field renames)
- Credit billing_enabled toggle + usage handler refactor
- E2B sandbox management, VNC URL support
- 246 tests (unit, integration, smoke, E2E)
- Documentation: architecture, getting-started, local-docker-sandbox
- GitHub Copilot instructions and prompt templates

- A2A protocol: adapter server, backends (Copilot, Claude Code, Codex)
- Agent inner loop: strategy pattern, tool bridge, routing
- A2A billing: backend-aware credit calculation, provider-reported strategies
- Circuit breaker, event stream adapter, multimodal support
- Agent factory: inner loop strategy builder, converter
- Health endpoint: A2A mode fields
- CreditUsageHandler: A2A billing strategies
- Config: A2A agent settings (inner_loop_mode, a2a_backend, billing)
- 26 A2A agent tests + 10 billing strategy tests
- 17 A2A design/implementation/runtime docs

mdear · 2026-04-13T22:33:03Z

I am continuing testing on this branch feature/a2a-chat-inner-loop_3_of_3.

A2A/Copilot flow:
- validate chat and agent model steering end-to-end through the Copilot runtime
- harden adapter/session error handling, council fallback, and post-turn event draining

Frontend and local UX:
- keep multiline composer input visible on mobile and tighten settings state handling
- refine local stack/build helpers and sandbox port configuration for faster iteration

Quality and docs:
- expand unit and E2E coverage, refresh the test plan, and capture implementation notes

mdear · 2026-04-16T03:55:03Z

This PR is 3 of a series of 3 PRs:

#198: Docker sandbox runtime, local deploy stack, session lifecycle, frontend, test overhaul (389 files)

#199: A2A inner loop strategy, backend registry, billing strategies, adapter server (74 incremental files)

#200: Chat A2A turn loop, council A2A routing, cross-authority compaction (16 incremental files)

Merge order: #198 → #199 → #200

… storage proxy fix

Chat A2A image retention:
- Add extract_historical_image_parts() to rehydrate prior-turn images
- Integrate into adapter_server event source for multi-turn continuity
- 83 multimodal unit tests, 54 adapter server tests, 60 turn loop tests

Sandbox lifecycle hardening:
- 6-phase orphan cleanup pipeline (soft-delete, orphan kill, stale pause, zombie removal, volume cleanup, timeout enforcement)
- Per-sandbox DB isolation (R2), conditional state marking (R1), persistent timeout_at column (R6)
- Alembic migration for sandbox timeout_at and FK constraints
- Design docs: lifecycle assessment, accumulation root-cause analysis

Storage proxy fix:
- Add Content-Length header to proxy_download (was chunked-only)

Frontend polish:
- Mobile composer scroll-into-view, model tag theming, settings typing

Test results:
- Unit: 5758 passed, 0 failed, 0 errors, 0 skipped
- E2E: 39 passed, 0 failed, 0 errors, 0 skipped

- Add Chat A2A adapter sidecar topology (sandbox-independent)
- Add claude-opus-4-7 as system model (pricing, context, frontend)
- Add A2A backend-specific timeout configuration
- Add A2AAdapterUnavailableError (HTTP 503) exception
- Harden sandbox orphan cleanup (R4 zombie, volume, timeout)
- Enrich /health endpoint with A2A inner-loop diagnostics
- Improve stack_control.sh build vs rebuild help clarity
- Add startup validation for chat A2A strict mode
- Add retry classification, thinking-temperature, inner-loop parity tests
- Add design docs: sidecar deployment, URL resolution, billing

… compaction lock fix

- Plumb agent_kind through IIAgent -> sandbox metadata -> DockerSandbox._a2a_adapter_env with AgentType enum validation (new _agent_kind_from_name helper)
- Add AgentSettings.a2a_adapter_timeout_long_horizon (3600s) and a2a_adapter_long_horizon_agent_kinds ({deep_research}) overrides
- Opus 4.7 adaptive thinking: drop manual thinking block (Anthropic rejects it with HTTP 400 on Opus 4.7); detect via _is_opus_4_7_or_later
- Fix compaction lock leak in inner_loop: acquire + yield moved inside try so consumer aclose() cannot bypass release
- stack_control.sh: add verify subcommand + sha256 manifest
- CODEMAPS: refresh architecture.md and dependencies.md
- Tests: 24 parametrized tests for _agent_kind_from_name, extend TestA2AAdapterEnv with long-horizon override cases, adapter/orphan/R4 tweaks
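
A sketch of how the long-horizon override in this commit might select a timeout. The 3600s value and the deep_research kind come from the commit message; the AgentType members other than deep_research, the 900s standard timeout, and both function names are assumptions.

```python
from enum import Enum


class AgentType(str, Enum):
    GENERAL = "general"              # hypothetical member
    DEEP_RESEARCH = "deep_research"  # named in the commit message


def agent_kind_from_name(name):
    """Map an agent name to an AgentType, or None for unknown/tool-owned names."""
    try:
        return AgentType(name)
    except ValueError:
        return None


def adapter_timeout(
    agent_name,
    standard_timeout=900,        # assumed standard adapter timeout
    long_horizon_timeout=3600,   # a2a_adapter_timeout_long_horizon default
    long_horizon_kinds=frozenset({AgentType.DEEP_RESEARCH}),
):
    kind = agent_kind_from_name(agent_name)
    # Unknown names resolve to None and therefore never trigger the override.
    if kind is not None and kind in long_horizon_kinds:
        return long_horizon_timeout
    return standard_timeout
```

Validating against the enum before the set lookup is what keeps arbitrary or tool-owned agent names from silently getting the longer timeout.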

- Reorder factory priority so deferred-sandbox path always wins over static a2a_agent_url for agent sessions (prevents deep_research from hitting the 900s sidecar instead of its 3600s per-sandbox adapter)
- Add regression test for factory inner loop priority
- Add unit tests for _is_opus_4_7_or_later model-id detection
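
One way the model-id detection tested in this commit could work, shown purely as an illustration. The regex and the id formats it accepts are assumptions based on the claude-opus-4-7 id mentioned in this PR; the real _is_opus_4_7_or_later may be implemented differently.

```python
import re

# Matches "claude-opus-<major>" with an optional one- or two-digit minor.
# The minor is limited to two digits so a trailing date stamp is not
# mistaken for a minor version.
_OPUS_VERSION = re.compile(r"claude-opus-(\d+)(?:-(\d{1,2}))?(?!\d)")


def is_opus_4_7_or_later(model_id):
    """Return True for Opus model ids at version 4.7 or newer."""
    match = _OPUS_VERSION.search(model_id)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return (major, minor) >= (4, 7)


def thinking_params(model_id, budget_tokens):
    """Omit the manual thinking block for models that manage thinking adaptively."""
    if is_opus_4_7_or_later(model_id):
        return {}
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
```

Comparing (major, minor) tuples keeps the check forward-compatible with later versions without listing each model id explicitly.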

…er_loop_mode

- Per-sandbox A2A adapter now starts only when inner_loop_mode=a2a; native-mode sandboxes save 1 host port + adapter process resources
- start-services.sh requires SANDBOX_ADAPTER_ENABLED=true and an explicit SANDBOX_ADAPTER_BACKEND (no more 'simulate' fallback in production)
- Chat A2A sidecar hardened to adapter-only: entrypoint:[], read_only, minimal tmpfs, with no Xvfb/VNC/code-server/MCP overhead
- DockerSandbox.create() skips _a2a_adapter_env() entirely in native mode so backend auth tokens (GITHUB_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY) do not leak into native sandbox containers
- DEFAULT_EXPOSED_PORTS is now the honest base set (6 ports); adapter port is added conditionally
- Add 4 new unit tests: native gating, a2a port+env+token forwarding, port-count requirement difference, native-mode token-leak prevention

…alth

Pool & lifecycle
- Add SandboxPoolManager with 2 standby slots, claim/replenish/retire state machine, retire-on-age, dedupe, slot validation, and reap_stuck_initializing for crash recovery (pool.py + migration 20260422_000006_sandbox_prewarm_pool).
- Wire reap_stuck_initializing into the orphan_cleanup loop unconditionally (pure DB UPDATE, must run even when host monitor is WARN/CRIT; ensure_full skips on WARN, so stuck rows accumulated indefinitely otherwise).
- Fix QueuePool self-deadlock between caller's open TX and set_timeout's separate session: commit before set_timeout in service.init_sandbox; wrap docker._persist_deadline in asyncio.wait_for(timeout=10s); set idle_in_transaction_session_timeout=60000 in core/db/base.
- R1/R2 hardening in orphan_cleanup: only mark sandbox DELETED after Docker container removal is confirmed; per-sandbox DB session so one failure doesn't roll back the batch.
- R6 persistent timeout via AgentSandbox.timeout_at; cleanup loop enforces deadline as fallback.
- Distributed Redis advisory lock (sandbox:cleanup:lock) for cleanup loop in multi-worker deployments.

Host monitor & circuit breaker
- New host_monitor.py: /proc/buddyinfo, pagetypeinfo, vmstat, meminfo parsers + percentile-baseline evaluator (BOOTSTRAP/OK/WATCH/WARN/CRIT) backed by 48h ring buffer.
- New breaker.py: Docker-call circuit breaker.
- New executor.py: bounded thread pool for Docker SDK calls.
- New /health/host and /health/sandbox-pool endpoints.

Platform health & ops
- scripts/stack_control.sh: add status --json, status --all, and modular platform_checks_*.sh (common/wsl/ubuntu/backend/pool).
- /etc/sysctl.d drop-in (scripts/99-ii-agent.conf) for WSL2 host tuning.
- Docker container hardening: read_only=True + tmpfs, cap_drop=ALL, no-new-privileges, mem_limit=3GB, pids_limit=512.

Tests
- 4 new POOL e2e tests + 2 HOST e2e tests in scripts/local/test_e2e.py.
- Fix SBOX-06: replace removed AppConfig import with Settings, harden parser with regex against log interleaving.
- New unit tests: sandbox_pool, sandbox_breaker, sandbox_create_semaphore, host_monitor + integration, health_host_endpoint, health_sandbox_pool_endpoint, plus pool e2e suite.

Docs
- Design docs: sandbox-prewarm-pool, sandbox-pool-claim-self-deadlock, sandbox-shared-bridge-network, stack-control-platform-health, sandbox-lifecycle-assessment update, a2a-copilot-vision-support-briefing.
- Runtime docs: docker-wsl2-recovery, host-resource-monitoring, wsl2-host-configuration, sandbox-networking-design, post-reboot-followups.
- Impl tracker: sandbox-robustness-impl-tracker.

…t, and lazy MCP retry

- Add `mcp_configured` flag to AgentSandbox model + migration (20260425_000007)
- Sandbox service: MCP handoff waits for mcp_configured=True before releasing slot
- Pool: expose /health/sandbox-pool endpoint with slot occupancy + MCP readiness
- noVNC URL decoration for register_port tools (sandbox + dev variants)
- New novnc.py helper for URL decoration logic
- MCP factory: lazy retry wrapper (lazy_retry.py) for transient MCP connection failures
- Docker shell framing fixes; docker_shell.py correctness improvements
- orphan_cleanup: per-item DB isolation (R2), conditional state marking (R1)
- Claude model: extended thinking + vision support improvements
- A2A turn loop: Copilot backend fixes; fallback billing event
- health.py: /health/ready endpoint; exception handler improvements
- lifespan: startup validation for A2A chat strict mode
- scripts: stack_control.sh enhancements; test_e2e.py expanded coverage; Windows port-forward script
- docs: sandbox-pool-claim-mcp-handoff audit, postgres recovery mode runbook
- tests: 10 new unit test files covering sandbox, noVNC, MCP handoff, health, storage, middleware
- .gitignore: exclude .e2e_last_results.json; remove tracked copy

Skill copy_skill_to_sandbox previously ran mkdir/unzip/chown/chmod as user="root". /workspace is owned by user:user 755 (uid 1001), so root escalation was unnecessary AND harmful: root-owned files broke subsequent user-mode cleanup with Permission denied on retries.

Code changes:
- agents/skills/storage.py: drop user="root" from mkdir/unzip/chmod; remove the now-redundant chown -R; switch zip cleanup to rm -f for retry safety. Add inline notes documenting the ownership invariant.
- settings/skills/storage.py: delete the duplicated copy_skill_to_sandbox + resolve_storage_uri + create_skill_zip_from_dir helpers; this module now owns only the GCS half of the pipeline. The agents/skills/storage copy is the canonical implementation.
- agents/sandboxes/base.py: document the user= parameter contract (Docker honours via exec_run; E2B best-effort) and explicitly warn callers against using it as a security boundary.
- agents/sandboxes/e2b.py: forward user= through to the E2B SDK when set.

Docs:
- docs/design-docs/sandbox-filesystem-design.md: new authoritative spec for /workspace ownership, write paths (put_archive cannot target /tmp on read_only=True containers per moby/moby#42333), and skill deployment.
- AGENTS.md / CLAUDE.md: link the new spec; add the three governing rules (workspace-only uploads, never user=root under /workspace, root reserved for system commands).

Unrelated drive-bys captured in the same commit:
- docs/design-docs/session-lifecycle-and-data-custody.md: proposal v3.1 for review.
- docs/runtime-docs/crossnote-pdf-export-tmpdir.md: WSL/Ubuntu snap Chromium ERR_FILE_NOT_FOUND fix for MPE PDF export.

Implements §4.1 (three-phase purge driver) and §4.6 (storage reaper) from docs/design-docs/session-lifecycle-and-data-custody.md, behind dual feature flags SESSIONS_PURGE_ENABLED and SESSIONS_STORAGE_REAPER_ENABLED. Both default to false; the cleanup-loop stage is wired but dormant in production.

Schema (migration 20260427_000008):
- sessions.purge_after / purge_attempts / purge_started_at + partial idx
- users.is_purging gate
- purge_dead_letter table for operator-facing leaked-resource records

Runtime (src/ii_agent/sessions/purge/):
- claim/pii_strip/commit phases, single arbitration point in session_purge
- provider hook registry (empty until real DELETEs are wired)
- storage_reaper for orphan UserAsset blobs
- orm_guards exporting register_purge_guards (not yet called from lifespan)

Cleanup-loop integration:
- cleanup_loop_stage_purge_sessions + cleanup_loop_stage_storage_reaper slot between _pause_stale_sandboxes and _cleanup_docker_zombies

Tests (src/tests/unit/sessions/purge/):
- 22 contract tests passing; 32 PR-E/F/G behavioural tests skipped pending bodies (mypy --strict + ruff clean across the package)

Doc:
- §0.0 rollout gate added: review-request matrix, 10-item pre-flip checklist, sign-off block. Flag MUST NOT flip without core-team approval
- §0 PR-E row notes register_purge_guards exported but not yet wired
- §5 step 6 cross-references §0.0 as the irreversible boundary

Bug fixes vs initial draft:
- B1+B2 commit.py: vanished-row case returns ALREADY_PURGED (I19); single canonical _AUDIT_EVENT_TYPE='session.purge_committed'
- B3 ExhaustedRetriesError carries dead_letter_count; session_purge propagates it into PurgeResult
- B4 sessions/__init__.py imports purge.db_models so PurgeDeadLetter registers with Base.metadata at startup

Implements GDPR Art. 17 SAR purge + grace-window cleanup with flag-gated three-phase commit (strip → orphan-purge → session DELETE).

Mutation gating (I3/I8 §16):
- NotPurgingDep applied to 12 mutating endpoints across sessions/, pin/, and wishlist/ routers; closes the PATCH/fork/legacy-restore hole that could race the purge driver.

Invariant runner:
- New check_runner + scripts/local/check_purge_invariants.py exercising 19 DB-checkable invariants (I1–I19); structural-only checks marked SKIP rather than failing.
- Runner now rolls back AsyncSession after per-invariant exceptions so one bad query no longer cascades into 7+ ERRORs.
- I11 rewritten from a content-key denylist to the real strip discriminator (user_id NULL + orphaned session_id + non-allowlist content key). Eliminates ~1,236 false positives.

Audit trail:
- PURGE_COMMITTED_EVENT_TYPE constant centralised in purge/types.py; consumed by commit.py and session_purge.py.
- application_events.session_id intentionally retains no FK so audit rows survive session DELETE as forensic breadcrumbs (migration 20260428_000010).

Other:
- Storage reaper, OpenAI provider hooks, ORM guards, canary e2e test, PITR-restore runbook, and implementation tracker.
- Design doc drift fixes (§14.4, §16) in session-lifecycle-and-data-custody.

Verification: mypy --strict clean across changed files; runner reports 11 PASS / 0 FAIL / 0 ERROR / 8 structural-skip; 24/24 unit tests pass.

mdear · 2026-04-29T00:29:43Z

Session-purge hardening — pre-flip blockers cleared

Pushed as f16328f on top of the existing branch.

What landed

| Area | Change |
| --- | --- |
| Mutation gating (I3/I8 §16) | `NotPurgingDep` applied to 12 mutating endpoints across `sessions/router.py`, `sessions/pin/router.py`, `sessions/wishlist/router.py`; closes the PATCH/fork/legacy-restore race against the purge driver. |
| Invariant runner | New `check_runner` + `scripts/local/check_purge_invariants.py` exercising 19 DB-checkable invariants (I1–I19). Runner now `rollback()`s after per-invariant exceptions so one bad query no longer cascades into 7+ ERRORs. |
| I11 correctness | Rewritten from a content-key denylist to the real strip discriminator: `user_id IS NULL ∧ orphaned session_id ∧ non-allowlist content key`. Eliminates ~1,236 false positives previously surfaced against static UI strings. |
| Audit constant | `PURGE_COMMITTED_EVENT_TYPE` centralised in `purge/types.py`; consumed by `commit.py` and `session_purge.py` (single source of truth). |
| Forensic breadcrumb | `application_events.session_id` intentionally retains no FK so audit rows survive session DELETE (migration `20260428_000010`). |
| Other | Storage reaper, OpenAI provider hooks, ORM guards, canary e2e test, PITR-restore runbook, implementation tracker, design-doc drift fixes (§14.4, §16). |
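The I11 discriminator in the table is a conjunction of three conditions, which is why it stops flagging rows that merely contain a suspicious-looking key. A minimal sketch of that predicate follows. The `EventRow` shape and the allowlist contents are hypothetical; the real check runs as a DB query and may differ in detail.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

# Hypothetical allowlist of content keys that may remain after a strip.
ALLOWLISTED_CONTENT_KEYS: FrozenSet[str] = frozenset({"event_kind", "ui_label"})


@dataclass(frozen=True)
class EventRow:
    """Hypothetical in-memory view of an application_events row."""
    user_id: Optional[str]
    session_id: Optional[str]
    content_keys: FrozenSet[str]


def violates_i11(row: EventRow, live_session_ids: Set[str]) -> bool:
    # 1. The row has been through the strip phase (user_id IS NULL).
    stripped = row.user_id is None
    # 2. Its session_id no longer points at an existing session.
    orphaned = row.session_id is not None and row.session_id not in live_session_ids
    # 3. It still carries at least one content key outside the allowlist.
    leftover = bool(row.content_keys - ALLOWLISTED_CONTENT_KEYS)
    # Only the conjunction counts as a violation.
    return stripped and orphaned and leftover
```

A denylist on content keys alone would flag any row carrying a listed key, including rows for live sessions; requiring all three conditions restricts the check to rows the strip phase should already have cleaned.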

Verification

mypy --strict — clean across all changed files (19 source files).
Invariant runner against live DB: 11 PASS / 0 FAIL / 0 ERROR / 8 structural-skip in 0.24s (progression: 1 FAIL before the fix, then 7 ERROR in a first-attempt regression, now all green).
Unit tests: 24 passed, 32 skipped (skipped are pre-existing PR-E/F/G structural stubs), 0 failures.
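The runner's rollback-after-exception behaviour can be sketched as below. `FakeSession` is a hypothetical stand-in that mimics one property of a database transaction (after a failed statement, later statements also fail until a rollback); it is not SQLAlchemy's `AsyncSession`, and the function names are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class FakeSession:
    """Hypothetical session: a failed query poisons the transaction until rollback()."""

    def __init__(self) -> None:
        self.aborted = False
        self.rollbacks = 0

    async def execute(self, ok: bool) -> bool:
        if self.aborted:
            raise RuntimeError("transaction is aborted")
        if not ok:
            self.aborted = True
            raise RuntimeError("bad query")
        return True

    async def rollback(self) -> None:
        self.aborted = False
        self.rollbacks += 1


Check = Callable[[FakeSession], Awaitable[bool]]


async def run_invariants(session: FakeSession, checks: Dict[str, Check]) -> Dict[str, str]:
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "PASS" if await check(session) else "FAIL"
        except Exception:
            results[name] = "ERROR"
            # Reset the transaction so the next invariant starts clean;
            # without this, every later check would also report ERROR.
            await session.rollback()
    return results
```

With the rollback in place, one failing query yields exactly one ERROR entry instead of cascading through the remaining checks.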

Still open (tracked, not blockers for flag-default-off)

7 §14.4 structural test files still stubbed (tracker §4.3).
Prometheus exporter for invariant pass/fail metrics is aspirational (tracker §4.2/§4.2a).
12 system.error rows in audit trail warrant a hand audit before prod flag-flip.
7 pre-existing mypy errors in sessions/router.py (untyped dict returns, AppKind re-export) — confirmed via stash baseline as not introduced by this change; tracked separately.

Promote runtime purge invariants from prose into a self-validating contract with mechanical artefact checks.

Schema (migration 20260429_000011):
- CHECK constraints for I1 / I1b (state machine)
- partial UNIQUE index for I19 (provider/resource_id)
- BEFORE DELETE trigger on users (I14)
- discriminator columns: users.is_purging_set_at, application_events.stripped_at
- partial covering indexes for invariant probes

Code:
- invariants.py: rewrite into three disjoint tiers: SCHEMA_ENFORCED (4), DB_CHECKABLE (9), STRUCTURAL_TEST_ENFORCED (6); fixes I2 docstring and I17 catalogue entry
- reconcile_providers.py: I9 OpenAI Files audit job (correct column names, idempotent insert via WHERE NOT EXISTS, scoped to provider 'openai')
- check_runner.py: assert_cleanup_uses_primary_db sentinel for I17
- lifespan.py: I17 startup gate at step 4a-bis
- pii_strip.py / user_purge.py: write discriminator timestamps
- workers/cron/tasks.py: daily run_purge_invariants_check job

Tests:
- test_purge_structural_invariants.py: Tier 3 pinning + strong-form parity test that resolves cited test artefacts
- test_reconcile_providers.py: 5 unit tests pinning the audit-job fixes
- test_doc_stub_parity.py: validates tier union

Docs:
- session-lifecycle-and-data-custody.md §2.3: rewritten with Tier column, in sync with invariants.py catalogue

368 sessions+realtime+app unit tests pass; lint clean; migration auto-applied on backend startup verified locally.
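The "idempotent insert via WHERE NOT EXISTS" pattern mentioned for reconcile_providers.py can be shown with the standard-library sqlite3 module. This is a sketch of the SQL technique only: the `audit_findings` table, its columns, and the function names are hypothetical, and the real job targets the project's own database and schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table used only for this illustration.
    conn.execute(
        "CREATE TABLE audit_findings ("
        "provider TEXT NOT NULL, resource_id TEXT NOT NULL)"
    )


def record_finding(conn: sqlite3.Connection, provider: str, resource_id: str) -> bool:
    """Insert the finding unless an identical one exists; return True if a row was added."""
    cur = conn.execute(
        """
        INSERT INTO audit_findings (provider, resource_id)
        SELECT ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM audit_findings
            WHERE provider = ? AND resource_id = ?
        )
        """,
        (provider, resource_id, provider, resource_id),
    )
    # rowcount is 1 when the SELECT produced a row to insert, 0 otherwise.
    return cur.rowcount == 1
```

Running the audit twice over the same provider resources then leaves a single row per finding, which is what makes a daily job safe to re-run.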

APScheduler uses CLOCK_MONOTONIC for wake-ups. On hypervisor guests (WSL2, KVM laptops, etc.) the host can suspend the VM's vCPU, freezing that clock; when it thaws, every fire scheduled during the gap is reported 'missed by N min' and silently dropped (default grace = 1s). This was causing the new daily lifecycle-invariants probe to risk skipping a full day per missed window, and was already dropping the 40-min cleanup jobs in development.

Detect host class (env override, /proc/version for WSL2, hypervisor flag in /proc/cpuinfo) and apply tighter grace on bare metal (60s) or generous grace on VMs (1h default; 6h for the 24h invariants probe), with coalesce=True everywhere so a backlog collapses to one catch-up run. Detection result is logged at scheduler start.

Verified end-to-end on this WSL2 host: backend rebuilt, scheduler launched 3 jobs as host_class=vm, both 40-min cleanup jobs fired and completed at 04:05 UTC with zero misfire warnings. 13 unit tests cover detection edge cases and per-job grace assignment.

mdear · 2026-04-30T14:33:49Z

Follow-up commit: `ef22b43` — host-aware cron misfire tuning

Posting a brief note on this commit since it landed after the main review window and is small but operationally meaningful.

What it does

Tunes APScheduler's misfire_grace_time / coalesce settings on the cron jobs registered in src/ii_agent/workers/cron/tasks.py (the same file where the new daily lifecycle-invariants probe lives) based on detected host class:

Bare metal → tight 60 s default grace, 30 min on the daily probe.
VM / hypervisor guest (WSL2, KVM laptops, etc.) → 1 h default grace, 6 h on the daily probe; coalesce=True everywhere.

Detection order: IIA_CRON_HOST_CLASS env override → microsoft|wsl in /proc/version → hypervisor flag in /proc/cpuinfo → bare metal.
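The detection order and the resulting grace values can be sketched as a pure function that takes the environment and file contents as arguments. This is an illustration under stated assumptions, not the `_detect_host_class()` in tasks.py: the accepted override values `bare` / `vm`, the fall-through on an invalid override, the token-based flag match, and the reason strings are guesses chosen for the example.

```python
import re
from typing import Dict, Mapping, Optional, Tuple

VALID_CLASSES = ("bare", "vm")  # assumed override values


def detect_host_class(
    env: Mapping[str, str],
    proc_version: Optional[str],
    proc_cpuinfo: Optional[str],
) -> Tuple[str, str]:
    """Return (host_class, reason), checking sources in the documented order."""
    # 1. Explicit env override wins; an unrecognised value falls through (assumption).
    override = env.get("IIA_CRON_HOST_CLASS", "").strip().lower()
    if override in VALID_CLASSES:
        return override, "env override"
    # 2. WSL2 kernels mention "microsoft" or "wsl" in /proc/version.
    if proc_version and re.search(r"microsoft|wsl", proc_version, re.IGNORECASE):
        return "vm", "WSL2 detected via /proc/version"
    # 3. Guests expose "hypervisor" as a CPU flag; match whole tokens on "flags" lines only.
    if proc_cpuinfo:
        for line in proc_cpuinfo.splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "flags" and "hypervisor" in value.split():
                return "vm", "hypervisor flag in /proc/cpuinfo"
    # 4. Missing files or no match: treat as bare metal.
    return "bare", "default"


def job_defaults(host_class: str) -> Dict[str, object]:
    # 1 h default grace on VMs, 60 s on bare metal; coalesce everywhere.
    grace = 3600 if host_class == "vm" else 60
    return {"misfire_grace_time": grace, "coalesce": True}


def invariants_probe_grace(host_class: str) -> int:
    # Daily probe: 6 h on VMs, 30 min on bare metal.
    return 6 * 3600 if host_class == "vm" else 30 * 60
```

Passing the file contents in, rather than reading /proc inside the function, keeps the edge cases (missing files, invalid override) easy to unit-test.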

Why

APScheduler's AsyncIOScheduler schedules wake-ups against loop.time() (CLOCK_MONOTONIC). When a hypervisor host suspends the guest's vCPU (laptop sleep, WSL2 going idle, host hibernate), CLOCK_MONOTONIC freezes for the duration. On thaw, every fire scheduled during the gap is reported as `Run time of job ... was missed by N min` and, with the default misfire_grace_time = 1 s, silently dropped.
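The effect of the grace window can be reproduced with a few lines of arithmetic. The sketch below models the decision described above (run a late fire only if its lateness is within the grace window, and collapse a backlog to one run when coalescing); it is a simplified model rather than APScheduler's code, and the function names and the calendar date in the usage example are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def should_run(scheduled: datetime, now: datetime, misfire_grace_time: float) -> bool:
    """A late fire still runs only if it is no later than the grace window."""
    lateness = (now - scheduled).total_seconds()
    return lateness <= misfire_grace_time


def due_runs(pending: List[datetime], coalesce: bool) -> List[datetime]:
    """With coalesce, a backlog of pending fires collapses to the most recent one."""
    if coalesce and pending:
        return [max(pending)]
    return list(pending)


# A fire scheduled for 04:00:44 UTC that is only serviced about 4.5 minutes later.
scheduled = datetime(2026, 1, 1, 4, 0, 44, tzinfo=timezone.utc)
woke_up = scheduled + timedelta(minutes=4, seconds=30)

dropped_with_default = not should_run(scheduled, woke_up, misfire_grace_time=1)
runs_with_vm_grace = should_run(scheduled, woke_up, misfire_grace_time=3600)
```

In this model a 270 s delay exceeds both the 1 s default and a 60 s grace, so only the VM-class grace lets the job run.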

This was empirically observed on a WSL2 dev box: every 40 min cleanup fire from cleanup_long_running_tasks / cleanup_long_running_chat_messages was being missed by 3–4 min and dropped (~99% loss), and the new 24 h run_purge_invariants_check would have been at risk of skipping a full day per gap. Production Linux servers don't suspend, so the bare-metal branch keeps the tight grace there to surface real scheduler stalls.

Why this is summarized rather than verbose

The change is mechanically narrow (one factory function + per-job kwargs in tasks.py, plus 11 new unit tests in test_scheduler_tasks.py). The rationale, the failure mode, and the bare-vs-VM split are documented inline at the top of tasks.py and at each per-job override, so future readers don't need to chase the commit message. No behavioural change on production servers — they detect as bare, default grace stays 60 s. No public API surface, no schema migration, no config required (env override is opt-in only).

Verification

13/13 unit tests pass; ruff clean.
End-to-end on this WSL2 host: backend rebuilt + restarted, scheduler logs host_class=vm, reason=WSL2 detected via /proc/version, default_misfire_grace_time=3600s, coalesce=True. Both 40 min cleanup jobs fired at 04:05 UTC (≈4.5 min after their scheduled 04:00:44 UTC slot), completed successfully, and produced zero `missed by` / misfire warnings, versus consistent misses on every fire prior to the change.

Files

src/ii_agent/workers/cron/tasks.py (+109/-3): _detect_host_class(), _JOB_DEFAULTS, AsyncIOScheduler(job_defaults=...), per-job misfire_grace_time for the invariants probe, scheduler-startup log.
src/tests/unit/scripts/test_scheduler_tasks.py (+157/-1): coverage for env override (valid/invalid), WSL2 detection, hypervisor flag detection, bare-metal default, missing /proc files, substring false-match guard, per-host invariants-probe grace.

Local multi-user dev login
- core/config/settings.py: new DevUserConfig + Settings.dev_users (JSON env DEV_USERS); validates username charset and PIN length.
- auth/router.py: GET /auth/dev/users chooser endpoint; POST /auth/dev/login now takes {username, pin}, looks up the named user, validates PIN with constant-time compare, per-username rate limit + sleep-throttle on failures, generic error message. Each named user maps to dev+<username>@localhost for full session/credit isolation between household members.
- frontend/login.tsx: chooser dropdown + PIN input replaces the single "Dev login" button; only shown when /auth/dev/users reports enabled.
- frontend/utils.ts: getFirstCharacters resilient to punctuation / unicode / empty tokens so avatar initials don't break for dev names.
- docker/.stack.env.local.example: DEV_USERS placeholder + docs.

Agent runtime resilience
- agents/models/anthropic/claude.py: stop synthesizing redacted_thinking from plaintext reasoning_content, because Anthropic rejects non-issued blobs with a non-retriable 400 that bricks replay. Drop the block with a warning; text/tool_use survive. (Triage: session 9785de09, 2026-05-11.)
- agents/inner_loop.py: detect empty A2A turn (no text, reasoning, tool call, or session.error) and raise ModelProviderError instead of silently completing. Surfaces quota-exhausted Copilot CLI failures to the user / fallback path.
- Tests pinned for both fixes.

Housekeeping
- scripts/stack_control.sh: suppress spurious "[timed out]" annotation on AVAILABLE pool slots whose lifetime is governed by retire_at.
- .github/copilot-instructions.md: drop stale --local hint; document `stack_control.sh verify`.

Lint clean on changed Python files.
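The dev-login PIN check described above (constant-time compare, generic failure) can be sketched with the standard library's `hmac.compare_digest`. The function name, the plain `dict` of username to PIN, and the `None` return on failure are assumptions for illustration; rate limiting and the sleep-throttle are omitted.

```python
import hmac
from typing import Mapping, Optional


def verify_dev_login(users: Mapping[str, str], username: str, pin: str) -> Optional[str]:
    """Return the mapped dev e-mail on success, or None for any failure."""
    expected = users.get(username)
    # Always run the comparison, even for unknown usernames, so both failure
    # modes follow the same code path and yield the same generic result.
    candidate = expected if expected is not None else ""
    pin_ok = hmac.compare_digest(pin.encode("utf-8"), candidate.encode("utf-8"))
    if expected is None or not pin_ok:
        return None
    # Each named user maps to its own isolated account.
    return f"dev+{username}@localhost"
```

`hmac.compare_digest` avoids short-circuiting on the first differing byte, which is the property the commit message refers to as a constant-time compare.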

…imers

Long deep_research turns were tripping the single fixed per-turn wall-clock cap and falling back to the native (billed) Anthropic provider mid-task even though the Copilot backend was still productively streaming events.

Splits the watchdog in copilot_backend.py into two independent timers:
- absolute: hard wall-clock safety net (defaults 300s -> 1800s; long horizon 3600s -> 7200s)
- activity: max idle time with no SDK events; resets on every non-heartbeat event (defaults 600s; long horizon 900s)

Surface area:
- core/config/agent.py: new a2a_adapter_activity_timeout_long_horizon setting; tightened docstring on the absolute long-horizon setting
- integrations/a2a/adapter_server.py: read A2A_*_ACTIVITY_TIMEOUT env vars and pass through to CopilotConfig
- agents/sandboxes/docker.py: forward A2A_*_ACTIVITY_TIMEOUT vars to sandbox containers, honouring the long-horizon override for research-class agents
- docker/docker-compose.local.yaml: expose the new env vars on the a2a-adapter sidecar with matching defaults
- tests: cover the new env wiring on both the docker sandbox and copilot backend layers

Also in this commit:
- e2b.Dockerfile: bump GH_CLI_VERSION 2.91.0 -> 2.92.0 (2.91.0 was rolled out of the apt repo, breaking sandbox rebuilds)
- .gitignore: ignore build-manifest-*.json (generated per-build by scripts/stack_control.sh and COPY'd into each image; was being flagged as untracked after every build)

mdear · 2026-05-12T16:12:41Z

Follow-up commit: `b1867dc` — adapter timeout split + sandbox build fixes

a2a/copilot: split per-turn timeout into absolute + activity (idle) timers

Long deep_research turns were tripping the single fixed per-turn wall-clock cap and falling back to the native (billed) Anthropic provider mid-task even though the Copilot backend was still productively streaming events.

Splits the watchdog in copilot_backend.py into two independent timers:

| Timer | Purpose | Default | Long-horizon |
| --- | --- | --- | --- |
| absolute | hard wall-clock safety net | 300s → 1800s | 3600s → 7200s |
| activity | max idle with no SDK events; resets on every non-heartbeat event | n/a → 600s | n/a → 900s |

The activity timer is the real "is the backend stuck?" signal; the absolute timer is now just a forgiving safety net so productive long turns never get killed.
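The two-timer split can be sketched as a small class with an injected clock. This is a minimal illustration of the idea, not the watchdog in copilot_backend.py: the class name, method names, and the string results are hypothetical, while the default values mirror the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnWatchdog:
    """Hypothetical per-turn watchdog with independent absolute and idle limits."""

    absolute_timeout: float = 1800.0  # hard wall-clock cap for the whole turn
    activity_timeout: float = 600.0   # max idle time without a real event
    started_at: float = 0.0
    last_activity_at: float = 0.0

    def start(self, now: float) -> None:
        self.started_at = now
        self.last_activity_at = now

    def on_event(self, now: float, is_heartbeat: bool = False) -> None:
        # Heartbeats prove the connection is alive, not that work is progressing,
        # so only non-heartbeat events reset the idle timer.
        if not is_heartbeat:
            self.last_activity_at = now

    def expired(self, now: float) -> Optional[str]:
        """Return which limit tripped ("absolute" or "activity"), or None."""
        if now - self.started_at >= self.absolute_timeout:
            return "absolute"
        if now - self.last_activity_at >= self.activity_timeout:
            return "activity"
        return None
```

Taking `now` as a parameter keeps the sketch deterministic; a real implementation would more likely read a monotonic clock and run inside the adapter's event loop.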

Surface area

core/config/agent.py — new a2a_adapter_activity_timeout_long_horizon setting; tightened docstring on the absolute long-horizon setting.
integrations/a2a/adapter_server.py — read A2A_*_ACTIVITY_TIMEOUT env vars and pass through to CopilotConfig.
agents/sandboxes/docker.py — forward A2A_*_ACTIVITY_TIMEOUT vars into sandbox containers, honouring the long-horizon override for research-class agents.
docker/docker-compose.local.yaml — expose the new env vars on the a2a-adapter sidecar with matching defaults.
Tests cover the new env wiring on both the Docker sandbox and Copilot backend layers.

Native loop check: confirmed the native inner loop has no analogous per-turn watchdog — timeouts there are per-HTTP-request only (Anthropic 300s, Google 600s) and there is no turn-level wrapper that could prematurely abort a productive turn. No native-side timing changes required.

Also in this commit

e2b.Dockerfile: bump GH_CLI_VERSION 2.91.0 → 2.92.0. 2.91.0 was rolled out of the apt repo, breaking sandbox rebuilds.
.gitignore: ignore build-manifest-*.json (generated per-build by scripts/stack_control.sh and COPY'd into each backend/frontend/sandbox image; was showing as untracked after every build).

mdear added 3 commits April 13, 2026 15:22

feat: chat A2A inner loop, council routing, compaction authority (3/3)

aa2a841

mdear mentioned this pull request Apr 13, 2026

feat: local Docker sandbox + A2A inner loop with pluggable backends #196

Closed

mdear mentioned this pull request Apr 16, 2026

feat: A2A agent inner loop framework (2/3) #199

Open

mdear added 10 commits April 17, 2026 14:17

mdear added 2 commits April 29, 2026 09:57

mdear added 2 commits May 11, 2026 18:36

feat: chat A2A inner loop, council routing, compaction authority (3/3) #200
mdear wants to merge 18 commits into Intelligent-Internet:main from mdear:feature/a2a-chat-inner-loop_3_of_3
