feat: local Docker sandbox + A2A inner loop with pluggable backends #196

Closed. mdear wants to merge 3 commits into Intelligent-Internet:main from mdear:rebase/local-docker-sandbox

@mdear mdear commented Apr 12, 2026

Summary

Local Docker sandbox runtime and A2A inner loop framework with pluggable backends for self-hosted deployments.

Local Docker Sandbox

  • DockerSandbox provider with shell executor and port pool manager
  • Orphan container cleanup with configurable TTL and async-safe threading
  • Docker compose local stack with stack_control.sh tooling (build-sandbox, patch-sandbox, cleanup, setup)
  • e2b.Dockerfile: gh CLI v2.89+ installed, adapter deps included
  • Storage proxy router for MinIO-backed local file serving
  • Frontend: local sandbox support in workspace state and UI
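
To illustrate the port pool manager mentioned above, here is a minimal sketch of how host ports might be leased to sandbox containers. The class name `PortPool`, its methods, and the port range are assumptions made for illustration; the PR does not show the actual implementation.

```python
import threading


class PortPool:
    """Leases host ports from a fixed range, one per sandbox (hypothetical API)."""

    def __init__(self, start: int, end: int) -> None:
        # The range is illustrative; the PR does not state which ports are used.
        self._free = set(range(start, end + 1))
        self._leased: dict[str, int] = {}
        self._lock = threading.Lock()

    def acquire(self, sandbox_id: str) -> int:
        with self._lock:
            # Re-acquiring for the same sandbox returns the existing lease.
            if sandbox_id in self._leased:
                return self._leased[sandbox_id]
            if not self._free:
                raise RuntimeError("port pool exhausted")
            port = min(self._free)
            self._free.remove(port)
            self._leased[sandbox_id] = port
            return port

    def release(self, sandbox_id: str) -> None:
        with self._lock:
            # Releasing an unknown sandbox is a no-op.
            port = self._leased.pop(sandbox_id, None)
            if port is not None:
                self._free.add(port)


pool = PortPool(30000, 30001)
first = pool.acquire("sbx-a")
second = pool.acquire("sbx-b")
pool.release("sbx-a")
third = pool.acquire("sbx-c")  # reuses the port released by sbx-a
```

The lock matters because orphan cleanup may release ports from a different thread than the one creating sandboxes.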

A2A Inner Loop Framework

  • A2AInnerLoop strategy with SSE streaming client
  • CircuitBreaker (threshold=5) with automatic native fallback
  • EventStreamAdapter mapping SSE events to agent runtime events
  • ContextAdapter for conversation history parity with native loop
  • ToolBridge for bidirectional tool registration between backends
  • AdapterServer (FastAPI/uvicorn) running inside sandboxes on :18100
  • Backend registry: simulate, copilot, claude_code, codex
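
The circuit breaker with native fallback could look roughly like the sketch below. It assumes the threshold counts consecutive failures and that a success resets the count; the PR only states `threshold=5` and "automatic native fallback", so the reset policy, method names, and the two stand-in callables are illustrative.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Counts consecutive failures; once open, calls go straight to the fallback."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            # Breaker is open: skip the primary path entirely.
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0  # assumption: any success resets the counter
        return result


def flaky_a2a() -> str:
    # Stand-in for an A2A backend call that keeps failing.
    raise ConnectionError("adapter unreachable")


def native_loop() -> str:
    # Stand-in for the native inner loop.
    return "native"


breaker = CircuitBreaker(threshold=5)
results = [breaker.call(flaky_a2a, native_loop) for _ in range(6)]
```

After five failures the sixth call never touches the A2A path. This sketch has no half-open state or reset timer, which a production breaker would likely need.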

Chat A2A Turn Loop

  • A2ATurnLoopService for chat mode with event translation layer
  • Routes chat requests through A2A adapter (Copilot) in local mode
  • A2A client singleton with URL tracking and auto-refresh on sandbox change
  • Council service with parallel LLM execution and synthesis via A2A
  • Fix council text doubling: separate delta collection from full_content
  • Fix async coroutine bug in file_processor, vectorstore, and LLM providers
    (await get_storage().read() instead of anyio.to_thread.run_sync)
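
The coroutine bug in the last bullet can be reproduced in a few lines. This sketch uses `asyncio.to_thread` from the standard library as a stand-in for `anyio.to_thread.run_sync`, and `FakeStorage` is a hypothetical substitute for the project's storage object. Calling an `async def` method inside a worker thread only creates a coroutine object; nothing runs until it is awaited.

```python
import asyncio
import inspect


class FakeStorage:
    """Hypothetical stand-in for the project's async storage backend."""

    async def read(self, path: str) -> bytes:
        return b"image-bytes"


def get_storage() -> FakeStorage:
    return FakeStorage()


async def buggy(path: str):
    # The thread calls read(), which returns an un-awaited coroutine object.
    return await asyncio.to_thread(get_storage().read, path)


async def fixed(path: str) -> bytes:
    # Awaiting the coroutine directly yields the actual bytes.
    return await get_storage().read(path)


async def main():
    leaked = await buggy("a.png")
    is_coro = inspect.iscoroutine(leaked)
    leaked.close()  # suppress the "never awaited" warning in this demo
    return is_coro, await fixed("a.png")


buggy_is_coroutine, fixed_bytes = asyncio.run(main())
```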

Copilot Backend

  • CopilotBackend using GitHub Copilot SDK (github-copilot-sdk>=0.1.25)
  • 15 native sandbox tools bridged to Copilot CLI
  • Fresh sessions per run for reliable tool availability
  • CLI path resolution: bundled SDK binary primary, gh fallback
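
One plausible reading of "bundled SDK binary primary, gh fallback" is sketched below. The function name `resolve_cli`, the idea that the bundled binary is passed in as a path, and the use of a `PATH` lookup for `gh` are all assumptions; the PR does not describe the resolution logic in detail.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_cli(bundled: Optional[Path], fallback_name: str = "gh") -> Optional[str]:
    """Prefer the bundled binary; otherwise look the fallback up on PATH."""
    if bundled is not None and bundled.is_file():
        return str(bundled)
    # shutil.which returns None when the executable is not found.
    return shutil.which(fallback_name)
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing CLI is fatal.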

Session Lifecycle

  • delete_after column and schedule-delete endpoint for timed session cleanup
  • Orphan sandbox cleanup with async-safe threading
  • Frontend session delete UI (sidebar + project list)
  • Alembic migration for session delete_after column
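
As a rough sketch of how a `delete_after` timestamp supports timed cleanup: scheduling sets the column to "now plus a delay", and a cleanup pass selects sessions whose timestamp has passed. The `Session` dataclass and both helper functions are hypothetical; the real model, endpoint, and query are not shown in the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Session:
    id: str
    delete_after: Optional[datetime] = None  # None means "never scheduled"


def schedule_delete(session: Session, delay: timedelta, now: datetime) -> None:
    """What a schedule-delete endpoint might do: stamp the deadline."""
    session.delete_after = now + delay


def due_for_deletion(sessions: list[Session], now: datetime) -> list[str]:
    """Select sessions whose deadline has passed."""
    return [
        s.id
        for s in sessions
        if s.delete_after is not None and s.delete_after <= now
    ]


now = datetime(2026, 4, 12, 12, 0, tzinfo=timezone.utc)
sessions = [Session("keep"), Session("soon"), Session("later")]
schedule_delete(sessions[1], timedelta(minutes=5), now)
schedule_delete(sessions[2], timedelta(hours=1), now)
expired = due_for_deletion(sessions, now + timedelta(minutes=10))
```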

Bug Fixes

  • A2A reasoning events visible in frontend (delta_status tracking)
  • SSE stream kept open across SDK continuation turns
  • ToolInvocation TypedDict argument extraction + sandbox FOWNER
  • Council text doubling (dual accumulation of deltas + full message)
  • Image coroutine bug (get_storage().read() not awaited in 5 locations)
  • A2A client stale URL when sandbox recycled
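
The stale-URL fix in the last bullet can be pictured as a singleton that remembers which URL it was built for. The names `A2AClient` and `get_client` below are illustrative, not the project's actual API.

```python
from typing import Optional


class A2AClient:
    """Hypothetical client bound to one sandbox adapter URL."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


_client: Optional[A2AClient] = None


def get_client(sandbox_url: str) -> A2AClient:
    """Return the cached client, rebuilding it when the sandbox URL changes."""
    global _client
    if _client is None or _client.base_url != sandbox_url:
        _client = A2AClient(sandbox_url)
    return _client


a = get_client("http://sandbox-1:18100")
b = get_client("http://sandbox-1:18100")  # same sandbox: cached client reused
c = get_client("http://sandbox-2:18100")  # sandbox recycled: client refreshed
```

Without the URL comparison, `c` would still point at the recycled sandbox.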

Config & Infra

  • AgentSettings: A2A billing strategy, multipliers, inner loop mode config
  • SandboxSettings: SANDBOX_PROVIDER=docker, host config
  • StorageSettings: STORAGE_SERVE_BASE_URL for proxied URLs
  • CreditsSettings: CREDITS_BILLING_ENABLED toggle for self-hosted
  • Health endpoint reports A2A inner loop mode and backend
  • Alembic migrations: summary_authority column, session delete_after
  • Untrack docker/.stack.env.local (contains local secrets)
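
For orientation, the environment variables named in this PR could be read along these lines. This is a standard-library sketch, not the project's settings classes; the `LocalStackSettings` dataclass, the default values, the example URL, and the `"native"`/`"a2a"` mode strings are placeholders chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LocalStackSettings:
    sandbox_provider: str
    storage_serve_base_url: str
    credits_billing_enabled: bool
    inner_loop_mode: str
    a2a_backend: str


def load_settings(env: Mapping[str, str] = os.environ) -> LocalStackSettings:
    # Defaults below are placeholders, not the project's real defaults.
    return LocalStackSettings(
        sandbox_provider=env.get("SANDBOX_PROVIDER", "docker"),
        storage_serve_base_url=env.get("STORAGE_SERVE_BASE_URL", ""),
        credits_billing_enabled=_as_bool(env.get("CREDITS_BILLING_ENABLED", "false")),
        inner_loop_mode=env.get("AGENT_INNER_LOOP_MODE", "native"),
        a2a_backend=env.get("AGENT_A2A_BACKEND", "simulate"),
    )


example_env = {
    "SANDBOX_PROVIDER": "docker",
    "STORAGE_SERVE_BASE_URL": "http://localhost:8000/storage",  # made-up URL
    "CREDITS_BILLING_ENABLED": "false",
    "AGENT_INNER_LOOP_MODE": "a2a",  # assumed value name
    "AGENT_A2A_BACKEND": "copilot",
}
settings = load_settings(example_env)
```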

Testing

  • Comprehensive E2E test suite: 32 tests across 11 categories
    (INF, CHAT, IMG, WEB, CODE, SESS, AGEN, XFEAT, HIST, CNCL, A2A)
  • Content doubling detection and server error scanning in E2E
  • E2E test-cycle prompt for autonomous fix/rebuild/retest workflow
  • Unit tests: council billing, A2A turn loop, orphan cleanup
  • Repository integration tests
  • Latest run: 31/32 PASS, 1 SKIP (OpenAI quota), 0 FAIL
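
The content doubling check might work along the lines of the sketch below, which also reproduces the council bug it guards against (streamed deltas accumulated, then the full message appended on top). The function `is_doubled` and its `min_len` parameter are assumptions; the actual E2E check is not shown in the PR and may be more lenient.

```python
def is_doubled(text: str, min_len: int = 8) -> bool:
    """True when the text is exactly one string repeated twice back to back."""
    stripped = text.strip()
    half, remainder = divmod(len(stripped), 2)
    if remainder or half < min_len:
        # Odd length, or too short to call a repeat meaningful.
        return False
    return stripped[:half] == stripped[half:]


deltas = ["The council ", "recommends ", "option B."]
full_content = "".join(deltas)

# Bug: accumulate the deltas AND append the final full message.
buggy_output = "".join(deltas) + full_content
# Fix: keep delta collection separate and emit the full content once.
fixed_output = full_content
```

Here `is_doubled(buggy_output)` is true and `is_doubled(fixed_output)` is false. An exact-halves comparison misses doubling with separators or partial overlap, so a real check may need fuzzier matching.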

Docs

  • A2A billing model design doc with strategy comparison
  • Chat A2A inner loop integration assessment
  • Inner loop parity assessment and tool bridge gap analysis
  • Implementation guide updates

Local Docker sandbox runtime:
- DockerSandbox provider with shell executor and port pool manager
- Orphan container cleanup with configurable TTL
- Docker compose local stack with stack_control.sh tooling
- e2b.Dockerfile: gh CLI v2.89+ installed, adapter deps included
- Storage proxy router for MinIO-backed local file serving
- Frontend: local sandbox support in workspace state and UI

A2A inner loop framework:
- A2AInnerLoop strategy with SSE streaming client
- CircuitBreaker (threshold=5) with automatic native fallback
- EventStreamAdapter mapping SSE events to agent runtime events
- ContextAdapter for conversation history parity with native loop
- ToolBridge for bidirectional tool registration between backends
- AdapterServer (FastAPI/uvicorn) running inside sandboxes on :18100
- Backend registry: simulate, copilot, claude_code, codex

Copilot backend:
- CopilotBackend using GitHub Copilot SDK (github-copilot-sdk>=0.1.25)
- 15 native sandbox tools bridged to Copilot CLI
- Fresh sessions per run for reliable tool availability
- CLI path resolution: bundled SDK binary primary, gh fallback

Bug fixes:
- A2A reasoning events visible in frontend (delta_status tracking)
- SSE stream kept open across SDK continuation turns
- ToolInvocation TypedDict argument extraction + sandbox FOWNER

Config & infra:
- AgentSettings: AGENT_INNER_LOOP_MODE, AGENT_A2A_BACKEND, fallback
- SandboxSettings: SANDBOX_PROVIDER=docker, host config
- StorageSettings: STORAGE_SERVE_BASE_URL for proxied URLs
- CreditsSettings: CREDITS_BILLING_ENABLED toggle for self-hosted
- Alembic migration: add summary_authority column
- Untrack docker/.stack.env.local (contains local secrets)

Tests: 730+ unit tests, E2E test plan (17/23 PASS, 6 DEFERRED)
Docs: design docs, implementation guide, E2E test plan
@mdear mdear mentioned this pull request Apr 12, 2026
Author

mdear commented Apr 12, 2026

This rebased PR is based on #172

Author

mdear commented Apr 12, 2026

Coming soon: A2A replacement inner loop backends for Chat Mode

khoangothe
Collaborator

Hi @mdear, could you split this PR to resolve DockerSandbox first? I think it could be split into 3 PRs, and we can work with you to resolve each one individually. Thank you so much for your constant support!

Chat A2A turn loop:
- Add A2A turn loop service for chat mode with event translation layer
- Route chat requests through A2A adapter (Copilot) in local mode
- A2A client singleton with URL tracking and auto-refresh on sandbox change
- Council service with parallel LLM execution and synthesis via A2A
- Fix council text doubling: separate delta collection from full_content
- Fix async coroutine bug in file_processor, vectorstore, LLM providers
  (await get_storage().read() instead of anyio.to_thread.run_sync)

Session lifecycle:
- Add delete_after column and schedule-delete endpoint for timed cleanup
- Orphan sandbox cleanup with async-safe threading
- Frontend session delete UI (sidebar + project list)

Infrastructure:
- Expand stack_control.sh (build-sandbox, patch-sandbox, cleanup, setup)
- Health endpoint reports A2A inner loop mode and backend
- Agent settings for A2A billing strategy and multipliers
- Migration for session delete_after column

Testing:
- Add comprehensive E2E test suite (32 tests, 11 categories)
  with content doubling detection and server error scanning
- Add unit tests: council billing, A2A turn loop, orphan cleanup
- Add repository integration tests
- E2E test-cycle prompt for autonomous fix/retest workflow

Docs:
- A2A billing model, inner loop parity, tool bridge gap analysis
- Chat A2A integration assessment, implementation guide
Author

mdear commented Apr 13, 2026

Yes, I can do that. Stand by: I'll split this into three progressive PRs. Each will stand on its own, build on the previous one, and come with appropriate unit and E2E tests.

I'm happy to say the local model with the Copilot inner loop appears to be stable and working well. I haven't done extensive testing yet, but the proof of concept is now holding up.

First PR: Docker local and core unit test rewrite (most unit tests were written against the outdated develop branch). I held off on writing many front-end unit tests because I lack front-end experience; my expertise is primarily back-end, and these features are heavily back-end and sandbox weighted.

Second PR: Agentic A2A inner loop replacement

Third PR: Chat A2A inner loop replacement, council billing foundation, A2A council billing enhancements

Author

mdear commented Apr 13, 2026

This PR has been split into three progressive PRs for easier review. All content from this PR is covered across:

  1. #198 feat: local Docker sandbox infrastructure (1/3): Docker sandbox runtime, local deploy stack, session lifecycle, frontend, test overhaul (389 files)
  2. #199 feat: A2A agent inner loop framework (2/3): A2A inner loop strategy, backend registry, billing strategies, adapter server (74 incremental files)
  3. #200 feat: chat A2A inner loop, council routing, compaction authority (3/3): Chat A2A turn loop, council A2A routing, cross-authority compaction (16 incremental files)

Merge order: #198 → #199 → #200

Closing this PR in favour of the split.

@mdear mdear closed this Apr 13, 2026
Author

mdear commented Apr 13, 2026

Churn Metrics

| Diff | Files | Insertions | Deletions | Net |
|---|---|---|---|---|
| main → 1_of_3 (#198) | 389 | +27,707 | −69,207 | −41,500 |
| 1_of_3 → 2_of_3 (#199) | 74 | +26,596 | −37 | +26,559 |
| 2_of_3 → 3_of_3 (#200) | 16 | +6,142 | −214 | +5,928 |
| Total | 479 | +60,445 | −69,458 | −9,013 |
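
The totals row can be recomputed from the three per-PR rows. A quick check, with the figures copied from the table above:

```python
# (files, insertions, deletions) per split PR, as reported in the churn table.
rows = {
    "#198": (389, 27_707, 69_207),
    "#199": (74, 26_596, 37),
    "#200": (16, 6_142, 214),
}

total_files = sum(files for files, _, _ in rows.values())
total_insertions = sum(ins for _, ins, _ in rows.values())
total_deletions = sum(dels for _, _, dels in rows.values())
net = total_insertions - total_deletions
```

The net is negative because the first PR deletes far more lines than it adds.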

2 participants