feat: add subprocess session isolation for recipe execution by bkrabach · Pull Request #135 · microsoft/amplifier-foundation

bkrabach · 2026-03-22T17:56:08Z

Summary

Adds an opt-in subprocess runner to amplifier-foundation that allows agent sessions to run in isolated child processes. When a subprocess exits, the OS reclaims ALL memory immediately — eliminating the RSS watermark accumulation that causes OOM kills during parallel recipe execution.

Problem

During parallel recipe execution (foreach with parallel: true), multiple agent sessions run concurrently in the same Python process. Each session loads ~300-400MB of modules. After sessions complete, Python's allocator rarely returns the freed pages to the OS, so the process RSS stays at its high-water mark. After multiple waves of parallel sessions (e.g., deep discovery running topdown + bottomup + combine), a single process grows to 35-37GB RSS and gets OOM-killed.

Solution

A foundation-layer utility (subprocess_runner.py) that:

Serializes session config + prompt to a temp JSON file
Spawns a child Python process via asyncio.create_subprocess_exec
Child creates a fresh AmplifierSession, runs execute(), writes result to stdout, exits
Parent reads result, OS reclaims all child process memory
Includes max_subprocess semaphore (default 4) to prevent fork-bombing
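
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in subprocess_runner.py: run_in_child, CHILD_CODE, and MAX_SUBPROCESSES are hypothetical names, and the child here only echoes the prompt instead of creating an AmplifierSession.

```python
import asyncio
import json
import os
import sys
import tempfile

# Hypothetical stand-in for the child entry point: it reads the payload file
# and echoes the prompt. The real child would create an AmplifierSession.
CHILD_CODE = (
    "import json, sys\n"
    "with open(sys.argv[1]) as fh:\n"
    "    payload = json.load(fh)\n"
    "print('echo: ' + payload['prompt'])\n"
)

MAX_SUBPROCESSES = 4  # default named in the PR description
_semaphore = None  # created lazily, inside the running event loop


async def run_in_child(config: dict, prompt: str, timeout: float = 30.0) -> str:
    """Write config + prompt to a temp JSON file, run a child, return its stdout."""
    global _semaphore
    if _semaphore is None:
        _semaphore = asyncio.Semaphore(MAX_SUBPROCESSES)
    async with _semaphore:  # caps the number of concurrent children
        fd, tmp_path = tempfile.mkstemp(suffix=".json")
        try:
            with os.fdopen(fd, "w") as fh:
                json.dump({"config": config, "prompt": prompt}, fh)
            proc = await asyncio.create_subprocess_exec(
                sys.executable, "-c", CHILD_CODE, tmp_path,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            try:
                stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout)
            except asyncio.TimeoutError:
                proc.kill()  # do not leave a runaway child behind
                await proc.wait()
                raise
            if proc.returncode != 0:
                raise RuntimeError(f"child exited with code {proc.returncode}")
            return stdout.decode().strip()
        finally:
            os.unlink(tmp_path)  # the payload file is removed on every path
```

Once the child exits, its whole address space is released by the OS, which is the property the parent relies on.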

Usage

from amplifier_foundation.subprocess_runner import run_session_in_subprocess

result = await run_session_in_subprocess(
    config=child_config,
    prompt=task,
    parent_id=parent_session.session_id,
    project_path=str(project_path),
    session_id=sub_session_id,
    timeout=1800,
)

Files

New: amplifier_foundation/subprocess_runner.py (271 lines) — config serialization, child runner, __main__ entry point, parent spawner, concurrency semaphore
New: tests/test_subprocess_runner.py (543 lines) — 21 tests covering round-trip serialization, child runner, parent spawner, error handling, timeout, concurrency
Modified: amplifier_foundation/__init__.py — exports run_session_in_subprocess

Design

Opt-in only — default behavior is unchanged (in-process spawning)
No kernel changes — uses existing AmplifierSession API
No new dependencies — stdlib only (asyncio, json, sys, tempfile)
IPC: temp JSON file (parent→child) + stdout (child→parent) + stderr (errors)
Companion PR for amplifier-app-cli wires this into session_spawner.py

Testing

21 tests covering:

Config serialization round-trip (minimal, full, without session_id)
Missing required keys and malformed JSON
Child session runner (success, cleanup on error, no session_id)
__main__ entry point validation
Parent-side subprocess management (success, non-zero exit, timeout, temp file cleanup)
Concurrency semaphore limiting

Design doc: amplifier-bundle-dot-graph/docs/plans/2026-03-21-subprocess-session-isolation-design.md

Add RESULT_START_MARKER and RESULT_END_MARKER constants to create a framed envelope around subprocess result output. This prevents stray print() calls from third-party code or debug output corrupting the result payload returned to the parent process.

Changes:
- Add RESULT_START_MARKER = '<<<AMPLIFIER_RESULT_START>>>' constant
- Add RESULT_END_MARKER = '<<<AMPLIFIER_RESULT_END>>>' constant
- Add _extract_framed_result(stdout: str) -> str helper that extracts content between markers, raises RuntimeError('missing result envelope') if markers not found, and logs unframed output at DEBUG
- Update run_session_in_subprocess() to decode stdout then call _extract_framed_result() instead of raw .strip()
- Update __main__ block to wrap output with print(RESULT_START_MARKER), print(output, end=''), print(), print(RESULT_END_MARKER)
- Update test mocks in test_success, test_temp_file_cleanup_on_success, test_passes_session_id_in_config, and test_max_concurrent_limits_parallelism to wrap mock stdout in RESULT_START_MARKER/RESULT_END_MARKER envelope
- Add TestStdoutFraming tests: test_framed_output_extracted_correctly and test_unframed_output_raises_runtime_error

- Add import re to subprocess_runner.py
- Define _CREDENTIAL_PATTERNS list of 6 compiled regexes:
  - sk-[a-zA-Z0-9\-_]{10,} for API keys (sk- prefix)
  - key=\s*\S+ (case-insensitive) for key=value patterns
  - token=\s*\S+ (case-insensitive) for token=value patterns
  - secret=\s*\S+ (case-insensitive) for secret=value patterns
  - password=\s*\S+ (case-insensitive) for password=value patterns
  - Bearer\s+\S+ for Bearer token headers
- Add _sanitize_error(msg: str) -> str that replaces all matches with [REDACTED]
- Update run_session_in_subprocess(): log full stderr at DEBUG, sanitize for RuntimeError message with exit code in the format: 'Subprocess session failed (exit code {returncode}): {sanitized}'
- Add TestErrorSanitization test class with 4 tests verifying all behaviors

Fixes finding #9: stderr may leak credentials in subprocess error messages

Co-authored-by: Amplifier <amplifier@example.com>

… assert permissions

- Add import stat
- Add _validate_project_path() that resolves path, checks is_dir(), raises ValueError('does not exist or is not a directory') if invalid
- In run_session_in_subprocess(): call _validate_project_path first, move temp file creation inside try block (tmp_path: str | None = None before try), assert/enforce 0o600 permissions after write, unlink in finally only if not None
- In _run_child_session(): call _validate_project_path before os.chdir()
- Add TestCleanupHardening: test_nonexistent_project_path_raises, test_file_as_project_path_raises, test_valid_project_path_passes, test_parent_validates_project_path

Addresses findings #10 (temp file before try), #12 (file permissions), #13 (project_path validation)

Blocker B: The __main__ block now emits a JSON envelope between framing markers on both success and error paths, enabling the parent (session_spawner) to parse structured output including status, turn_count, and metadata.

Success path: result_envelope = json.dumps({ 'output': output, 'status': 'success', 'turn_count': 1, 'metadata': {} })

Error path now emits JSON envelope with 'status': 'error' and 'error' field before calling sys.exit(1).

Blocker C (part 1): Added documentation comment in _run_child_session explaining why approval_system and display_system are intentionally not available in subprocess mode (live runtime objects that cannot cross process boundaries).

Tests added:
- TestMainJsonEnvelope.test_success_emits_json_envelope
- TestMainJsonEnvelope.test_error_emits_json_envelope_with_status_error
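
For illustration, the envelope round trip might look like the sketch below. The success fields and the marker values are from the commit messages; frame_success, frame_error, and parse_envelope are hypothetical names, and the error envelope is assumed to carry only the 'status' and 'error' fields.

```python
import json

RESULT_START_MARKER = "<<<AMPLIFIER_RESULT_START>>>"
RESULT_END_MARKER = "<<<AMPLIFIER_RESULT_END>>>"


def _frame(body: dict) -> str:
    """Serialize the body and wrap it in the framing markers."""
    return "\n".join([RESULT_START_MARKER, json.dumps(body), RESULT_END_MARKER])


def frame_success(output: str) -> str:
    # Field set as given in the commit message for the success path.
    return _frame({"output": output, "status": "success", "turn_count": 1, "metadata": {}})


def frame_error(message: str) -> str:
    # Assumed minimal error shape: status plus an error field.
    return _frame({"status": "error", "error": message})


def parse_envelope(stdout: str) -> dict:
    """Parent side: locate the markers and decode the JSON between them."""
    start = stdout.index(RESULT_START_MARKER) + len(RESULT_START_MARKER)
    end = stdout.index(RESULT_END_MARKER, start)
    return json.loads(stdout[start:end])
```

Emitting the same envelope on both paths lets the parent branch on the status field rather than infer failure from the exit code alone.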

bkrabach and others added 18 commits March 22, 2026 15:15

feat: add subprocess config serialization helpers

a72ed92

feat: add child-side session runner function

88a0e9c

feat: add child-side __main__ entry point

404d304

feat: add parent-side run_session_in_subprocess

7d35d23

feat: add subprocess concurrency semaphore

4186336

feat: export run_session_in_subprocess from foundation

88930cc

feat: expand IPC payload with bundle context fields (module_paths, bundle_package_paths, sys_paths)

9a83c7f

fix: child bootstrap — call initialize(), reconstruct module resolver, add sys.path entries

aa27439

fix: rename test_sys_paths_added_before_session_creation to test_sys_paths_added_before_initialize

e691c96

docs: fix stale docstring in test_sys_paths_added_before_initialize

868b71d

refactor: merge duplicate sys.path loops and align step comment numbering

aeb9f94

fix: polish — fix comment typo and strengthen sys.path ordering test

e938d8e

fix: harden semaphore with set-once pattern, remove per-call max_concurrent

9120b25

fix: add env var allowlist to prevent unrelated secrets in child processes

7f26c99

bkrabach force-pushed the feat/subprocess-session-isolation branch from 65f49c4 to 7b606cd on March 22, 2026 22:18

bkrabach merged commit c0b047b into main Mar 22, 2026
4 checks passed
