perf(evolution): atomic-worker architecture for parallel proposals (full GEPA parity) #34
Merged
…EPA execute_proposal parity)
Replace three-stage pipeline (parent-eval pool → LLM pool → sequential child-eval) with ONE ThreadPoolExecutor whose workers each execute the full GEPA execute_proposal shape atomically: parent_eval (reflective_mutation.py:268) → skip-perfect (reflective_mutation.py:308) → LLM mutate (reflective_mutation.py:369) → tamper check → child_eval (reflective_mutation.py:420).
Source: /Users/ke/helix-gepa-parity-investigation.md §7 D1.
Budget charges and frontier updates remain sequential (apply_proposal_output parity).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
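The following is a minimal Python sketch of the one-pool shape described above. It rests on simplified stand-ins: candidates are strings, an evaluation is a single float, `1.0` counts as a perfect score, and each evaluation costs one budget unit. `Outcome`, `atomic_worker` and `run_generation` are hypothetical names, not helix's API. The string `kind` tag and the absence of exception handling are also simplifications. The sketch shows only the structural point: each pool task runs a whole proposal, and budget charging and acceptance stay in a sequential loop after the pool.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Outcome:
    kind: str  # hypothetical tag: "skipped" | "llm_failed" | "tampered" | "success"
    parent_score: float
    child_score: Optional[float] = None


def atomic_worker(parent: str,
                  evaluate: Callable[[str], float],
                  mutate: Callable[[str], Optional[str]],
                  is_tampered: Callable[[str], bool]) -> Outcome:
    """One proposal, start to finish, on a single worker thread."""
    parent_score = evaluate(parent)          # parent_eval (GEPA RM:268)
    if parent_score >= 1.0:                  # skip-perfect: no LLM call (RM:308)
        return Outcome("skipped", parent_score)
    child = mutate(parent)                   # LLM mutate (RM:369)
    if child is None:
        return Outcome("llm_failed", parent_score)
    if is_tampered(child):                   # read-only tamper check
        return Outcome("tampered", parent_score)
    return Outcome("success", parent_score, evaluate(child))  # child_eval (RM:420)


def run_generation(parents: list[str],
                   evaluate: Callable[[str], float],
                   mutate: Callable[[str], Optional[str]],
                   is_tampered: Callable[[str], bool],
                   max_workers: int,
                   budget: dict[str, int]) -> list[Outcome]:
    # ONE pool: each task is a whole proposal, not a single pipeline stage.
    pool_size = max(1, min(len(parents), max_workers))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(atomic_worker, p, evaluate, mutate, is_tampered)
                   for p in parents]
        outcomes = [f.result() for f in futures]

    # Sequential acceptance loop on the calling thread: budget charges and
    # acceptance decisions never happen inside a worker.
    accepted: list[Outcome] = []
    for o in outcomes:
        # Hypothetical accounting: one unit per evaluation the worker ran.
        budget["evals"] += 1 if o.child_score is None else 2
        if o.child_score is not None and o.child_score > o.parent_score:
            accepted.append(o)
    return accepted
```

With scores `{"p1": 0.2, "p1'": 0.6}`, a mutation that appends `'` produces an improved child for `p1`. That proposal uses two evaluation units and is the only one accepted. A parent already at `1.0` uses one unit and never reaches the LLM.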
… atomic worker
Five new tests verifying GEPA execute_proposal parity:
1. Parent and child evals on same worker thread (atomicity)
2. Skip-perfect inside worker prevents LLM call
3. Worker LLM exception isolation (one failure doesn't crash pool)
4. n=3 proposals run in parallel (distinct worker threads)
5. Budget charges sequential in acceptance loop
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
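The thread-identity properties behind tests 1 and 4 can be sketched with a `threading.Barrier`. The barrier only releases once all three proposals are in flight at the same time, so three distinct worker threads must exist. `fake_atomic_worker` and `run_proposals` are hypothetical stand-ins, not the PR's actual test code.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


def fake_atomic_worker(idx: int, barrier: threading.Barrier) -> dict[str, int]:
    """Hypothetical stand-in: records the thread that ran each 'eval'."""
    parent_thread = threading.get_ident()  # where parent_eval would run
    barrier.wait(timeout=5)                # all proposals must be in flight at once
    child_thread = threading.get_ident()   # where child_eval would run
    return {"idx": idx, "parent": parent_thread, "child": child_thread}


def run_proposals(n: int) -> list[dict[str, int]]:
    barrier = threading.Barrier(n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(fake_atomic_worker, i, barrier) for i in range(n)]
        return [f.result() for f in futures]


def test_parent_and_child_eval_share_worker_thread() -> None:
    for record in run_proposals(3):
        assert record["parent"] == record["child"]


def test_three_proposals_run_on_distinct_threads() -> None:
    # The barrier raises BrokenBarrierError (timeout) unless 3 threads run concurrently.
    records = run_proposals(3)
    assert len({r["parent"] for r in records}) == 3
```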
…ch paths
Remove the erroneous `_sub_ids is not None` guard added in the Architecture D worker's skip-perfect check (step W3). The guard was based on a misread of the GEPA spec: helix fires skip-perfect on both the minibatch path (where parent_eval comes from _cached_evaluate_batch) and the no-minibatch path (where it comes from _cached_eval on the full train split).
Tests:
test_perfect_score_skips_mutation_continues_loop ✓
test_perfect_score_does_not_terminate_run ✓
Also add the no-minibatch budget charge in the "skipped" acceptance branch (the worker ran _cached_eval, so the charge must be applied sequentially in the loop).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
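The following is a hedged sketch of the corrected skip-perfect check. The subsample ids are deliberately ignored, so the check behaves the same on the minibatch and no-minibatch paths. `EvalResult` is a hypothetical minimal stand-in, and treating "perfect" as every instance scoring at least `1.0` is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical stand-in: per-instance scores from one evaluation."""
    instance_scores: dict[str, float]


def is_perfect(result: EvalResult, perfect: float = 1.0) -> bool:
    # An empty evaluation is never treated as perfect.
    scores = list(result.instance_scores.values())
    return bool(scores) and all(s >= perfect for s in scores)


def should_skip_mutation(parent_eval: EvalResult,
                         sub_ids: Optional[Sequence[str]]) -> bool:
    # No `sub_ids is not None` guard: the check fires whether parent_eval came
    # from a minibatch (sub_ids is a list) or the full train split (sub_ids is None).
    del sub_ids  # intentionally unused
    return is_perfect(parent_eval)
```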
…nion
Replace the single `_ProposalResult` dataclass (all fields typed `object`,
mypy-opaque) with a proper four-class sealed hierarchy (Option A):
_SkippedResult – skip-perfect fired; parent_eval_result: EvalResult
_LLMFailedResult – mutate() raised/returned None; parent_eval_result: EvalResult | None
_TamperedResult – child touched protected files; child: Candidate, tampered_paths: list[str]
_SuccessResult – all steps completed; child: Candidate, child_eval_result: EvalResult | None
Fixes 87 mypy --strict errors concentrated at:
• Line 2284 (bare `tuple` annotation) → _ProposalCtx type alias
• Lines 2394–2397 (Exception not narrowed to HelixError) → direct isinstance guard
• Lines 2540–2918 (object has no attribute id/instance_scores, etc.) → isinstance
checks in acceptance loop replace `wr.kind == "..."` string discriminators,
giving mypy the narrowing it needs on all downstream field accesses
Cleanup:
• Remove stray `tmp/e2e-opencode` submodule reference (160000-mode tree entry
without a .gitmodules entry) and 13 other tracked tmp/ scratch files
• Add `tmp/` to .gitignore to prevent recurrence
Tests:
• Add TestArchitectureDAtomicWorker::test_worker_tampered_result_rejects_child_without_crash
to cover the previously-untested _TamperedResult path
mypy result: 87 errors → 0 (Success: no issues found in 29 source files)
pytest result: 865 → 866 passed (1 new tamper test, no behavior change)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure rename/wording cleanup — zero behavior change. Replaces the internal session-only label "Architecture D" with descriptive public-facing language throughout comments, docstrings, test class name, and assertion messages. Verified: mypy --strict 0 errors, pytest 866/866 passed.
Summary
Replaces helix's three-stage parallel pipeline (Step 1b parent-eval pool → Step 2 LLM pool → Step 3 sequential child-eval) with a single `ThreadPoolExecutor` whose workers each execute the full GEPA `execute_proposal` shape atomically, achieving full structural parity with GEPA `reflective_mutation.py:268,308,369,420`.
Design
Each atomic worker runs:
1. `_cached_evaluate_batch(parent, subsample_ids, None, ...)` (bypasses cache, mirrors GEPA RM:268)
2. Skip-perfect check, returning `SkippedResult` (mirrors GEPA RM:308-327)
3. `mutate(parent, eval_for_mutate, new_id, ...)` (mirrors GEPA RM:369)
4. `_detect_evaluator_tamper(child, manifest, config, project_root)` (thread-safe, read-only)
5. `_cached_evaluate_batch(child, subsample_ids, minibatch_cache, ...)` (mirrors GEPA RM:420)
Budget charges and frontier updates remain sequential in the acceptance loop (`apply_proposal_output` parity, GEPA RM:472).
Source: `/Users/ke/helix-gepa-parity-investigation.md` §7 D1.
Result type
Diff stat
• `src/helix/evolution.py`
• `tests/unit/test_evolution_minibatch.py`
Test results
… `TestAtomicProposalWorker`)
E2E validation
Both runs used `num_parallel_proposals=2`. The OpenCode run demonstrates the parallelism directly: both worker threads printed "⟳ Running train evaluation…" simultaneously, and both mutations were accepted (both fixed the `add_one` off-by-one bug independently).
Design invariants preserved
• `PromptArtifactCollisionError` re-raised from worker (fatal, not swallowed)
• `max_workers` cap applied to single pool (`min(n, config.evolution.max_workers)`)
• … `--no-verify`
🤖 Generated with Claude Code
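The following sketch shows how per-proposal failures can stay isolated while a fatal error still escapes the pool. It rests on assumptions: `PromptArtifactCollisionError` is redefined locally as a plain `Exception` subclass, and `Failed`, `guarded_mutate` and `run_all` are hypothetical. The mechanism it relies on is standard `concurrent.futures` behavior: `Future.result()` re-raises the exception raised inside the worker.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional, Union


class PromptArtifactCollisionError(Exception):
    """Hypothetical local stand-in; the real class's base is not shown in the PR."""


@dataclass(frozen=True)
class Failed:
    reason: str


def guarded_mutate(parent: str,
                   mutate: Callable[[str], Optional[str]]) -> Union[str, Failed]:
    try:
        child = mutate(parent)
    except PromptArtifactCollisionError:
        raise                         # fatal: must reach the calling thread
    except Exception as exc:          # isolated: becomes a per-proposal failure
        return Failed(f"{type(exc).__name__}: {exc}")
    return child if child is not None else Failed("mutate returned None")


def run_all(parents: list[str], mutate: Callable[[str], Optional[str]],
            max_workers: int) -> list[Union[str, Failed]]:
    workers = max(1, min(len(parents), max_workers))  # single-pool cap
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(guarded_mutate, p, mutate) for p in parents]
        # Future.result() re-raises the worker's exception, so a fatal error
        # surfaces here instead of being swallowed.
        return [f.result() for f in futures]
```

An ordinary error in one proposal yields a `Failed` entry while the other proposals complete. A collision error propagates out of `run_all`.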