…v0.2.2) Borrowed three techniques from recent evolutionary-agent projects, all opt-in, dependency-free, and reproducible:
- Pareto-frontier parent selection (parent_selection: pareto): samples per-task winners instead of overall-best, keeping specialists as stepping stones (GEPA, arXiv:2507.19457). Reuses per-task scores already logged.
- Code novelty rejection (novelty_filter/threshold/max_retries): skips near-duplicate candidates before evaluation via stdlib difflib to save budget (ShinkaEvolve, arXiv:2509.19349). Off by default.
- Adaptive backend ensemble (proposer.ensemble, ph run --ensemble): a UCB1 bandit picks a backend per iteration, rewarding improvement over the parent. Deterministic; prints a per-backend picks/improve-rate summary.
Also: search.seed for reproducible randomized runs; proposer_backend recorded in candidate metadata; removed 3 byte-identical duplicate files that tripped ruff N999. Docs (README/README_CN/CHANGELOG) and version synced. 194 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Borrows AlphaEvolve/OpenEvolve-style staged evaluation: score a cheap first subset of tasks and only run the rest if it clears `cascade_threshold`, saving evaluation budget on weak candidates. Complements the novelty filter and backend ensemble in cutting cost.
- evaluator.cascade / cascade_threshold / cascade_stage1 config (off by default)
- Orchestrator._evaluate_with_cascade: stage-1 tasks are never re-run, so the result is deterministic; the base harness is always scored in full.
- Per-task mode only (non-empty `tasks` list); a no-op otherwise.
- 5 new tests; docs (README/README_CN/CHANGELOG) synced. 199 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rboard Make the new search features visible:
- ph log marks Pareto-frontier members (best on >=1 task) with ◆ in both tree and flat views, with a legend.
- ph leaderboard gains a Pareto column and a Backend column (the latter shown only when an ensemble recorded a proposer_backend, so single-backend output is unchanged).
- Extracted SearchLog.pareto_win_counts() as the single source of truth for frontier membership, reused by Orchestrator._pareto_select (de-duplicated).
- Workspace.candidate_metadata() reads a candidate's metadata.json safely.
7 new tests; docs synced. 206 tests, lint clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a Backstory subsection citing GEPA, ShinkaEvolve, OpenEvolve, and the Darwin Gödel Machine, framing PolyHarness as the member of this wave specialized for agent harnesses + online evolution (ph wrap → ph evolve), and noting which technique is borrowed from each. README + README_CN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Brings four search-strategy techniques from recent open-source evolutionary-agent
projects into PolyHarness, plus observability to make them visible. Every feature is
opt-in, dependency-free, and deterministic/reproducible, and existing behavior is
unchanged unless a feature is enabled, in line with the project's principles
(open-source, no security risk, reasonable & reproducible design).
What's new
1. Pareto-frontier parent selection: `parent_selection: pareto`

   Samples parents from the set of per-task winners instead of always branching from
   the single overall-best candidate, keeping specialists alive as stepping stones and
   avoiding premature convergence. Reuses the per-task scores already in the search log,
   so no new data is collected. (GEPA, arXiv:2507.19457)
2. Code novelty rejection: `novelty_filter` / `novelty_threshold` / `novelty_max_retries`

   Detects near-duplicate candidates via stdlib `difflib` and skips their evaluation to
   save API/compute budget. Off by default. (ShinkaEvolve, arXiv:2509.19349)
3. Adaptive backend ensemble: `proposer.ensemble` / `ph run --ensemble a,b,c`

   A UCB1 bandit picks a backend per iteration, rewarding improvement over the parent,
   so picks shift toward backends that produce improving candidates. Fully deterministic
   (no RNG); prints a per-backend picks/improve-rate summary. Leverages the existing
   8-backend support. (ShinkaEvolve adaptive LLM-ensemble selection)
4. Cascade evaluation: `evaluator.cascade` / `cascade_threshold` / `cascade_stage1`

   Scores a cheap first subset of tasks and only runs the rest if that stage-1 result
   clears `cascade_threshold`, saving budget on weak candidates. Stage-1 tasks are never
   re-run, so results stay deterministic. Per-task mode only; the base harness is always
   scored in full. Off by default. (AlphaEvolve/OpenEvolve cascade)
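The frontier logic in item 1 is small enough to sketch. The block below is a minimal, hypothetical illustration, not PolyHarness's `SearchLog.pareto_win_counts()` or `Orchestrator._pareto_select`: it assumes per-task scores are held as a `{candidate_id: {task: score}}` dict (higher is better), credits every tied leader with a win, and samples frontier members with probability proportional to their win count, a weighting in the spirit of GEPA. Whether PolyHarness weights parents the same way is an assumption. A seeded `random.Random` stands in for `search.seed`.

```python
import random
from collections import Counter

# Hypothetical data shape: candidate id -> {task name: score}, higher is better.
Scores = dict[str, dict[str, float]]


def pareto_win_counts(scores: Scores) -> Counter:
    """Count the tasks on which each candidate has the top score (every tied leader gets a win)."""
    wins: Counter = Counter()
    all_tasks = sorted({task for per_task in scores.values() for task in per_task})
    for task in all_tasks:
        scored = {cid: per_task[task] for cid, per_task in scores.items() if task in per_task}
        best = max(scored.values())
        for cid, value in scored.items():
            if value == best:
                wins[cid] += 1
    return wins


def pareto_select(scores: Scores, rng: random.Random) -> str:
    """Sample a parent from the frontier (best on >= 1 task), weighted by win count."""
    wins = pareto_win_counts(scores)
    frontier = sorted(wins)  # sorted ids keep sampling reproducible for a given seed
    return rng.choices(frontier, weights=[wins[cid] for cid in frontier], k=1)[0]


scores = {
    "c1": {"t1": 0.9, "t2": 0.2, "t3": 0.5},  # specialist on t1
    "c2": {"t1": 0.4, "t2": 0.8, "t3": 0.5},  # specialist on t2, best mean overall
    "c3": {"t1": 0.5, "t2": 0.5, "t3": 0.4},  # never best on any task
}
rng = random.Random(42)  # stands in for search.seed
parents = [pareto_select(scores, rng) for _ in range(5)]
```

In this toy data `c2` has the best mean score, so an overall-best strategy would always branch from it. The frontier also keeps `c1`, the only candidate that wins `t1`, while `c3` is never sampled.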
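Item 2's novelty check needs only the standard library. The sketch below is hypothetical: `max_similarity` and `propose_novel` are illustrative names, the 0.95 default threshold is invented, and it assumes each new proposal is compared against every archived program with `difflib.SequenceMatcher.ratio()`, with up to `max_retries` fresh proposals requested before the candidate is dropped without evaluation.

```python
import difflib
from typing import Callable, Optional


def max_similarity(code: str, archive: list[str]) -> float:
    """Highest difflib similarity ratio (0.0 to 1.0) between `code` and any archived program."""
    return max(
        (difflib.SequenceMatcher(None, code, prev).ratio() for prev in archive),
        default=0.0,
    )


def propose_novel(
    propose: Callable[[], str],
    archive: list[str],
    threshold: float = 0.95,  # hypothetical stand-in for novelty_threshold
    max_retries: int = 3,     # hypothetical stand-in for novelty_max_retries
) -> Optional[str]:
    """Return the first proposal below `threshold` similarity, or None if all are near-duplicates."""
    for _ in range(1 + max_retries):
        code = propose()
        if max_similarity(code, archive) < threshold:
            return code
    return None  # caller skips evaluation entirely, saving the budget


archive = ["def f(x):\n    return x + 1\n"]
proposals = iter([
    "def f(x):\n    return x + 1\n",              # exact duplicate, ratio 1.0
    "def f(x):\n    return x + 2\n",              # one character changed, ratio ~0.96
    "def g(xs):\n    return sorted(xs)[::-1]\n",  # genuinely different
])
accepted = propose_novel(lambda: next(proposals), archive)
```

Because `SequenceMatcher` compares characters, it cheaply catches copy-and-tweak proposals but says nothing about semantic equivalence; that is the price of staying dependency-free.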
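For item 3, a deterministic UCB1 selector fits in one small class. This is a textbook UCB1 sketch, not PolyHarness's implementation: the class name, the binary reward (1 when the child beats its parent, else 0), and the exploration weight `c = sqrt(2)` are assumptions. Each backend is tried once, then the one with the highest observed improve rate plus exploration bonus is picked; ties go to the earliest backend in the list, so no RNG is involved.

```python
import math


class UCB1Selector:
    """Deterministic UCB1 over a fixed backend list (hypothetical; not PolyHarness's class)."""

    def __init__(self, backends: list[str], c: float = math.sqrt(2)) -> None:
        self.backends = list(backends)
        self.c = c  # exploration weight; sqrt(2) gives the textbook UCB1 bound
        self.picks = {b: 0 for b in self.backends}
        self.improved = {b: 0 for b in self.backends}

    def select(self) -> str:
        # Try each backend once, in list order, before trusting any averages.
        for b in self.backends:
            if self.picks[b] == 0:
                return b
        total = sum(self.picks.values())

        def score(b: str) -> float:
            mean = self.improved[b] / self.picks[b]  # observed improve rate
            return mean + self.c * math.sqrt(math.log(total) / self.picks[b])

        # max() keeps the first maximal element, so ties resolve by list order.
        return max(self.backends, key=score)

    def update(self, backend: str, child_beat_parent: bool) -> None:
        # Assumed binary reward: 1 if the new candidate improved on its parent, else 0.
        self.picks[backend] += 1
        self.improved[backend] += int(child_beat_parent)

    def summary(self) -> dict[str, tuple[int, float]]:
        """Per-backend (picks, improve rate)."""
        return {b: (n, self.improved[b] / n if n else 0.0) for b, n in self.picks.items()}


def simulate(rounds: int) -> tuple[UCB1Selector, list[str]]:
    """Toy run in which only backend "b" ever produces an improvement."""
    selector = UCB1Selector(["a", "b", "c"])
    history = []
    for _ in range(rounds):
        backend = selector.select()
        history.append(backend)
        selector.update(backend, child_beat_parent=(backend == "b"))
    return selector, history


selector, history = simulate(30)
```

`a` and `c` are still revisited occasionally as their exploration bonus grows with the total number of picks, but most picks go to `b`, and two runs produce the same pick sequence.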
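Item 4's two-stage gate can be sketched as a plain function. The names are hypothetical, and so is this reading of the config: here `stage1` is a task count, the gate compares the stage-1 mean score against `threshold`, and `always_full=True` models the rule that the base harness is always scored in full. Stage-1 scores are kept and reused, so no task runs twice.

```python
from typing import Callable, Sequence


def evaluate_with_cascade(
    tasks: Sequence[str],
    run_task: Callable[[str], float],
    stage1: int,
    threshold: float,
    always_full: bool = False,
) -> tuple[dict[str, float], bool]:
    """Two-stage evaluation: returns (per-task scores, True if every task was scored)."""
    if always_full or stage1 <= 0 or stage1 >= len(tasks):
        # No meaningful split, or the base harness: score every task once.
        return {t: run_task(t) for t in tasks}, True
    scores = {t: run_task(t) for t in tasks[:stage1]}
    if sum(scores.values()) / len(scores) < threshold:
        return scores, False  # weak candidate: the remaining tasks are never run
    # Gate passed: run only the remaining tasks; stage-1 scores are reused, not recomputed.
    scores.update({t: run_task(t) for t in tasks[stage1:]})
    return scores, True


TASK_SCORES = {"t1": 0.2, "t2": 0.1, "t3": 0.9, "t4": 0.9}
calls: list[str] = []


def fake_run(task: str) -> float:
    calls.append(task)  # record every task execution to show what the gate saved
    return TASK_SCORES[task]


weak_scores, weak_full = evaluate_with_cascade(list(TASK_SCORES), fake_run, stage1=2, threshold=0.5)
```

The weak candidate averages 0.15 on the first two tasks, so `t3` and `t4` are never executed. The trade-off is that a candidate weak on the stage-1 tasks but strong on the rest never receives its full score.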
Plus
- `search.seed` makes tournament/pareto/novelty repeatable.
- `ph log` marks Pareto-frontier members (◆); `ph leaderboard` adds Pareto + Backend columns (Backend only when an ensemble was used).
- `SearchLog.pareto_win_counts()` is the shared source of truth.
- `proposer_backend` is recorded in each candidate's `metadata.json`.
Design notes
- … proposer disables the bandit. No public API breakage.
- … backends; the sandbox/proposer boundaries are unchanged.
Testing
- `ruff check src/ tests/`: clean
- `pytest tests/`: 206 passed (173 → 206; +33)
- … `math-word-problems` template with the offline `local` backend for pareto + novelty + ensemble + cascade.