weijt606 · May 24, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,7 +4,49 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
-## [0.2.1] - 2026-04-09
+## [0.2.2] - 2026-05-24
+
+### Added
+- **Pareto-frontier parent selection** (`parent_selection: pareto`) — samples
+  parents from the set of per-task winners instead of always branching from the
+  single overall-best candidate, keeping specialists alive as stepping stones to
+  avoid premature convergence. Inspired by GEPA (arXiv:2507.19457). Reuses the
+  per-task scores already stored in the search log — no new data collected.
+- **Code novelty rejection** (`novelty_filter`, `novelty_threshold`,
+  `novelty_max_retries`) — detects near-duplicate candidates via stdlib
+  `difflib` text similarity (no new dependencies) and skips their evaluation to
+  save API/compute budget. Inspired by ShinkaEvolve (arXiv:2509.19349). Off by
+  default.
+- **Adaptive backend ensemble** (`proposer.ensemble`, `proposer.bandit_c`,
+  `ph run --ensemble b1,b2,...`) — when several backends are listed, a UCB1
+  bandit picks one per iteration and shifts picks toward backends that produce
+  *improving* candidates. Deterministic given the reward history (no RNG);
+  no new dependencies. Run summary shows a per-backend picks/improve-rate table.
+  Inspired by ShinkaEvolve's adaptive LLM-ensemble selection.
+- **Cascade evaluation** (`evaluator.cascade`, `cascade_threshold`,
+  `cascade_stage1`) — scores a cheap first subset of tasks and only runs the
+  rest if it clears the gate, saving budget on weak candidates (AlphaEvolve/
+  OpenEvolve-style). Per-task mode only; the base harness is always scored in
+  full. Off by default.
+- **Reproducible runs** (`search.seed`) — seeds the RNG so that tournament
+  selection, Pareto sampling, and novelty regeneration are repeatable across runs.
+- **Observability** — `ph log` marks Pareto-frontier members (◆); `ph leaderboard`
+  adds a Pareto column and a Backend column (shown only when an ensemble was
+  used). `SearchLog.pareto_win_counts()` powers both the CLI and the orchestrator.
+- `proposer_backend` recorded in each candidate's `metadata.json` (ensemble mode)
+- Hermes Agent adapter (`hermes`) — 8th proposer backend (`hermes chat -q`)
+- `--strategy pareto` and `--ensemble` options for `ph run`
+- `proposer/bandit.py` — UCB1 `BackendBandit`
+- 31 new tests (206 total)
+
+### Changed
+- Agent backends: 7 → 8 (added Hermes Agent)
+
+### Removed
+- Stray byte-identical duplicate files (`collector 2.py`, `test_collector 2.py`,
+  `test_evolution 2.py`) that inflated the test count and tripped ruff N999
+
+## [0.2.1] - 2026-04-09
 
 ### Added
 - `ph shell-hook install/uninstall/status` — zero-config auto-wrap for agent commands via shell preexec hook
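Pareto-frontier parent selection, as the changelog describes it, can be sketched in a few lines. This is a minimal, simplified sketch and not the PolyHarness implementation: `pareto_win_counts` and `select_pareto_parent` are hypothetical standalone functions (the first name only echoes `SearchLog.pareto_win_counts()`), and the sketch assumes per-task scores are stored as `{candidate_id: {task: score}}` and that parents are sampled in proportion to the number of tasks they win, roughly as GEPA does.

```python
import random
from collections import Counter

# Hypothetical score layout: {candidate_id: {task_name: score}}.
Scores = dict[str, dict[str, float]]


def pareto_win_counts(scores: Scores) -> Counter[str]:
    """Count how many tasks each candidate ties for the best score on."""
    wins: Counter[str] = Counter()
    tasks = {task for per_task in scores.values() for task in per_task}
    for task in sorted(tasks):
        # A candidate with no score for a task is treated as -inf on that task.
        best = max(per_task.get(task, float("-inf")) for per_task in scores.values())
        for cand, per_task in scores.items():
            if per_task.get(task, float("-inf")) == best:
                wins[cand] += 1
    return wins


def select_pareto_parent(scores: Scores, rng: random.Random) -> str:
    """Sample a parent from the per-task winners, weighted by tasks won."""
    wins = pareto_win_counts(scores)
    frontier = sorted(wins)  # fixed order so a seeded RNG gives repeatable picks
    return rng.choices(frontier, weights=[wins[c] for c in frontier], k=1)[0]


scores = {
    "iter0": {"a": 0.5, "b": 0.5},
    "iter1": {"a": 0.9, "b": 0.2},  # specialist on task a
    "iter2": {"a": 0.3, "b": 0.8},  # specialist on task b
    "iter3": {"a": 0.6, "b": 0.6},  # best mean score, but wins no task
}
print(pareto_win_counts(scores))
print(select_pareto_parent(scores, random.Random(0)))
```

Here the mean-score leader `iter3` is never picked. Always branching from it (the `best` strategy) would discard both specialists, which is the premature convergence the changelog entry refers to. Passing a `random.Random(seed)` mirrors what `search.seed` is described as doing.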

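The novelty filter's near-duplicate check can likewise be sketched with the standard library's `difflib.SequenceMatcher`. This is a hypothetical illustration, not the project's code: `is_near_duplicate` and `propose_novel` are made-up names, `propose` stands in for whatever call asks a backend for a new candidate, and comparing the candidate's source text against every previously seen candidate is an assumption. The defaults mirror `novelty_threshold: 0.97` and `novelty_max_retries: 1` from the README config.

```python
import difflib
from typing import Callable, Optional


def is_near_duplicate(candidate: str, previous: list[str], threshold: float = 0.97) -> bool:
    """Return True if `candidate` is more than `threshold` similar to any earlier text."""
    for prior in previous:
        matcher = difflib.SequenceMatcher(None, prior, candidate, autojunk=False)
        # quick_ratio() is a cheap upper bound on ratio(), so it can rule out a match early.
        if matcher.quick_ratio() <= threshold:
            continue
        if matcher.ratio() > threshold:
            return True
    return False


def propose_novel(
    propose: Callable[[], str],
    previous: list[str],
    threshold: float = 0.97,
    max_retries: int = 1,
) -> Optional[str]:
    """Ask for a candidate, regenerating near-duplicates up to `max_retries` times."""
    for _ in range(max_retries + 1):
        code = propose()
        if not is_near_duplicate(code, previous, threshold):
            return code
    return None  # still a near-duplicate: the caller skips evaluation entirely
```

Returning `None` instead of a score keeps the budget saving explicit: a rejected candidate never reaches the evaluator.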
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://img.shields.io/badge/tests-212%20passing-brightgreen.svg)]()
+[![Tests](https://img.shields.io/badge/tests-206%20passing-brightgreen.svg)]()
 [![中文文档](https://img.shields.io/badge/文档-中文版-red.svg)](README_CN.md)
 
 ---
@@ -53,6 +53,12 @@ PolyHarness fills that gap. It's the open-source engine that makes Meta-Harness
 > - Memory tools (like Supermemory) give agents persistent **memory** across conversations.
 > - **PolyHarness gives agents persistent self-evolution** — you get a repeatable way to refine how they work over time.
 
+### Part of a wave — specialized for harnesses
+
+PolyHarness doesn't stand alone. A wave of open-source projects has shown that pairing LLMs with evolutionary search systematically improves code and prompts: [GEPA](https://github.com/gepa-ai/gepa) (reflective prompt evolution over a Pareto frontier), [ShinkaEvolve](https://github.com/SakanaAI/ShinkaEvolve) (sample-efficient program evolution), [OpenEvolve](https://github.com/algorithmicsuperintelligence/openevolve) (an open AlphaEvolve), and the [Darwin Gödel Machine](https://sakana.ai/dgm/) (open-ended self-improving agents).
+
+Most of these evolve *general* programs or algorithms. PolyHarness is the member of this wave **specialized for agent harnesses** — the prompts, tool config, and orchestration *around* an existing agent — with a focus on **online evolution from real usage** (`ph wrap` → `ph evolve`). It borrows the strongest ideas from these projects and applies them to any CLI agent on your own tasks: Pareto-frontier parent selection (GEPA), code-novelty rejection and an adaptive backend ensemble (ShinkaEvolve), and cascade evaluation (AlphaEvolve/OpenEvolve).
+
 ## What PolyHarness Is
 
 PolyHarness is the open-source engine for iteratively searching over an agent's harness.
@@ -469,6 +475,16 @@ The Proposer reads **all of this** before generating the next candidate. It can
 
 When you run `ph init --agent claude-code`, PolyHarness automatically generates a `CLAUDE.md` instruction file in the workspace, telling the agent how to behave as an optimization Proposer. Same for `CLAW.md`, `CODEX.md`, `AGENTS.md` (Hermes), `OPENCODE.md` — each agent's native instruction format.
 
+#### Backend ensemble (adaptive selection)
+
+Don't know which backend writes the best harness changes for your task? Let PolyHarness find out. Pass several and it picks one per iteration with a **UCB bandit**, shifting picks toward whichever backend actually produces *improving* candidates:
+
+```bash
+ph run --ensemble "claude-code,codex,local"
+```
+
+At the end of the run you get a per-backend breakdown (picks + improve-rate). The bandit itself uses no randomness: given the same sequence of rewards, it makes the same picks. Inspired by ShinkaEvolve's adaptive LLM-ensemble selection.
+
 ### Local Model Setup
 
 If you're running a local model (Ollama, vLLM, LM Studio, or any OpenAI-compatible server), use the `openai` backend:
@@ -517,10 +533,16 @@ After `ph init`, the workspace has a `config.yaml` with these sections:
 search:
   max_iterations: 20          # Maximum search iterations
   early_stop_patience: 5      # Stop after N iterations with no improvement
-  parent_selection: best       # Strategy: best | tournament | all
+  parent_selection: best       # Strategy: best | tournament | all | pareto
+  novelty_filter: false        # Reject near-duplicate candidates before eval (saves budget)
+  novelty_threshold: 0.97      # Similarity ratio above which a candidate is a near-duplicate
+  novelty_max_retries: 1       # Regenerate a near-duplicate this many times before skipping
+  seed: null                   # RNG seed — set an int to make randomized runs reproducible
 
 proposer:
   backend: api                 # api | openai | claude-code | claw-code | codex | hermes | opencode | local
+  ensemble: []                 # If non-empty, pick among these backends per iteration via a UCB bandit
+  bandit_c: 1.41421356         # UCB exploration constant (higher = more exploration)
   model: claude-sonnet-4-20250514  # Model name (for api/openai backends)
   base_url: null               # Custom API endpoint (for openai backend)
   api_key: null                # API key override (null = use env var)
@@ -532,6 +554,9 @@ evaluator:
   type: python                 # python | docker | custom
   entry: evaluate.py           # Evaluator script entrypoint
   timeout: 300                 # Per-task timeout in seconds
+  cascade: false               # Stage cheap subset first; skip rest if it fails the gate (per-task mode)
+  cascade_threshold: 0.4       # Min stage-1 mean score required to run the full task set
+  cascade_stage1: 0            # Tasks in stage 1 (0 = auto, ~1/3 of the list)
 
 harness:
   language: python             # Harness code language
@@ -599,11 +624,11 @@ python -m polyharness --version
 | `ph init` | Initialize workspace with auto-copy of harness, tasks, eval script |
 | `ph run` | Start the optimization search loop |
 | `ph status` | Progress table with elapsed time, improvement rate, and delta |
-| `ph log` | Search tree with delta (Δ) column (or `--flat` for table) |
+| `ph log` | Search tree with delta (Δ) column and Pareto-frontier (◆) markers (or `--flat` for table) |
 | `ph best` | Show best candidate: score, per-task breakdown, changes summary |
 | `ph compare A B` | Compare two iterations: score deltas + unified code diff |
 | `ph diff <N>` | Shorthand for `compare 0 <N>` |
-| `ph leaderboard` | Ranked table of all candidates (`--top N`, `--tasks` drilldown) |
+| `ph leaderboard` | Ranked table of all candidates with Pareto (◆) and backend columns (`--top N`, `--tasks` drilldown) |
 | `ph trace <N>` | View stdout, stderr, metrics, exit code for an iteration |
 | `ph report` | Generate a full markdown report with score trends and per-task table |
 | `ph apply` | Copy best harness back to `base_harness/` (or `--target` dir) |
@@ -647,7 +672,8 @@ python -m polyharness --version
 --dry-run            Only evaluate the base harness, skip search
 --resume             Continue an interrupted search from where it left off
 --backend <name>     Override proposer backend without editing config
---strategy <name>    Override parent selection: best | tournament | all
+--strategy <name>    Override parent selection: best | tournament | all | pareto
+--ensemble b1,b2,... Pick among multiple backends per iteration via a UCB bandit
 ```
 
 ### `ph wrap` options
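The adaptive backend ensemble is described as a UCB1 bandit over backends that rewards *improving* candidates. A minimal sketch of that idea follows; the class borrows the name `BackendBandit` from the changelog but is a hypothetical stand-in, not the contents of `proposer/bandit.py`. It assumes a reward of 1 when a backend's candidate improves and 0 otherwise, tries each backend once in list order before using estimates, and breaks ties by list order, which is one way to select without any RNG. `c` plays the role of `bandit_c`; its default of √2 turns the bonus into the textbook UCB1 term √(2 ln N / nᵢ).

```python
import math
from dataclasses import dataclass, field


@dataclass
class BackendBandit:
    """UCB1 over proposer backends (illustrative sketch, not PolyHarness's class)."""

    backends: list[str]
    c: float = math.sqrt(2)  # exploration constant, analogous to `bandit_c`
    picks: dict[str, int] = field(default_factory=dict)
    rewards: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.backends:
            self.picks.setdefault(name, 0)
            self.rewards.setdefault(name, 0.0)

    def select(self) -> str:
        # Try every backend once, in list order, before trusting any estimate.
        for name in self.backends:
            if self.picks[name] == 0:
                return name
        total = sum(self.picks.values())

        def ucb(name: str) -> float:
            mean = self.rewards[name] / self.picks[name]  # observed improve-rate
            bonus = self.c * math.sqrt(math.log(total) / self.picks[name])
            return mean + bonus

        # max() keeps the first of equal scores, so ties break by list order (no RNG).
        return max(self.backends, key=ucb)

    def update(self, name: str, improved: bool) -> None:
        # Hypothetical reward: 1.0 if the candidate improved, else 0.0.
        self.picks[name] += 1
        self.rewards[name] += 1.0 if improved else 0.0


bandit = BackendBandit(["claude-code", "codex", "local"])
for _ in range(30):
    pick = bandit.select()
    # Pretend only claude-code ever produces improving candidates.
    bandit.update(pick, improved=(pick == "claude-code"))
print(bandit.picks)
```

In this simulated run the bandit keeps occasionally revisiting `codex` and `local` (the exploration bonus), but most picks go to the backend with the higher improve-rate. Raising `c` would make those revisits more frequent.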

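Cascade evaluation, as configured by `cascade`, `cascade_threshold`, and `cascade_stage1`, amounts to a two-stage gate. The function below is a hypothetical sketch under stated assumptions, not the evaluator's code: `score_task` stands in for running one task, the auto stage-1 size is taken as `len(tasks) // 3` (at least one task), and a candidate that fails the gate simply returns its stage-1 scores. Per the changelog, the base harness would bypass this and be scored in full.

```python
from typing import Callable


def cascade_evaluate(
    tasks: list[str],
    score_task: Callable[[str], float],
    threshold: float = 0.4,  # analogous to `cascade_threshold`
    stage1: int = 0,  # analogous to `cascade_stage1`; 0 means "auto"
) -> dict[str, float]:
    """Score a cheap first subset; run the remaining tasks only if it clears the gate."""
    if not tasks:
        return {}
    n1 = stage1 if stage1 > 0 else max(1, len(tasks) // 3)  # assumed "~1/3" rule
    first, rest = tasks[:n1], tasks[n1:]
    scores = {task: score_task(task) for task in first}
    if sum(scores.values()) / len(scores) < threshold:
        return scores  # gate failed: the remaining tasks are never run
    scores.update({task: score_task(task) for task in rest})
    return scores
```

With the defaults, a six-task candidate that averages below 0.4 on its first two tasks costs two task runs instead of six.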
diff --git a/README_CN.md b/README_CN.md
@@ -15,7 +15,7 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://img.shields.io/badge/tests-212%20passing-brightgreen.svg)]()
+[![Tests](https://img.shields.io/badge/tests-206%20passing-brightgreen.svg)]()
 [![English](https://img.shields.io/badge/Docs-English-blue.svg)](README.md)
 
 ---
@@ -53,6 +53,12 @@ PolyHarness fills this gap. It turns Meta-Harness search into a
 > - Memory tools (like Supermemory) give agents persistent **memory** across conversations.
 > - **PolyHarness gives agents persistent self-evolution**: you get a repeatable way to keep refining how they work.
 
+### Part of a wave, specialized for harnesses
+
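+PolyHarness doesn't stand alone. A wave of open-source projects has shown that pairing LLMs with evolutionary search systematically improves code and prompts: [GEPA](https://github.com/gepa-ai/gepa) (reflective prompt evolution over a Pareto frontier), [ShinkaEvolve](https://github.com/SakanaAI/ShinkaEvolve) (sample-efficient program evolution), [OpenEvolve](https://github.com/algorithmicsuperintelligence/openevolve) (an open-source implementation of AlphaEvolve), and the [Darwin Gödel Machine](https://sakana.ai/dgm/) (open-ended self-improving agents).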
+
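+Most of these evolve *general* programs or algorithms. PolyHarness is the member of this wave **specialized for agent harnesses**: it optimizes the prompts, tool config, and orchestration *wrapped around* an existing agent, with a focus on **online evolution from real usage** (`ph wrap` → `ph evolve`). It borrows the most effective ideas from these projects and applies them to any CLI agent on your own tasks: Pareto-frontier parent selection (GEPA), code-novelty rejection and an adaptive backend ensemble (ShinkaEvolve), and cascade evaluation (AlphaEvolve/OpenEvolve).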
+
 ## What PolyHarness Is
 
 PolyHarness is an open-source engine that explores agent harness variants through iterative evaluation and search.
@@ -469,6 +475,16 @@ The Proposer reads **all of this information** before generating the next candidate. It can
 
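 When you run `ph init --agent claude-code`, PolyHarness automatically generates a `CLAUDE.md` instruction file in the workspace that tells the agent how to act as the optimization Proposer. `CLAW.md`, `CODEX.md`, `AGENTS.md` (Hermes), and `OPENCODE.md` work the same way, each in that agent's native instruction format.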
 
+#### Backend ensemble (adaptive selection)
+
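+Not sure which backend is best at your task? Let PolyHarness try them for you. Pass several backends at once and it picks one per iteration with a **UCB bandit**, gradually shifting picks toward the backends that actually produce *improving* candidates: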
+
+```bash
+ph run --ensemble "claude-code,codex,local"
+```
+
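+At the end of the run you get a per-backend breakdown (picks + improve-rate). The bandit itself uses no randomness: given the same sequence of rewards, it makes the same picks. Inspired by ShinkaEvolve's adaptive LLM-ensemble selection.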
+
 ### Local Model Setup
 
 If you're running a local model (Ollama, vLLM, LM Studio, or any OpenAI-compatible server), use the `openai` backend:
@@ -517,10 +533,16 @@ proposer:
 search:
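   max_iterations: 20          # Maximum search iterations
   early_stop_patience: 5      # Stop after N consecutive iterations with no improvement
-  parent_selection: best       # Parent selection strategy: best | tournament | all
+  parent_selection: best       # Parent selection strategy: best | tournament | all | pareto
+  novelty_filter: false        # Reject near-duplicate candidates before evaluation to save budget
+  novelty_threshold: 0.97      # Similarity above which a candidate counts as a near-duplicate
+  novelty_max_retries: 1       # Times to regenerate a near-duplicate before skipping it
+  seed: null                   # Random seed: set an integer to make randomized searches reproducible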
 
 proposer:
   backend: api                 # api | openai | claude-code | claw-code | codex | hermes | opencode | local
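+  ensemble: []                 # If non-empty, pick among these backends each iteration via a UCB bandit
+  bandit_c: 1.41421356         # UCB exploration constant (higher = more exploration)
   model: claude-sonnet-4-20250514  # Model name (for api/openai backends)
   base_url: null               # Custom API endpoint (for openai backend)
   api_key: null                # API key override (null = use env var)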
@@ -532,6 +554,9 @@ evaluator:
   type: python                 # python | docker | custom
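   entry: evaluate.py           # Evaluator script entrypoint
   timeout: 300                 # Per-task timeout in seconds
+  cascade: false               # Score a cheap subset of tasks first; skip the rest if it misses the gate (per-task mode)
+  cascade_threshold: 0.4       # Minimum stage-1 mean score required to run the full task set
+  cascade_stage1: 0            # Number of stage-1 tasks (0 = auto, about 1/3 of the list)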
 
 harness:
   language: python             # Harness code language
@@ -599,11 +624,11 @@ python -m polyharness --version
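 | `ph init` | Initialize the workspace, automatically copying the harness, tasks, and eval script |
 | `ph run` | Start the optimization search loop |
 | `ph status` | Progress table with elapsed time, improvement rate, and delta |
-| `ph log` | Search tree with a delta (Δ) column, or `--flat` for a table view |
+| `ph log` | Search tree with a delta (Δ) column and Pareto-frontier (◆) markers, or `--flat` for a table view |
 | `ph best` | Show the best candidate: score, per-task breakdown, change summary |
 | `ph compare A B` | Compare two iterations: score differences + unified code diff |
 | `ph diff <N>` | Shortcut for `compare 0 <N>` |
-| `ph leaderboard` | Candidate ranking table (`--top N`, `--tasks` to expand per-task scores) |
+| `ph leaderboard` | Candidate ranking table with Pareto (◆) and backend columns (`--top N`, `--tasks` to expand per-task scores) |
 | `ph trace <N>` | View stdout, stderr, metrics, and exit code for an iteration |
 | `ph report` | Generate a full markdown report with score trends and a per-task table |
 | `ph apply` | Copy the best harness back to `base_harness/`, or to a directory given with `--target` |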
@@ -647,7 +672,8 @@ python -m polyharness --version
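 --dry-run            Evaluate only the base harness, skip search
 --resume             Resume an interrupted search from where it left off
 --backend <name>     Override the proposer backend without editing config
---strategy <name>    Override the parent selection strategy: best | tournament | all
+--strategy <name>    Override the parent selection strategy: best | tournament | all | pareto
+--ensemble b1,b2,... Pick among multiple backends each iteration via a UCB bandit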
 ```
 
 ### `ph wrap` options

diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "polyharness",
-  "version": "0.2.1",
+  "version": "0.2.2",
   "description": "Make your AI agent evolve automatically through iterative harness optimization.",
   "keywords": ["agent", "harness", "optimization", "meta-harness", "cli"],
   "license": "MIT",

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "polyharness"
-version = "0.2.1"
+version = "0.2.2"
 description = "Automated harness optimization for AI agents — make your agent evolve."
 readme = "README.md"
 license = "MIT"

diff --git a/src/polyharness/__init__.py b/src/polyharness/__init__.py
@@ -1,3 +1,3 @@
 """PolyHarness — Automated harness optimization for AI agents."""
 
-__version__ = "0.2.1"
+__version__ = "0.2.2"