diff --git a/CHANGELOG.md b/CHANGELOG.md
index a2f792d..32723fb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,49 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
-## [0.2.1] - 2026-04-09
+## [0.2.2] - 2026-05-24
+
+### Added
+- **Pareto-frontier parent selection** (`parent_selection: pareto`) — samples
+  parents from the set of per-task winners instead of always branching from the
+  single overall-best candidate, keeping specialists alive as stepping stones to
+  avoid premature convergence. Inspired by GEPA (arXiv:2507.19457). Reuses the
+  per-task scores already stored in the search log — no new data collected.
+- **Code novelty rejection** (`novelty_filter`, `novelty_threshold`,
+  `novelty_max_retries`) — detects near-duplicate candidates via stdlib
+  `difflib` text similarity (no new dependencies) and skips their evaluation to
+  save API/compute budget. Inspired by ShinkaEvolve (arXiv:2509.19349). Off by
+  default.
+- **Adaptive backend ensemble** (`proposer.ensemble`, `proposer.bandit_c`,
+  `ph run --ensemble b1,b2,...`) — when several backends are listed, a UCB1
+  bandit picks one per iteration and shifts picks toward backends that produce
+  *improving* candidates. Deterministic given the reward sequence (no RNG), with
+  no new dependencies. The run summary shows a per-backend picks/improve-rate table.
+  Inspired by ShinkaEvolve's adaptive LLM-ensemble selection.
+- **Cascade evaluation** (`evaluator.cascade`, `cascade_threshold`,
+  `cascade_stage1`) — scores a cheap first subset of tasks and only runs the
+  rest if it clears the gate, saving budget on weak candidates (AlphaEvolve/
+  OpenEvolve-style). Per-task mode only; the base harness is always scored in
+  full. Off by default.
+- **Reproducible runs** (`search.seed`) — seeds the RNG so tournament/pareto/
+  novelty regeneration are repeatable across runs.
+- **Observability** — `ph log` marks Pareto-frontier members (◆); `ph leaderboard`
+  adds a Pareto column and a Backend column (shown only when an ensemble was
+  used). `SearchLog.pareto_win_counts()` powers both the CLI and the orchestrator.
+- `proposer_backend` recorded in each candidate's `metadata.json` (ensemble mode)
+- Hermes Agent adapter (`hermes`) — 8th proposer backend (`hermes chat -q`)
+- `--strategy pareto` and `--ensemble` options for `ph run`
+- `proposer/bandit.py` — UCB1 `BackendBandit`
+- 31 new tests (206 total)
+
+### Changed
+- Agent backends: 7 → 8 (added Hermes Agent)
+
+### Removed
+- Stray byte-identical duplicate files (`collector 2.py`, `test_collector 2.py`,
+  `test_evolution 2.py`) that inflated the test count and tripped ruff N999
+
+## [0.2.1] - 2026-04-09
 
 ### Added
 - `ph shell-hook install/uninstall/status` — zero-config auto-wrap for agent commands via shell preexec hook
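Review note on the Pareto-frontier entry in the changelog above: a minimal sketch of how per-task winners could be counted and sampled as parents. Only the method name `SearchLog.pareto_win_counts()` appears in this diff, so the function bodies, the input shape (iteration mapped to per-task scores), and the win-count weighting are all assumptions for illustration, not the project's implementation.

```python
import random


def pareto_win_counts(task_scores: dict[int, dict[str, float]]) -> dict[int, int]:
    """Map iteration -> number of tasks on which it holds (or ties) the best score.

    Hypothetical stand-in for SearchLog.pareto_win_counts(); the real data
    layout is not shown in the diff.
    """
    wins: dict[int, int] = {}
    tasks = sorted({t for scores in task_scores.values() for t in scores})
    for task in tasks:
        best = max(s[task] for s in task_scores.values() if task in s)
        for iteration, scores in task_scores.items():
            if scores.get(task) == best:
                wins[iteration] = wins.get(iteration, 0) + 1
    return wins


def select_pareto_parent(
    task_scores: dict[int, dict[str, float]], rng: random.Random
) -> int:
    """Sample one frontier member, weighted by how many tasks it wins (assumed)."""
    wins = pareto_win_counts(task_scores)
    members = sorted(wins)  # stable order keeps seeded runs repeatable
    weights = [wins[m] for m in members]
    return rng.choices(members, weights=weights, k=1)[0]


scores = {
    0: {"a": 0.5, "b": 0.2},
    1: {"a": 0.9, "b": 0.1},  # specialist on task "a"
    2: {"a": 0.3, "b": 0.8},  # specialist on task "b"
    3: {"a": 0.4, "b": 0.4},  # wins nothing, so it is off the frontier
}
frontier = pareto_win_counts(scores)
parent = select_pareto_parent(scores, random.Random(0))
```

Here iterations 1 and 2 both stay eligible as parents even though only one of them can be the overall best, which is the "specialists as stepping stones" behaviour the changelog describes.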
diff --git a/README.md b/README.md
index 6ff6dc6..1699c26 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://img.shields.io/badge/tests-212%20passing-brightgreen.svg)]()
+[![Tests](https://img.shields.io/badge/tests-206%20passing-brightgreen.svg)]()
 [![中文文档](https://img.shields.io/badge/文档-中文版-red.svg)](README_CN.md)
 
 ---
@@ -53,6 +53,12 @@ PolyHarness fills that gap. It's the open-source engine that makes Meta-Harness
 > - Memory tools (like Supermemory) give agents persistent **memory** across conversations.
 > - **PolyHarness gives agents persistent self-evolution** — you get a repeatable way to refine how they work over time.
 
+### Part of a wave — specialized for harnesses
+
+PolyHarness doesn't stand alone. A wave of open-source projects has shown that pairing LLMs with evolutionary search systematically improves code and prompts: [GEPA](https://github.com/gepa-ai/gepa) (reflective prompt evolution over a Pareto frontier), [ShinkaEvolve](https://github.com/SakanaAI/ShinkaEvolve) (sample-efficient program evolution), [OpenEvolve](https://github.com/algorithmicsuperintelligence/openevolve) (an open AlphaEvolve), and the [Darwin Gödel Machine](https://sakana.ai/dgm/) (open-ended self-improving agents).
+
+Most of these evolve *general* programs or algorithms. PolyHarness is the member of this wave **specialized for agent harnesses** — the prompts, tool config, and orchestration *around* an existing agent — with a focus on **online evolution from real usage** (`ph wrap` → `ph evolve`). It borrows the strongest ideas from these projects and applies them to any CLI agent on your own tasks: Pareto-frontier parent selection (GEPA), code-novelty rejection and an adaptive backend ensemble (ShinkaEvolve), and cascade evaluation (AlphaEvolve/OpenEvolve).
+
 ## What PolyHarness Is
 
 PolyHarness is the open-source engine for iteratively searching over an agent's harness.
@@ -469,6 +475,16 @@ The Proposer reads **all of this** before generating the next candidate. It can
 
 When you run `ph init --agent claude-code`, PolyHarness automatically generates a `CLAUDE.md` instruction file in the workspace, telling the agent how to behave as an optimization Proposer. Same for `CLAW.md`, `CODEX.md`, `AGENTS.md` (Hermes), `OPENCODE.md` — each agent's native instruction format.
 
+#### Backend ensemble (adaptive selection)
+
+Don't know which backend writes the best harness changes for your task? Let PolyHarness find out. Pass several and it picks one per iteration with a **UCB bandit**, shifting picks toward whichever backend actually produces *improving* candidates:
+
+```bash
+ph run --ensemble "claude-code,codex,local"
+```
+
+At the end of the run you get a per-backend breakdown (picks + improve-rate). Selection is deterministic given the reward sequence, so runs stay reproducible. Inspired by ShinkaEvolve's adaptive LLM-ensemble selection.
+
 ### Local Model Setup
 
 If you're running a local model (Ollama, vLLM, LM Studio, or any OpenAI-compatible server), use the `openai` backend:
@@ -517,10 +533,16 @@ After `ph init`, the workspace has a `config.yaml` with these sections:
 search:
   max_iterations: 20          # Maximum search iterations
   early_stop_patience: 5      # Stop after N iterations with no improvement
-  parent_selection: best       # Strategy: best | tournament | all
+  parent_selection: best       # Strategy: best | tournament | all | pareto
+  novelty_filter: false        # Reject near-duplicate candidates before eval (saves budget)
+  novelty_threshold: 0.97      # Similarity ratio above which a candidate is a near-duplicate
+  novelty_max_retries: 1       # Regenerate a near-duplicate this many times before skipping
+  seed: null                   # RNG seed — set an int to make randomized runs reproducible
 
 proposer:
   backend: api                 # api | openai | claude-code | claw-code | codex | hermes | opencode | local
+  ensemble: []                 # If non-empty, pick among these backends per iteration via a UCB bandit
+  bandit_c: 1.41421356         # UCB exploration constant (higher = more exploration)
   model: claude-sonnet-4-20250514  # Model name (for api/openai backends)
   base_url: null               # Custom API endpoint (for openai backend)
   api_key: null                # API key override (null = use env var)
@@ -532,6 +554,9 @@ evaluator:
   type: python                 # python | docker | custom
   entry: evaluate.py           # Evaluator script entrypoint
   timeout: 300                 # Per-task timeout in seconds
+  cascade: false               # Score a cheap subset first; skip the rest if it fails the gate (per-task mode)
+  cascade_threshold: 0.4       # Min stage-1 mean score required to run the full task set
+  cascade_stage1: 0            # Tasks in stage 1 (0 = auto, ~1/3 of the list)
 
 harness:
   language: python             # Harness code language
@@ -599,11 +624,11 @@ python -m polyharness --version
 | `ph init` | Initialize workspace with auto-copy of harness, tasks, eval script |
 | `ph run` | Start the optimization search loop |
 | `ph status` | Progress table with elapsed time, improvement rate, and delta |
-| `ph log` | Search tree with delta (Δ) column (or `--flat` for table) |
+| `ph log` | Search tree with delta (Δ) column and Pareto-frontier (◆) markers (or `--flat` for table) |
 | `ph best` | Show best candidate: score, per-task breakdown, changes summary |
 | `ph compare A B` | Compare two iterations: score deltas + unified code diff |
 | `ph diff <N>` | Shorthand for `compare 0 <N>` |
-| `ph leaderboard` | Ranked table of all candidates (`--top N`, `--tasks` drilldown) |
+| `ph leaderboard` | Ranked table of all candidates with Pareto (◆) and backend columns (`--top N`, `--tasks` drilldown) |
 | `ph trace <N>` | View stdout, stderr, metrics, exit code for an iteration |
 | `ph report` | Generate a full markdown report with score trends and per-task table |
 | `ph apply` | Copy best harness back to `base_harness/` (or `--target` dir) |
@@ -647,7 +672,8 @@ python -m polyharness --version
 --dry-run            Only evaluate the base harness, skip search
 --resume             Continue an interrupted search from where it left off
 --backend <name>     Override proposer backend without editing config
---strategy <name>    Override parent selection: best | tournament | all
+--strategy <name>    Override parent selection: best | tournament | all | pareto
+--ensemble b1,b2,... Pick among multiple backends per iteration via a UCB bandit
 ```
 
 ### `ph wrap` options
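Review note on the backend-ensemble section in the README changes above: a minimal UCB1 sketch showing why selection is deterministic given the reward sequence. The class name `Ucb1Sketch` and the 1.0/0.0 reward scheme are assumptions for illustration; the project's `BackendBandit` in `proposer/bandit.py` is not shown in this diff and may differ.

```python
import math


class Ucb1Sketch:
    """Textbook UCB1 over a fixed list of arms (hypothetical, not BackendBandit)."""

    def __init__(self, arms: list[str], c: float = math.sqrt(2)) -> None:
        self.arms = list(arms)
        self.c = c  # exploration constant, analogous to `proposer.bandit_c`
        self.pulls = {a: 0 for a in self.arms}
        self.total_reward = {a: 0.0 for a in self.arms}

    def select(self) -> str:
        # Try every arm once before comparing confidence bounds.
        for arm in self.arms:
            if self.pulls[arm] == 0:
                return arm
        total = sum(self.pulls.values())

        def upper_bound(arm: str) -> float:
            mean = self.total_reward[arm] / self.pulls[arm]
            bonus = self.c * math.sqrt(math.log(total) / self.pulls[arm])
            return mean + bonus

        # max() returns the first maximal arm, so ties break by list order
        # and no random number generator is involved.
        return max(self.arms, key=upper_bound)

    def update(self, arm: str, reward: float) -> None:
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


bandit = Ucb1Sketch(["claude-code", "codex"])
picks = []
for _ in range(50):
    arm = bandit.select()
    picks.append(arm)
    # Assumed reward: 1.0 when the candidate improved, 0.0 otherwise.
    bandit.update(arm, 1.0 if arm == "claude-code" else 0.0)
```

With `c = sqrt(2)` the bonus term equals the classic `sqrt(2 ln t / n)`, which matches the documented default of `bandit_c: 1.41421356`.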
diff --git a/README_CN.md b/README_CN.md
index 01b7bee..f9a8581 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -15,7 +15,7 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://img.shields.io/badge/tests-212%20passing-brightgreen.svg)]()
+[![Tests](https://img.shields.io/badge/tests-206%20passing-brightgreen.svg)]()
 [![English](https://img.shields.io/badge/Docs-English-blue.svg)](README.md)
 
 ---
@@ -53,6 +53,12 @@ PolyHarness 填补了这个空白。它把 Meta-Harness 搜索变成了一个任
 > - 记忆工具（如 Supermemory）赋予 agent 跨会话的持久**记忆**。
 > - **PolyHarness 赋予 agent 持久的自我进化能力**，你可以用可重复运行的方式持续调整它们的工作方式。
 
+### 这波浪潮中的一员——专精 harness
+
+PolyHarness 并非孤例。一批开源项目已经证明：把 LLM 与进化搜索结合，能系统性地改进代码与 prompt——[GEPA](https://github.com/gepa-ai/gepa)（在 Pareto 前沿上做反思式 prompt 进化）、[ShinkaEvolve](https://github.com/SakanaAI/ShinkaEvolve)（样本高效的程序进化）、[OpenEvolve](https://github.com/algorithmicsuperintelligence/openevolve)（AlphaEvolve 的开源实现），以及 [Darwin Gödel Machine](https://sakana.ai/dgm/)（开放式自我改进 agent）。
+
+它们大多进化的是*通用*程序或算法。PolyHarness 是这波浪潮里**专精 agent harness** 的那一员——优化的是包裹在现有 agent *外层*的 prompt、工具配置与编排，并聚焦于**从真实使用中在线进化**（`ph wrap` → `ph evolve`）。它把这些项目中最有效的思路借鉴过来，应用到你自己任务上的任意 CLI agent：Pareto 前沿父代选择（GEPA）、代码新颖性拒绝与自适应后端集成（ShinkaEvolve）、级联评估（AlphaEvolve/OpenEvolve）。
+
 ## PolyHarness 是什么
 
 PolyHarness 是一个通过迭代评估与搜索来探索 agent harness 变体的开源引擎。
@@ -469,6 +475,16 @@ Proposer 在生成下一个候选之前会读取**所有这些信息**。它能
 
 当你运行 `ph init --agent claude-code` 时，PolyHarness 会在 workspace 中自动生成 `CLAUDE.md` 指令文件，告诉 agent 如何作为优化 Proposer 工作。`CLAW.md`、`CODEX.md`、`AGENTS.md`（Hermes）、`OPENCODE.md` 也是同样的机制，每个 agent 都使用它自己的原生指令格式。
 
+#### 后端集成（自适应择优）
+
+不确定哪个后端最擅长你的任务？让 PolyHarness 替你试。一次传入多个后端，它会用 **UCB bandit** 每轮挑一个，并逐渐把选择倾向"真正产出改进候选"的后端：
+
+```bash
+ph run --ensemble "claude-code,codex,local"
+```
+
+运行结束会给出每个后端的明细（选中次数 + 改进率）。在给定奖励序列下选择是确定性的，因此运行可复现。该机制借鉴自 ShinkaEvolve 的自适应 LLM 集成选择。
+
 ### 本地模型配置
 
 如果你在本地运行模型（Ollama、vLLM、LM Studio 或任何 OpenAI 兼容服务），使用 `openai` 后端：
@@ -517,10 +533,16 @@ proposer:
 search:
   max_iterations: 20          # 最大搜索迭代次数
   early_stop_patience: 5      # 连续 N 轮无改进后停止
-  parent_selection: best       # 父候选选择策略: best | tournament | all
+  parent_selection: best       # 父候选选择策略: best | tournament | all | pareto
+  novelty_filter: false        # 评估前拒绝近重复候选，节省预算
+  novelty_threshold: 0.97      # 超过此相似度判定为近重复
+  novelty_max_retries: 1       # 跳过前重新生成近重复候选的次数
+  seed: null                   # 随机种子 — 设为整数可让带随机性的搜索可复现
 
 proposer:
   backend: api                 # api | openai | claude-code | claw-code | codex | hermes | opencode | local
+  ensemble: []                 # 非空时，每轮用 UCB bandit 在这些后端中择优
+  bandit_c: 1.41421356         # UCB 探索常数（越大越偏探索）
   model: claude-sonnet-4-20250514  # 模型名称（api/openai 后端使用）
   base_url: null               # 自定义 API 端点（openai 后端使用）
   api_key: null                # API 密钥覆盖（null = 使用环境变量）
@@ -532,6 +554,9 @@ evaluator:
   type: python                 # python | docker | custom
   entry: evaluate.py           # 评估脚本入口
   timeout: 300                 # 每个任务的超时时间（秒）
+  cascade: false               # 先评便宜的任务子集，未过门槛则跳过其余（逐任务模式）
+  cascade_threshold: 0.4       # 进入完整任务集所需的第一阶段最低均分
+  cascade_stage1: 0            # 第一阶段任务数（0 = 自动，约占 1/3）
 
 harness:
   language: python             # Harness 代码语言
@@ -599,11 +624,11 @@ python -m polyharness --version
 | `ph init` | 初始化 workspace，自动复制 harness、任务、评估脚本 |
 | `ph run` | 启动优化搜索循环 |
 | `ph status` | 进度表格，包含耗时、改进率和增量 |
-| `ph log` | 搜索树带增量（Δ）列，或用 `--flat` 查看表格视图 |
+| `ph log` | 搜索树带增量（Δ）列和 Pareto 前沿（◆）标记，或用 `--flat` 查看表格视图 |
 | `ph best` | 展示最佳候选：分数、逐任务明细、变更摘要 |
 | `ph compare A B` | 对比两个迭代：分数差异 + 统一代码 diff |
 | `ph diff <N>` | `compare 0 <N>` 的快捷方式 |
-| `ph leaderboard` | 候选排名表（`--top N`、`--tasks` 展开每题分数） |
+| `ph leaderboard` | 候选排名表，含 Pareto（◆）与后端列（`--top N`、`--tasks` 展开每题分数） |
 | `ph trace <N>` | 查看某次迭代的 stdout、stderr、metrics、退出码 |
 | `ph report` | 生成完整 markdown 报告，包含分数趋势和逐任务表格 |
 | `ph apply` | 将最优 harness 回写到 `base_harness/`，或通过 `--target` 指定目录 |
@@ -647,7 +672,8 @@ python -m polyharness --version
 --dry-run            仅评估基线 harness，跳过搜索
 --resume             从上次中断处继续搜索
 --backend <name>     覆盖 proposer 后端，无需修改配置
---strategy <name>    覆盖父候选选择策略: best | tournament | all
+--strategy <name>    覆盖父候选选择策略: best | tournament | all | pareto
+--ensemble b1,b2,... 每轮用 UCB bandit 在多个后端中择优
 ```
 
 ### `ph wrap` 选项
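Review note on the `novelty_filter` / `novelty_threshold` settings documented in the config block above: a minimal sketch of a `difflib`-based near-duplicate check. The changelog only states that stdlib `difflib` text similarity is used, so the function name, the use of `SequenceMatcher.ratio()`, and the comparison against every earlier candidate are assumptions.

```python
import difflib


def is_near_duplicate(
    candidate: str, earlier: list[str], threshold: float = 0.97
) -> bool:
    """Return True if `candidate` is more than `threshold` similar to any earlier text.

    Hypothetical helper: the similarity measure is SequenceMatcher.ratio(),
    which is 1.0 for identical strings and 0.0 for strings with nothing in common.
    """
    return any(
        difflib.SequenceMatcher(None, candidate, old, autojunk=False).ratio()
        > threshold
        for old in earlier
    )


seen = ["def run(task):\n    return solve(task)\n"]
same = is_near_duplicate("def run(task):\n    return solve(task)\n", seen)
different = is_near_duplicate("completely unrelated text", seen)
```

Because a candidate is rejected only when its ratio exceeds the threshold, raising `novelty_threshold` toward 1.0 lets more candidates through to evaluation.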
diff --git a/package.json b/package.json
index d652445..515e58e 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "polyharness",
-  "version": "0.2.1",
+  "version": "0.2.2",
   "description": "Make your AI agent evolve automatically through iterative harness optimization.",
   "keywords": ["agent", "harness", "optimization", "meta-harness", "cli"],
   "license": "MIT",
diff --git a/pyproject.toml b/pyproject.toml
index a104fa9..a93374c 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "polyharness"
-version = "0.2.1"
+version = "0.2.2"
 description = "Automated harness optimization for AI agents — make your agent evolve."
 readme = "README.md"
 license = "MIT"
diff --git a/src/polyharness/__init__.py b/src/polyharness/__init__.py
index e93c856..e932175 100644
--- a/src/polyharness/__init__.py
+++ b/src/polyharness/__init__.py
@@ -1,3 +1,3 @@
 """PolyHarness — Automated harness optimization for AI agents."""
 
-__version__ = "0.2.1"
+__version__ = "0.2.2"
diff --git a/src/polyharness/cli.py b/src/polyharness/cli.py
index c364703..1991984 100644
--- a/src/polyharness/cli.py
+++ b/src/polyharness/cli.py
@@ -247,10 +247,16 @@ def init(
 )
 @click.option(
     "--strategy",
-    type=click.Choice(["best", "tournament", "all"], case_sensitive=False),
+    type=click.Choice(["best", "tournament", "all", "pareto"], case_sensitive=False),
     default=None,
     help="Override parent selection strategy.",
 )
+@click.option(
+    "--ensemble",
+    default=None,
+    metavar="b1,b2,...",
+    help="Comma-separated backends to pick among per iteration via a UCB bandit.",
+)
 def run(
     workspace: str,
     max_iterations: int | None,
@@ -258,6 +264,7 @@ def run(
     resume: bool,
     backend: str | None,
     strategy: str | None,
+    ensemble: str | None,
 ):
     """Start the optimization search loop."""
     from polyharness.orchestrator import Orchestrator
@@ -275,6 +282,16 @@ def run(
     if backend is not None:
         config.proposer.backend = backend  # type: ignore[assignment]
 
+    if ensemble is not None:
+        names = [b.strip() for b in ensemble.split(",") if b.strip()]
+        try:
+            # Validate against the config model (rejects unknown backend names).
+            config.proposer.ensemble = names  # type: ignore[assignment]
+            config = config.model_validate(config.model_dump())
+        except Exception as exc:
+            console.print(f"[red]Error:[/red] Invalid --ensemble value: {exc}")
+            raise SystemExit(1) from exc
+
     if strategy is not None:
         config.search.parent_selection = strategy  # type: ignore[assignment]
 
@@ -597,21 +614,35 @@ def log(workspace: str, flat: bool):
 
     best_i = search_log.best_iteration
     parent_scores = {e.iteration: e.score for e in search_log.entries}
+    pareto_front = set(search_log.pareto_win_counts())
 
     if flat:
-        _print_log_flat(search_log.entries, best_i, parent_scores)
+        _print_log_flat(search_log.entries, best_i, parent_scores, pareto_front)
     else:
-        _print_log_tree(search_log.entries, best_i, parent_scores)
+        _print_log_tree(search_log.entries, best_i, parent_scores, pareto_front)
 
+    legend = "[yellow]★[/yellow] best"
+    if pareto_front:
+        legend += "   [magenta]◆[/magenta] Pareto frontier (best on ≥1 task)"
     console.print(
         f"\n{len(search_log)} iterations  |  "
-        f"best: iter_{best_i} ({search_log.best_score:.4f})"
+        f"best: iter_{best_i} ({search_log.best_score:.4f})  |  {legend}"
     )
 
 
-def _log_entry_label(entry, best_i: int, parent_scores: dict[int, float] | None = None) -> str:
+def _log_entry_label(
+    entry,
+    best_i: int,
+    parent_scores: dict[int, float] | None = None,
+    pareto_front: set[int] | None = None,
+) -> str:
     """Format a single log entry as a rich-styled label."""
     star = " [bold yellow]★[/bold yellow]" if entry.iteration == best_i else ""
+    pf = (
+        " [magenta]◆[/magenta]"
+        if pareto_front and entry.iteration in pareto_front
+        else ""
+    )
     score_color = "green" if entry.score >= entry.best_so_far else "white"
     delta = ""
     if parent_scores and entry.parent is not None and entry.parent in parent_scores:
@@ -624,11 +655,16 @@ def _log_entry_label(entry, best_i: int, parent_scores: dict[int, float] | None
             delta = "  [dim]+0.0000[/dim]"
     return (
         f"[bold cyan]iter_{entry.iteration}[/bold cyan]  "
-        f"[{score_color}]{entry.score:.4f}[/{score_color}]{delta}{star}"
+        f"[{score_color}]{entry.score:.4f}[/{score_color}]{delta}{pf}{star}"
     )
 
 
-def _print_log_tree(entries, best_i: int, parent_scores: dict[int, float] | None = None) -> None:
+def _print_log_tree(
+    entries,
+    best_i: int,
+    parent_scores: dict[int, float] | None = None,
+    pareto_front: set[int] | None = None,
+) -> None:
     """Print a rich Tree showing parent→child relationships."""
     from rich.tree import Tree
 
@@ -644,26 +680,31 @@ def _print_log_tree(entries, best_i: int, parent_scores: dict[int, float] | None
     roots = children.get(None, [])
     if not roots:
         # Fallback to flat if no root found
-        _print_log_flat(entries, best_i, parent_scores)
+        _print_log_flat(entries, best_i, parent_scores, pareto_front)
         return
 
     tree = Tree("[bold]Search Tree[/bold]")
 
     def _add_children(parent_tree, iteration: int) -> None:
         for child in children.get(iteration, []):
-            label = _log_entry_label(child, best_i, parent_scores)
+            label = _log_entry_label(child, best_i, parent_scores, pareto_front)
             branch = parent_tree.add(label)
             _add_children(branch, child.iteration)
 
     for root_entry in roots:
-        label = _log_entry_label(root_entry, best_i, parent_scores)
+        label = _log_entry_label(root_entry, best_i, parent_scores, pareto_front)
         root_branch = tree.add(label)
         _add_children(root_branch, root_entry.iteration)
 
     console.print(tree)
 
 
-def _print_log_flat(entries, best_i: int, parent_scores: dict[int, float] | None = None) -> None:
+def _print_log_flat(
+    entries,
+    best_i: int,
+    parent_scores: dict[int, float] | None = None,
+    pareto_front: set[int] | None = None,
+) -> None:
     """Print a chronological table of all iterations."""
     table = Table(title="Search Log")
     table.add_column("Iteration", style="cyan")
@@ -671,6 +712,8 @@ def _print_log_flat(entries, best_i: int, parent_scores: dict[int, float] | None
     table.add_column("Score", style="green")
     table.add_column("Δ", style="bold")
     table.add_column("Best", style="bold green")
+    if pareto_front:
+        table.add_column("PF", style="magenta", justify="center")
     table.add_column("", style="yellow")
 
     for e in entries:
@@ -685,14 +728,17 @@ def _print_log_flat(entries, best_i: int, parent_scores: dict[int, float] | None
                 delta = f"[red]{d:.4f}[/red]"
             else:
                 delta = "[dim]+0.0000[/dim]"
-        table.add_row(
+        row = [
             f"iter_{e.iteration}",
             parent_str,
             f"{e.score:.4f}",
             delta,
             f"{e.best_so_far:.4f}",
-            star,
-        )
+        ]
+        if pareto_front:
+            row.append("◆" if e.iteration in pareto_front else "")
+        row.append(star)
+        table.add_row(*row)
 
     console.print(table)
 
@@ -991,6 +1037,11 @@ def leaderboard(workspace: str, top: int | None, tasks: bool):
         entries = entries[:top]
 
     base_score = next((e.score for e in log.entries if e.iteration == 0), 0.0)
+    pareto_front = set(log.pareto_win_counts())
+
+    # Backend per candidate (only meaningful when an ensemble was used).
+    backends = {e.iteration: ws.candidate_metadata(e.iteration).get("proposer_backend") for e in entries}
+    show_backend = any(backends.values())
 
     # Gather all task names
     all_task_names: list[str] = []
@@ -1006,6 +1057,10 @@ def leaderboard(workspace: str, top: int | None, tasks: bool):
     table.add_column("Score", style="green")
     table.add_column("vs Base", style="bold")
     table.add_column("Parent", style="dim")
+    if pareto_front:
+        table.add_column("PF", style="magenta", justify="center")
+    if show_backend:
+        table.add_column("Backend", style="blue")
     if tasks:
         for tn in all_task_names:
             table.add_column(tn, style="white", width=8)
@@ -1029,6 +1084,10 @@ def leaderboard(workspace: str, top: int | None, tasks: bool):
             vs_base,
             parent_str,
         ]
+        if pareto_front:
+            row.append("◆" if entry.iteration in pareto_front else "")
+        if show_backend:
+            row.append(backends.get(entry.iteration) or "—")
         if tasks:
             for tn in all_task_names:
                 val = entry.task_scores.get(tn)
@@ -1037,7 +1096,10 @@ def leaderboard(workspace: str, top: int | None, tasks: bool):
         table.add_row(*row)
 
     console.print(table)
-    console.print(f"\n{len(log)} total iterations  |  Showing top {len(entries)}")
+    footer = f"\n{len(log)} total iterations  |  Showing top {len(entries)}"
+    if pareto_front:
+        footer += "  |  [magenta]◆[/magenta] Pareto frontier"
+    console.print(footer)
 
 
 # --- trace command ---
diff --git a/src/polyharness/config.py b/src/polyharness/config.py
index d2f47a0..92041ff 100644
--- a/src/polyharness/config.py
+++ b/src/polyharness/config.py
@@ -8,6 +8,13 @@
 import yaml
 from pydantic import BaseModel, Field
 
+# Single source of truth for proposer backend names. Used by both the fixed
+# `backend` field and the optional `ensemble` list (which gets validation for
+# free by reusing this Literal alias).
+BackendName = Literal[
+    "api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "local"
+]
+
 
 class SearchConfig(BaseModel):
     """Search loop parameters."""
@@ -16,17 +23,66 @@ class SearchConfig(BaseModel):
     early_stop_patience: int = Field(
         default=5, ge=1, description="Stop after N iterations without improvement."
     )
-    parent_selection: Literal["best", "tournament", "all"] = Field(
-        default="best", description="Parent candidate selection strategy."
+    seed: int | None = Field(
+        default=None,
+        description=(
+            "Optional RNG seed. When set, randomized strategies (tournament, "
+            "pareto, novelty regeneration) become reproducible across runs."
+        ),
+    )
+    parent_selection: Literal["best", "tournament", "all", "pareto"] = Field(
+        default="best",
+        description=(
+            "Parent candidate selection strategy. "
+            "'pareto' samples from the per-task winners (GEPA-style frontier) "
+            "to avoid premature convergence to a single overall-best candidate."
+        ),
+    )
+    novelty_filter: bool = Field(
+        default=False,
+        description=(
+            "Reject near-duplicate candidates before evaluation to save budget "
+            "(ShinkaEvolve-style novelty rejection). Off by default."
+        ),
+    )
+    novelty_threshold: float = Field(
+        default=0.97,
+        ge=0.0,
+        le=1.0,
+        description=(
+            "Text-similarity ratio (0–1) above which a candidate is treated as a "
+            "near-duplicate of an earlier one. Higher = more permissive (fewer rejections)."
+        ),
+    )
+    novelty_max_retries: int = Field(
+        default=1,
+        ge=0,
+        description=(
+            "How many times to regenerate a near-duplicate candidate before "
+            "skipping its evaluation entirely."
+        ),
     )
 
 
 class ProposerConfig(BaseModel):
     """Proposer agent configuration."""
 
-    backend: Literal["api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "local"] = Field(
+    backend: BackendName = Field(
         default="api", description="Proposer backend type."
     )
+    ensemble: list[BackendName] = Field(
+        default_factory=list,
+        description=(
+            "Optional list of backends. When non-empty, the orchestrator picks a "
+            "backend per iteration via a UCB bandit that favors backends producing "
+            "improving candidates. Empty (default) = always use `backend`."
+        ),
+    )
+    bandit_c: float = Field(
+        default=1.41421356,
+        ge=0.0,
+        description="UCB exploration constant for ensemble selection. Higher = more exploration.",
+    )
     model: str = Field(
         default="claude-sonnet-4-20250514", description="Model for the Proposer agent."
     )
@@ -52,6 +108,29 @@ class EvaluatorConfig(BaseModel):
     entry: str = Field(default="evaluate.py", description="Evaluator script entrypoint.")
     timeout: int = Field(default=300, ge=1, description="Per-task timeout in seconds.")
     tasks: list[str] = Field(default_factory=list, description="Task file paths.")
+    cascade: bool = Field(
+        default=False,
+        description=(
+            "Staged evaluation: score a cheap first subset of tasks, and only run "
+            "the rest if that subset clears `cascade_threshold` (AlphaEvolve/"
+            "OpenEvolve-style cascade). Saves budget on weak candidates. Requires "
+            "per-task mode (a non-empty `tasks` list); ignored otherwise."
+        ),
+    )
+    cascade_threshold: float = Field(
+        default=0.4,
+        ge=0.0,
+        le=1.0,
+        description="Minimum stage-1 mean score required to proceed to the full task set.",
+    )
+    cascade_stage1: int = Field(
+        default=0,
+        ge=0,
+        description=(
+            "Number of tasks in the cheap first stage. 0 = auto (about one third "
+            "of the task list, leaving at least one task for stage 2)."
+        ),
+    )
 
 
 class HarnessConfig(BaseModel):
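Review note on the `cascade`, `cascade_threshold`, and `cascade_stage1` fields added above: a minimal sketch of two-stage evaluation under the documented semantics (stage 1 is about one third of the tasks by default, stage 2 always keeps at least one task, and the rest runs only if the stage-1 mean reaches the threshold). The function names, the exact rounding, and the `score_task` callback are assumptions; the evaluator's real code is not part of this diff.

```python
from typing import Callable


def stage1_size(n_tasks: int, requested: int = 0) -> int:
    """Number of tasks in the cheap first stage (0 = auto, roughly one third)."""
    if n_tasks < 2:
        return n_tasks  # too few tasks to split into two stages
    size = requested if requested > 0 else max(1, n_tasks // 3)
    return min(size, n_tasks - 1)  # leave at least one task for stage 2


def cascade_evaluate(
    tasks: list[str],
    score_task: Callable[[str], float],
    threshold: float = 0.4,
    stage1: int = 0,
) -> tuple[dict[str, float], bool]:
    """Score stage 1, then the rest only if the stage-1 mean clears `threshold`.

    Returns the per-task scores collected and whether the full set was run.
    """
    k = stage1_size(len(tasks), stage1)
    scores = {t: score_task(t) for t in tasks[:k]}
    if k == len(tasks):
        return scores, True  # nothing left for a second stage
    if sum(scores.values()) / len(scores) < threshold:
        return scores, False  # weak candidate: skip the remaining tasks
    for t in tasks[k:]:
        scores[t] = score_task(t)
    return scores, True


tasks = [f"t{i}" for i in range(1, 7)]
weak_calls: list[str] = []


def weak_scorer(task: str) -> float:
    weak_calls.append(task)
    return 0.1


weak_scores, weak_full = cascade_evaluate(tasks, weak_scorer)
strong_scores, strong_full = cascade_evaluate(tasks, lambda t: 0.9)
```

For a weak candidate only the first two of six tasks are scored, which is where the budget saving comes from.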
diff --git a/src/polyharness/orchestrator.py b/src/polyharness/orchestrator.py
index 83a833a..9aa0e6e 100644
--- a/src/polyharness/orchestrator.py
+++ b/src/polyharness/orchestrator.py
@@ -3,14 +3,16 @@
 from __future__ import annotations
 
 import random
+import shutil
 from dataclasses import dataclass
 
 from rich.console import Console
 from rich.table import Table
 
 from polyharness.config import PolyHarnessConfig
-from polyharness.evaluator import BaseEvaluator, create_evaluator
+from polyharness.evaluator import BaseEvaluator, EvalResult, create_evaluator
 from polyharness.proposer import BaseProposer, create_proposer
+from polyharness.proposer.bandit import BackendBandit
 from polyharness.search_log import SearchLog
 from polyharness.workspace import Workspace
 
@@ -38,21 +40,51 @@ def __init__(
         config: PolyHarnessConfig,
         proposer: BaseProposer | None = None,
         evaluator: BaseEvaluator | None = None,
+        proposers: dict[str, BaseProposer] | None = None,
     ):
         self.workspace = workspace
         self.config = config
-        self.proposer = proposer or create_proposer(config.proposer)
         self.evaluator = evaluator or create_evaluator(config.evaluator, cwd=workspace.root)
         self.search_log = SearchLog(workspace.search_log_path)
 
+        # Cache of backend-name → proposer. Pre-seeded ones (e.g. from tests)
+        # are used as-is; others are created lazily on first use.
+        self._proposer_cache: dict[str, BaseProposer] = dict(proposers or {})
+
+        # Ensemble (bandit) mode is opt-in and only active when the caller did
+        # not inject a single fixed proposer. An explicit `proposer=` always
+        # wins, keeping existing behavior and tests unchanged.
+        ensemble = config.proposer.ensemble
+        if proposer is None and ensemble:
+            self.bandit: BackendBandit | None = BackendBandit(
+                list(ensemble), c=config.proposer.bandit_c
+            )
+            self.proposer: BaseProposer | None = None  # chosen per iteration
+        else:
+            self.bandit = None
+            self.proposer = (
+                proposer
+                or self._proposer_cache.get(config.proposer.backend)
+                or create_proposer(config.proposer)
+            )
+
     def run(self, resume: bool = False) -> SearchResult:
         """Execute the full search loop."""
         max_iter = self.config.search.max_iterations
 
+        # Reproducibility: seed RNG so tournament/pareto/novelty are repeatable.
+        if self.config.search.seed is not None:
+            random.seed(self.config.search.seed)
+
         console.rule("[bold blue]PolyHarness Optimization Loop")
         console.print(f"Max iterations: {max_iter}")
         console.print(f"Early stop patience: {self.config.search.early_stop_patience}")
-        console.print(f"Proposer backend: {self.config.proposer.backend}")
+        if self.bandit is not None:
+            console.print(
+                f"Proposer ensemble: {', '.join(self.bandit.backends)} (UCB bandit)"
+            )
+        else:
+            console.print(f"Proposer backend: {self.config.proposer.backend}")
         console.print()
 
         # Determine starting point (resume or fresh)
@@ -139,37 +171,53 @@ def run(self, resume: bool = False) -> SearchResult:
             for i in range(start_iter, max_iter + 1):
                 progress.update(task, description=f"iter_{i}")
 
+                backend: str | None = None
                 try:
                     # Step 1: Select parent
                     parent = self._select_parent()
 
-                    # Step 2: Prepare candidate directory (copy from parent)
-                    cand_dir = self.workspace.prepare_candidate(i, parent)
+                    # Step 1.5: Select which backend proposes this iteration
+                    # (bandit) or fall back to the single fixed proposer.
+                    backend, proposer = self._select_proposer()
 
-                    # Step 3: Proposer generates new candidate
-                    metadata = self.proposer.propose(
-                        workspace_root=self.workspace.root,
-                        candidate_dir=cand_dir,
-                        iteration=i,
-                        parent=parent,
+                    # Steps 2–3: Propose a candidate, optionally rejecting
+                    # near-duplicates (novelty filter).
+                    cand_dir, metadata, accepted = self._propose_with_novelty(
+                        i, parent, proposer
                     )
 
-                    # Step 3.5: Verify proposer produced a harness file
-                    if not (cand_dir / "harness.py").exists():
-                        raise FileNotFoundError(
-                            f"Proposer did not generate harness.py in iter_{i}"
+                    # If the candidate is a near-duplicate even after retries,
+                    # skip its (potentially expensive) evaluation entirely.
+                    if not accepted:
+                        console.print(
+                            f"\n[yellow]iter_{i}: skipped — near-duplicate of an "
+                            f"earlier candidate (saved evaluation budget)[/yellow]"
                         )
+                        # Drop the dangling candidate dir so its copied-from-parent
+                        # score.json doesn't pollute the leaderboard.
+                        shutil.rmtree(cand_dir, ignore_errors=True)
+                        self._reward_backend(backend, 0.0)  # duplicate = no value
+                        patience_counter += 1
+                        progress.update(task, advance=1)
+                        if patience_counter >= self.config.search.early_stop_patience:
+                            break
+                        continue
 
                     # Step 4: Evaluate
                     score = self._evaluate_iteration(i)
                 except Exception as exc:
                     console.print(f"\n[red]iter_{i} failed: {exc}[/red]")
+                    self._reward_backend(backend, 0.0)  # failure = no value
                     patience_counter += 1
                     progress.update(task, advance=1)
                     if patience_counter >= self.config.search.early_stop_patience:
                         break
                     continue
 
+                # Record which backend produced this candidate (observability).
+                if backend is not None:
+                    metadata = {**metadata, "proposer_backend": backend}
+
                 # Step 5: Store results
                 log_entry = self.search_log.entries[-1]
                 self.workspace.store_iteration(
@@ -180,6 +228,11 @@ def run(self, resume: bool = False) -> SearchResult:
                     metadata=metadata,
                 )
 
+                # Reward the backend when its candidate improved over its parent.
+                self._reward_backend(
+                    backend, 1.0 if score > self._parent_score(parent) else 0.0
+                )
+
                 # Step 6: Update best & check early stop
                 if score > best_score:
                     best_score = score
@@ -222,10 +275,8 @@ def _evaluate_iteration(self, iteration: int, is_base: bool = False) -> float:
         else:
             cand_dir = self.workspace.candidate_path(iteration)
 
-        eval_result = self.evaluator.evaluate(
-            candidate_dir=cand_dir,
-            tasks=self.config.evaluator.tasks,
-        )
+        # Base harness is always scored in full; candidates may use cascade.
+        eval_result = self._run_eval(cand_dir, allow_cascade=not is_base)
 
         parent = None if is_base else self.search_log.best_iteration
         self.search_log.append(
@@ -246,6 +297,47 @@ def _evaluate_iteration(self, iteration: int, is_base: bool = False) -> float:
 
         return eval_result.overall_score
 
+    def _run_eval(self, cand_dir, *, allow_cascade: bool) -> EvalResult:
+        """Evaluate a candidate, applying cascade when enabled and applicable."""
+        tasks = self.config.evaluator.tasks
+        if allow_cascade and self.config.evaluator.cascade and len(tasks) >= 2:
+            return self._evaluate_with_cascade(cand_dir, tasks)
+        return self.evaluator.evaluate(candidate_dir=cand_dir, tasks=tasks)
+
+    def _evaluate_with_cascade(self, cand_dir, tasks: list[str]) -> EvalResult:
+        """Staged evaluation: cheap subset first, full set only if it clears the gate.
+
+        Splits *tasks* into a stage-1 subset and the rest. A candidate whose
+        stage-1 mean falls below ``cascade_threshold`` is rejected early: stage 2
+        is skipped and the stage-1 result is returned as-is, saving budget on
+        weak candidates (AlphaEvolve/OpenEvolve-style cascade). Otherwise stage 2
+        runs and both stages' scores are merged; no task is evaluated twice.
+        """
+        k = self.config.evaluator.cascade_stage1
+        if k <= 0:
+            k = max(1, (len(tasks) + 2) // 3)  # ~1/3 of tasks, rounded up
+        k = min(k, len(tasks) - 1)  # always leave at least one task for stage 2
+
+        stage1, stage2 = tasks[:k], tasks[k:]
+        r1 = self.evaluator.evaluate(candidate_dir=cand_dir, tasks=stage1)
+
+        threshold = self.config.evaluator.cascade_threshold
+        if r1.overall_score < threshold:
+            console.print(
+                f"[dim]  cascade: gated at stage 1 "
+                f"({r1.overall_score:.2f} < {threshold:.2f}) — "
+                f"skipped {len(stage2)} task(s)[/dim]"
+            )
+            return r1
+
+        r2 = self.evaluator.evaluate(candidate_dir=cand_dir, tasks=stage2)
+        task_scores = {**r1.task_scores, **r2.task_scores}
+        traces = {**r1.traces, **r2.traces}
+        overall = (
+            sum(task_scores.values()) / len(task_scores) if task_scores else 0.0
+        )
+        return EvalResult(overall_score=overall, task_scores=task_scores, traces=traces)
+
     def _select_parent(self) -> int:
         """Select parent candidate based on strategy."""
         strategy = self.config.search.parent_selection
@@ -253,6 +345,8 @@ def _select_parent(self) -> int:
             return self.search_log.best_iteration
         elif strategy == "tournament":
             return self._tournament_select()
+        elif strategy == "pareto":
+            return self._pareto_select()
         else:  # "all" — proposer decides, so we pass best as default parent
             return self.search_log.best_iteration
 
@@ -271,6 +365,150 @@ def _tournament_select(self, k: int = 3) -> int:
             contestants = random.sample(entries, k)
         return max(contestants, key=lambda e: e.score).iteration
 
+    def _pareto_select(self) -> int:
+        """GEPA-style Pareto-frontier parent selection.
+
+        Rather than always branching from the single best *overall* candidate,
+        build the set of candidates that achieve the top score on at least one
+        individual task ("per-task winners"), then sample one of them weighted
+        by how many tasks it wins.  This keeps specialists that are strong on a
+        subset of tasks alive as stepping stones, avoiding premature
+        convergence (Pareto-based selection, GEPA — arXiv:2507.19457).
+
+        Falls back to ``best`` when per-task scores are unavailable.
+        """
+        win_counts = self.search_log.pareto_win_counts()
+        if not win_counts:
+            return self.search_log.best_iteration
+
+        iterations = list(win_counts.keys())
+        weights = [win_counts[i] for i in iterations]
+        return random.choices(iterations, weights=weights, k=1)[0]
+
+    def _select_proposer(self) -> tuple[str | None, BaseProposer]:
+        """Pick the proposer for this iteration.
+
+        Returns ``(backend_name, proposer)``. In single-backend mode the name
+        is ``None`` and the fixed proposer is returned. In ensemble mode the
+        UCB bandit chooses a backend and its (lazily created) proposer.
+        """
+        if self.bandit is None:
+            assert self.proposer is not None
+            return None, self.proposer
+        backend = self.bandit.select()
+        return backend, self._get_proposer(backend)
+
+    def _get_proposer(self, backend: str) -> BaseProposer:
+        """Return (creating + caching on first use) the proposer for *backend*."""
+        if backend not in self._proposer_cache:
+            sub_config = self.config.proposer.model_copy(update={"backend": backend})
+            self._proposer_cache[backend] = create_proposer(sub_config)
+        return self._proposer_cache[backend]
+
+    def _reward_backend(self, backend: str | None, reward: float) -> None:
+        """Feed a reward to the bandit (no-op in single-backend mode)."""
+        if self.bandit is not None and backend is not None:
+            self.bandit.update(backend, reward)
+
+    def _parent_score(self, parent: int | None) -> float:
+        """Score of the parent iteration (0.0 if unknown)."""
+        if parent is None:
+            return 0.0
+        for entry in self.search_log.entries:
+            if entry.iteration == parent:
+                return entry.score
+        return 0.0
+
+    def _propose_with_novelty(self, iteration: int, parent: int, proposer: BaseProposer):
+        """Propose a candidate, optionally rejecting near-duplicates.
+
+        Returns ``(candidate_dir, metadata, accepted)``.  When the novelty
+        filter is enabled and the proposer keeps producing a candidate that is
+        too similar to an earlier one, regenerate up to ``novelty_max_retries``
+        times; if still a near-duplicate, return ``accepted=False`` so the
+        caller can skip evaluation and save budget (ShinkaEvolve-style code
+        novelty rejection — arXiv:2509.19349).
+        """
+        cand_dir, metadata = self._propose_candidate(iteration, parent, proposer)
+
+        if not self.config.search.novelty_filter:
+            return cand_dir, metadata, True
+
+        threshold = self.config.search.novelty_threshold
+        max_retries = self.config.search.novelty_max_retries
+
+        for attempt in range(max_retries + 1):
+            similarity = self._max_similarity(iteration, cand_dir)
+            if similarity < threshold:
+                return cand_dir, metadata, True
+            if attempt < max_retries:
+                console.print(
+                    f"[dim]iter_{iteration}: candidate {similarity:.2f} similar to an "
+                    f"existing one — regenerating ({attempt + 1}/{max_retries})[/dim]"
+                )
+                cand_dir, metadata = self._propose_candidate(iteration, parent, proposer)
+
+        return cand_dir, metadata, False
+
+    def _propose_candidate(self, iteration: int, parent: int, proposer: BaseProposer):
+        """Prepare a candidate dir, run the proposer, and verify output.
+
+        Returns ``(candidate_dir, metadata)``.  Raises ``FileNotFoundError``
+        when the proposer fails to produce ``harness.py``.
+        """
+        cand_dir = self.workspace.prepare_candidate(iteration, parent)
+        metadata = proposer.propose(
+            workspace_root=self.workspace.root,
+            candidate_dir=cand_dir,
+            iteration=iteration,
+            parent=parent,
+        )
+        if not (cand_dir / "harness.py").exists():
+            raise FileNotFoundError(
+                f"Proposer did not generate harness.py in iter_{iteration}"
+            )
+        return cand_dir, metadata
+
+    def _max_similarity(self, iteration: int, cand_dir) -> float:
+        """Max text similarity of *cand_dir* against all earlier candidates.
+
+        Uses :class:`difflib.SequenceMatcher` (stdlib, no extra deps) on the
+        concatenated editable harness files. Returns a ratio in ``[0, 1]``.
+        """
+        from difflib import SequenceMatcher
+
+        new_text = self._candidate_text(cand_dir)
+        if not new_text:
+            return 0.0
+
+        best = 0.0
+        for entry in self.search_log.entries:
+            if entry.iteration == iteration:
+                continue
+            other_dir = self.workspace.candidate_path(entry.iteration)
+            if not other_dir.exists():
+                continue
+            other_text = self._candidate_text(other_dir)
+            if not other_text:
+                continue
+            ratio = SequenceMatcher(None, new_text, other_text).ratio()
+            if ratio > best:
+                best = ratio
+        return best
+
+    def _candidate_text(self, cand_dir) -> str:
+        """Concatenate a candidate's editable harness files into one blob."""
+        parts: list[str] = []
+        for fname in self.config.harness.editable_files:
+            f = cand_dir / fname
+            if f.is_file():
+                parts.append(f.read_text())
+        if not parts:
+            entry = cand_dir / self.config.harness.entry
+            if entry.is_file():
+                parts.append(entry.read_text())
+        return "\n".join(parts)
+
     def _print_iteration(self, iteration: int, score: float, best_so_far: float, parent: int | None) -> None:
         parent_str = f"iter_{parent}" if parent is not None else "base"
         delta = score - best_so_far if iteration > 0 else 0
@@ -288,6 +526,19 @@ def _print_summary(self, result: SearchResult) -> None:
         table.add_row("Best score", f"{result.best_score:.4f}")
         table.add_row("Total iterations", str(result.total_iterations))
         console.print(table)
+
+        # Ensemble bandit breakdown: which backend earned its picks.
+        if self.bandit is not None and self.bandit.total_pulls > 0:
+            bandit_table = Table(title="Proposer ensemble (UCB bandit)")
+            bandit_table.add_column("Backend")
+            bandit_table.add_column("Picks", justify="right")
+            bandit_table.add_column("Improve rate", justify="right")
+            for backend, s in self.bandit.stats().items():
+                bandit_table.add_row(
+                    backend, str(s["pulls"]), f"{s['mean_reward']:.2f}"
+                )
+            console.print(bandit_table)
+
         console.print(
             "\nRun [bold]ph best[/bold] to see details, or [bold]ph apply[/bold] to apply the result."
         )
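Reviewer sketch of the cascade gate added in `_evaluate_with_cascade` above, reduced to plain functions so the split arithmetic and the early exit can be checked in isolation. The names `split_stages`, `cascade_scores`, and `score_fn` are hypothetical and do not exist in the patch; the sketch assumes the stage-1 score is a plain mean of per-task scores, which the patch leaves to the evaluator.

```python
from collections.abc import Callable


def split_stages(tasks: list[str], stage1: int = 0) -> tuple[list[str], list[str]]:
    """Split tasks into (stage 1, stage 2); stage1 <= 0 means about one third."""
    if len(tasks) < 2:
        raise ValueError("cascade needs at least two tasks")
    # (n + 2) // 3 is ceil(n / 3) for positive n.
    k = stage1 if stage1 > 0 else max(1, (len(tasks) + 2) // 3)
    k = min(k, len(tasks) - 1)  # always leave at least one task for stage 2
    return tasks[:k], tasks[k:]


def cascade_scores(
    tasks: list[str],
    score_fn: Callable[[str], float],  # hypothetical per-task scorer
    threshold: float,
    stage1: int = 0,
) -> dict[str, float]:
    """Score stage 1 first; score stage 2 only if the stage-1 mean clears the gate."""
    first, rest = split_stages(tasks, stage1)
    scores = {t: score_fn(t) for t in first}
    if sum(scores.values()) / len(scores) < threshold:
        return scores  # gated: stage 2 is never scored
    scores.update({t: score_fn(t) for t in rest})
    return scores
```

With seven tasks and the default split, stage 1 covers three tasks; a candidate gated there is left with only those three per-task scores, so its overall score is a stage-1 mean rather than a full-set mean.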
diff --git a/src/polyharness/proposer/bandit.py b/src/polyharness/proposer/bandit.py
new file mode 100644
index 0000000..e64019c
--- /dev/null
+++ b/src/polyharness/proposer/bandit.py
@@ -0,0 +1,84 @@
+"""UCB1 bandit for adaptive multi-backend proposer selection.
+
+When several proposer backends are available, we don't know up front which one
+writes the best harness changes for a given task. Instead of committing to one,
+the orchestrator can treat backend choice as a multi-armed bandit: each
+iteration it picks the backend with the highest UCB score, observes whether the
+produced candidate improved, and updates its estimate.
+
+Design notes (aligned with project principles):
+- **Deterministic.** UCB1 is fully deterministic given the reward sequence;
+  ties break by configured backend order. No RNG, so runs are reproducible.
+- **No new dependencies.** Pure stdlib (``math``).
+- **No new attack surface.** It only chooses among already-configured backends;
+  it never constructs commands or executes anything itself.
+
+Inspired by ShinkaEvolve's adaptive LLM-ensemble selection (arXiv:2509.19349).
+"""
+
+from __future__ import annotations
+
+import math
+from dataclasses import dataclass
+
+
+@dataclass
+class _Arm:
+    count: int = 0
+    total_reward: float = 0.0
+
+    @property
+    def mean(self) -> float:
+        return self.total_reward / self.count if self.count else 0.0
+
+
+class BackendBandit:
+    """UCB1 multi-armed bandit over a fixed set of backend names."""
+
+    def __init__(self, backends: list[str], c: float = 1.41421356):
+        if not backends:
+            raise ValueError("BackendBandit requires at least one backend.")
+        # Preserve order (used for deterministic tie-breaking) and dedupe.
+        self.backends: list[str] = list(dict.fromkeys(backends))
+        self.c = c
+        self._arms: dict[str, _Arm] = {b: _Arm() for b in self.backends}
+
+    @property
+    def total_pulls(self) -> int:
+        return sum(arm.count for arm in self._arms.values())
+
+    def select(self) -> str:
+        """Return the backend to use next.
+
+        Every backend is tried once before UCB scoring kicks in. Ties resolve
+        to the earliest backend in the configured order, keeping selection
+        deterministic and reproducible.
+        """
+        # Cold start: try each unpulled backend in order first.
+        for b in self.backends:
+            if self._arms[b].count == 0:
+                return b
+
+        total = self.total_pulls
+
+        def ucb(b: str) -> float:
+            arm = self._arms[b]
+            return arm.mean + self.c * math.sqrt(math.log(total) / arm.count)
+
+        # max() returns the first item on ties → deterministic by order.
+        return max(self.backends, key=ucb)
+
+    def update(self, backend: str, reward: float) -> None:
+        """Record a reward in ``[0, 1]`` for *backend*."""
+        if backend not in self._arms:
+            raise KeyError(f"Unknown backend for bandit update: {backend}")
+        arm = self._arms[backend]
+        arm.count += 1
+        arm.total_reward += reward
+
+    def stats(self) -> dict[str, dict[str, float | int]]:
+        """Return per-backend pull counts and mean rewards (for reporting)."""
+        return {
+            b: {"pulls": arm.count, "mean_reward": round(arm.mean, 4)}
+            for b, arm in self._arms.items()
+        }
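Reviewer sketch of the selection rule that `BackendBandit.select` follows, written as a single stateless function. It assumes the common parameterization `mean + c * sqrt(ln(total) / n)`, where `c = sqrt(2)` gives the textbook UCB1 bonus; the exploration constant used in the patch may differ. The name `ucb_pick` and its dict arguments are hypothetical.

```python
import math


def ucb_pick(
    pulls: dict[str, int],
    rewards: dict[str, float],
    c: float = math.sqrt(2),
) -> str:
    """Return the arm with the highest UCB score; unpulled arms are tried first."""
    # Cold start: any arm with no pulls is chosen before scores are compared.
    for name, n in pulls.items():
        if n == 0:
            return name

    total = sum(pulls.values())

    def score(name: str) -> float:
        n = pulls[name]
        mean = rewards[name] / n
        return mean + c * math.sqrt(math.log(total) / n)

    # max() keeps the first maximal item, so ties follow dict insertion order.
    return max(pulls, key=score)
```

The exploration bonus shrinks as an arm accumulates pulls, so a rarely tried backend can still be picked over one with a higher observed mean.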
diff --git a/src/polyharness/search_log.py b/src/polyharness/search_log.py
index a2bfbc3..a648276 100644
--- a/src/polyharness/search_log.py
+++ b/src/polyharness/search_log.py
@@ -82,5 +82,30 @@ def best_iteration(self) -> int:
             return 0
         return max(self._entries, key=lambda e: e.score).iteration
 
+    def pareto_win_counts(self) -> dict[int, int]:
+        """Map each Pareto-frontier iteration to the number of tasks it wins.
+
+        A candidate is on the frontier if it achieves the top score on at
+        least one individual task (GEPA-style per-task winners). The values
+        are how many tasks each frontier member wins. Returns an empty dict
+        when no per-task scores are recorded.
+        """
+        entries = [e for e in self._entries if e.task_scores]
+        if not entries:
+            return {}
+
+        task_names: set[str] = set()
+        for e in entries:
+            task_names.update(e.task_scores.keys())
+
+        eps = 1e-9
+        counts: dict[int, int] = {}
+        for task in task_names:
+            best = max(e.task_scores.get(task, float("-inf")) for e in entries)
+            for e in entries:
+                if e.task_scores.get(task, float("-inf")) >= best - eps:
+                    counts[e.iteration] = counts.get(e.iteration, 0) + 1
+        return counts
+
     def __len__(self) -> int:
         return len(self._entries)
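Reviewer sketch of how `pareto_win_counts` and `_pareto_select` combine, using a plain dict in place of the search log. The names `win_counts` and `sample_parent` are hypothetical, and the sketch uses a local `random.Random` instance, whereas the patch draws from the global `random` module.

```python
import random


def win_counts(
    task_scores: dict[int, dict[str, float]], eps: float = 1e-9
) -> dict[int, int]:
    """Map each iteration that tops at least one task to the number of tasks it wins."""
    tasks = {t for scores in task_scores.values() for t in scores}
    counts: dict[int, int] = {}
    for task in tasks:
        best = max(s.get(task, float("-inf")) for s in task_scores.values())
        for iteration, s in task_scores.items():
            # Ties within eps all count as winners of this task.
            if s.get(task, float("-inf")) >= best - eps:
                counts[iteration] = counts.get(iteration, 0) + 1
    return counts


def sample_parent(counts: dict[int, int], rng: random.Random) -> int:
    """Sample one frontier member, weighted by how many tasks it wins."""
    iterations = list(counts)
    weights = [counts[i] for i in iterations]
    return rng.choices(iterations, weights=weights, k=1)[0]
```

For the three-candidate log used in the tests of this patch (a generalist at 0.5 on both tasks and two specialists at 0.9 on one task each), only the specialists receive a win count, so the generalist can never be sampled. `sample_parent` requires a non-empty mapping, which is why the patch falls back to `best_iteration` when no per-task scores exist.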
diff --git a/src/polyharness/workspace.py b/src/polyharness/workspace.py
index 9cfcfb6..9b6dd76 100644
--- a/src/polyharness/workspace.py
+++ b/src/polyharness/workspace.py
@@ -194,6 +194,16 @@ def search_log_path(self) -> Path:
     def candidate_path(self, iteration: int) -> Path:
         return self.candidates_dir / f"iter_{iteration}"
 
+    def candidate_metadata(self, iteration: int) -> dict:
+        """Read a candidate's metadata.json (empty dict if absent/unreadable)."""
+        meta_file = self.candidate_path(iteration) / "metadata.json"
+        if not meta_file.exists():
+            return {}
+        try:
+            return json.loads(meta_file.read_text())
+        except (OSError, ValueError):
+            return {}
+
     def is_initialized(self) -> bool:
         """Check if workspace has required structure."""
         return (
diff --git a/tests/test_bandit.py b/tests/test_bandit.py
new file mode 100644
index 0000000..7903896
--- /dev/null
+++ b/tests/test_bandit.py
@@ -0,0 +1,66 @@
+"""Tests for the UCB backend-selection bandit."""
+
+import pytest
+
+from polyharness.proposer.bandit import BackendBandit
+
+
+def test_cold_start_tries_each_backend_in_order():
+    b = BackendBandit(["api", "local", "codex"])
+    # With nothing pulled yet, selection walks the backends in order.
+    assert b.select() == "api"
+    b.update("api", 1.0)
+    assert b.select() == "local"
+    b.update("local", 1.0)
+    assert b.select() == "codex"
+
+
+def test_converges_to_better_backend():
+    b = BackendBandit(["good", "bad"], c=0.5)
+    # Cold start: one pull each.
+    b.update(b.select(), 1.0)  # good
+    b.update(b.select(), 0.0)  # bad
+    # Now reward "good" highly and "bad" poorly over many rounds.
+    picks = {"good": 0, "bad": 0}
+    for _ in range(50):
+        choice = b.select()
+        picks[choice] += 1
+        b.update(choice, 1.0 if choice == "good" else 0.0)
+    assert picks["good"] > picks["bad"]
+
+
+def test_deterministic_tie_breaks_by_order():
+    # Two bandits with identical arms and rewards must make identical picks.
+    b1 = BackendBandit(["x", "y"])
+    b2 = BackendBandit(["x", "y"])
+    for _ in range(10):
+        c1, c2 = b1.select(), b2.select()
+        assert c1 == c2
+        b1.update(c1, 0.5)
+        b2.update(c2, 0.5)
+
+
+def test_dedupe_preserves_order():
+    b = BackendBandit(["api", "api", "local"])
+    assert b.backends == ["api", "local"]
+
+
+def test_empty_backends_raises():
+    with pytest.raises(ValueError):
+        BackendBandit([])
+
+
+def test_update_unknown_backend_raises():
+    b = BackendBandit(["api"])
+    with pytest.raises(KeyError):
+        b.update("nope", 1.0)
+
+
+def test_stats_shape():
+    b = BackendBandit(["api", "local"])
+    b.update("api", 1.0)
+    b.update("api", 0.0)
+    stats = b.stats()
+    assert stats["api"] == {"pulls": 2, "mean_reward": 0.5}
+    assert stats["local"] == {"pulls": 0, "mean_reward": 0.0}
+    assert b.total_pulls == 2
diff --git a/tests/test_cli_features.py b/tests/test_cli_features.py
index a0497c6..d60e603 100644
--- a/tests/test_cli_features.py
+++ b/tests/test_cli_features.py
@@ -201,6 +201,63 @@ def test_log_shows_delta(runner, workspace):
     assert "Δ" in result.output or "delta" in result.output.lower() or "+0.2" in result.output
 
 
+def test_log_marks_pareto_frontier(runner, workspace):
+    """ph log marks per-task winners with the Pareto-frontier glyph."""
+    from polyharness.search_log import SearchLog
+
+    log = SearchLog(workspace.search_log_path)
+    log.append(0, None, 0.5, {"A": 0.5, "B": 0.5})
+    log.append(1, 0, 0.5, {"A": 0.9, "B": 0.1})  # wins A
+    log.append(2, 0, 0.5, {"A": 0.1, "B": 0.9})  # wins B
+
+    result = runner.invoke(main, ["log", "--workspace", str(workspace.root)])
+    assert result.exit_code == 0
+    assert "◆" in result.output
+    assert "Pareto frontier" in result.output
+
+
+def test_log_no_pareto_marker_without_task_scores(runner, workspace):
+    """No frontier glyph when candidates have no per-task scores."""
+    from polyharness.search_log import SearchLog
+
+    log = SearchLog(workspace.search_log_path)
+    log.append(0, None, 0.3, {})
+    log.append(1, 0, 0.5, {})
+
+    result = runner.invoke(main, ["log", "--workspace", str(workspace.root)])
+    assert result.exit_code == 0
+    assert "◆" not in result.output
+
+
+def test_leaderboard_shows_backend_when_recorded(runner, workspace):
+    """ph leaderboard surfaces proposer_backend when an ensemble was used."""
+    from polyharness.search_log import SearchLog
+
+    log = SearchLog(workspace.search_log_path)
+    log.append(0, None, 0.3, {"A": 0.3})
+    log.append(1, 0, 0.6, {"A": 0.6})
+    workspace.store_iteration(0, 0.3, {"A": 0.3}, parent=None, metadata={"source": "base"})
+    workspace.store_iteration(1, 0.6, {"A": 0.6}, parent=0, metadata={"proposer_backend": "codex"})
+
+    result = runner.invoke(main, ["leaderboard", "--workspace", str(workspace.root)])
+    assert result.exit_code == 0
+    assert "Backend" in result.output
+    assert "codex" in result.output
+
+
+def test_leaderboard_hides_backend_without_ensemble(runner, workspace):
+    """No Backend column when no candidate recorded a proposer_backend."""
+    from polyharness.search_log import SearchLog
+
+    log = SearchLog(workspace.search_log_path)
+    log.append(0, None, 0.3, {"A": 0.3})
+    log.append(1, 0, 0.6, {"A": 0.6})
+
+    result = runner.invoke(main, ["leaderboard", "--workspace", str(workspace.root)])
+    assert result.exit_code == 0
+    assert "Backend" not in result.output
+
+
 # --- ph run --resume ---
 
 
diff --git a/tests/test_config.py b/tests/test_config.py
index 750f1f2..b10d48a 100644
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -3,6 +3,9 @@
 import tempfile
 from pathlib import Path
 
+import pytest
+from pydantic import ValidationError
+
 from polyharness.config import PolyHarnessConfig
 
 
@@ -11,10 +14,45 @@ def test_default_config():
     assert cfg.search.max_iterations == 20
     assert cfg.search.early_stop_patience == 5
     assert cfg.proposer.backend == "api"
+    assert cfg.proposer.ensemble == []  # single-backend by default
+    assert cfg.search.seed is None
     assert cfg.evaluator.type == "python"
     assert cfg.harness.language == "python"
 
 
+def test_ensemble_accepts_valid_backends():
+    cfg = PolyHarnessConfig.model_validate(
+        {"proposer": {"ensemble": ["local", "api", "codex"]}}
+    )
+    assert cfg.proposer.ensemble == ["local", "api", "codex"]
+
+
+def test_ensemble_rejects_unknown_backend():
+    with pytest.raises(ValidationError):
+        PolyHarnessConfig.model_validate({"proposer": {"ensemble": ["bogus"]}})
+
+
+def test_parent_selection_accepts_pareto():
+    cfg = PolyHarnessConfig.model_validate({"search": {"parent_selection": "pareto"}})
+    assert cfg.search.parent_selection == "pareto"
+
+
+def test_cascade_defaults_and_roundtrip():
+    cfg = PolyHarnessConfig()
+    assert cfg.evaluator.cascade is False
+    assert cfg.evaluator.cascade_threshold == 0.4
+    assert cfg.evaluator.cascade_stage1 == 0
+
+    cfg.evaluator.cascade = True
+    cfg.evaluator.cascade_stage1 = 3
+    with tempfile.TemporaryDirectory() as tmp:
+        path = Path(tmp) / "config.yaml"
+        cfg.to_yaml(path)
+        loaded = PolyHarnessConfig.from_yaml(path)
+    assert loaded.evaluator.cascade is True
+    assert loaded.evaluator.cascade_stage1 == 3
+
+
 def test_config_roundtrip_yaml():
     cfg = PolyHarnessConfig()
     cfg.proposer.backend = "claude-code"  # type: ignore[assignment]
diff --git a/tests/test_orchestrator.py b/tests/test_orchestrator.py
index 3c08e52..daf7a64 100644
--- a/tests/test_orchestrator.py
+++ b/tests/test_orchestrator.py
@@ -198,6 +198,332 @@ def test_orchestrator_resume_already_complete(tmp_path):
     assert result.best_score > 0
 
 
+def test_orchestrator_pareto_selection(tmp_path):
+    """Pareto selection should run end-to-end and find improvements."""
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    config.search.max_iterations = 5
+    config.search.early_stop_patience = 10
+    config.search.parent_selection = "pareto"
+
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),
+        evaluator=MockEvaluator(),
+    )
+    result = orch.run()
+
+    assert isinstance(result, SearchResult)
+    assert result.total_iterations >= 5
+    assert result.best_score > 0.3
+
+
+def test_pareto_select_picks_per_task_winner(tmp_path):
+    """A per-task specialist should be selectable even when not best overall.
+
+    iter_0 is mediocre on every task; iter_1 and iter_2 each win exactly one
+    task. 'best' selection would never branch from a specialist, but the
+    Pareto frontier keeps them alive.
+    """
+    import random
+
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),
+        evaluator=MockEvaluator(),
+    )
+
+    # Same overall score (0.5) but different per-task profiles.
+    orch.search_log.append(0, None, 0.5, {"A": 0.5, "B": 0.5})
+    orch.search_log.append(1, 0, 0.5, {"A": 0.9, "B": 0.1})  # wins task A
+    orch.search_log.append(2, 0, 0.5, {"A": 0.1, "B": 0.9})  # wins task B
+
+    random.seed(0)
+    picks = {orch._pareto_select() for _ in range(100)}
+
+    # iter_0 wins no task → must never be chosen; both specialists reachable.
+    assert 0 not in picks
+    assert picks == {1, 2}
+
+
+def test_pareto_select_falls_back_without_task_scores(tmp_path):
+    """Without per-task scores, Pareto selection degrades to best-overall."""
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),
+        evaluator=MockEvaluator(),
+    )
+    orch.search_log.append(0, None, 0.4, {})
+    orch.search_log.append(1, 0, 0.7, {})
+
+    assert orch._pareto_select() == 1  # == best_iteration
+
+
+def test_novelty_filter_skips_duplicate(tmp_path):
+    """A proposer that always emits identical code should get its candidates
+    rejected (and their evaluation skipped) when the novelty filter is on."""
+
+    class ConstantProposer(BaseProposer):
+        def propose(self, workspace_root, candidate_dir, iteration, parent):
+            (candidate_dir / "harness.py").write_text("SCORE_HINT = 0.3\n")
+            return {"changes_summary": "no change"}
+
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    config.search.max_iterations = 5
+    config.search.early_stop_patience = 3
+    config.search.novelty_filter = True
+    config.search.novelty_threshold = 0.97
+    config.search.novelty_max_retries = 1
+
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=ConstantProposer(),
+        evaluator=MockEvaluator(),
+    )
+    orch.run()
+
+    # Only the base (iter_0) is evaluated; every later duplicate is skipped,
+    # so it never gets appended to the search log.
+    logged = [e.iteration for e in orch.search_log.entries]
+    assert logged == [0]
+    # And the skipped candidate dir is cleaned up (no dangling copy).
+    assert not ws.candidate_path(1).exists()
+
+
+def test_novelty_filter_allows_novel(tmp_path):
+    """Distinct candidates should pass the novelty gate and be evaluated."""
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    config.search.max_iterations = 3
+    config.search.early_stop_patience = 10
+    config.search.novelty_filter = True
+    config.search.novelty_threshold = 0.97
+
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),  # writes a distinct harness each iteration
+        evaluator=MockEvaluator(),
+    )
+    result = orch.run()
+
+    assert result.best_score > 0.3
+    for i in (1, 2, 3):
+        assert (ws.candidate_path(i) / "score.json").exists()
+
+
+def test_max_similarity_detects_identical(tmp_path):
+    """_max_similarity returns ~1.0 for identical code, low for distinct code."""
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),
+        evaluator=MockEvaluator(),
+    )
+
+    # iter_0 = copy of base ("SCORE_HINT = 0.3")
+    ws.prepare_candidate(0, parent=None)
+    orch.search_log.append(0, None, 0.3, {"mock_task": 0.3})
+
+    # An identical candidate scores ~1.0 similarity against iter_0.
+    dup = ws.prepare_candidate(1, parent=0)
+    (dup / "harness.py").write_text("SCORE_HINT = 0.3\n")
+    assert orch._max_similarity(1, dup) > 0.97
+
+    # A clearly different candidate scores low.
+    novel = ws.prepare_candidate(2, parent=0)
+    (novel / "harness.py").write_text(
+        "import math\n\ndef solve(x):\n    return math.sqrt(x) * 42 + len(str(x))\n"
+    )
+    assert orch._max_similarity(2, novel) < 0.97
+
+
+def test_orchestrator_ensemble_bandit(tmp_path):
+    """The bandit should favor the backend that produces improvements."""
+
+    class HighProposer(BaseProposer):
+        """Writes a score that rises with the iteration number (capped at 1.0)."""
+
+        def propose(self, workspace_root, candidate_dir, iteration, parent):
+            score = min(0.4 + iteration * 0.1, 1.0)
+            (candidate_dir / "harness.py").write_text(f"SCORE_HINT = {score}\n")
+            return {"changes_summary": "high"}
+
+    class LowProposer(BaseProposer):
+        """Never improves over the base."""
+
+        def propose(self, workspace_root, candidate_dir, iteration, parent):
+            (candidate_dir / "harness.py").write_text("SCORE_HINT = 0.3\n")
+            return {"changes_summary": "low"}
+
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    config.search.max_iterations = 6
+    config.search.early_stop_patience = 20
+    config.proposer.ensemble = ["local", "api"]  # valid backend names
+
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        # Inject mocks keyed by backend name so no real CLIs/API are touched.
+        proposers={"local": HighProposer(), "api": LowProposer()},
+        evaluator=MockEvaluator(),
+    )
+    result = orch.run()
+
+    assert orch.bandit is not None
+    stats = orch.bandit.stats()
+    # Both arms are tried at least once (cold start), and the improving backend
+    # ("local"=HighProposer) earns a perfect improve-rate while the other earns 0.
+    assert stats["local"]["pulls"] >= 1
+    assert stats["api"]["pulls"] >= 1
+    assert stats["local"]["mean_reward"] == 1.0
+    assert stats["api"]["mean_reward"] == 0.0
+    assert stats["local"]["pulls"] >= stats["api"]["pulls"]
+    assert result.best_score >= 0.9
+    # The winning candidate records which backend produced it.
+    best_meta = json.loads(
+        (ws.candidate_path(result.best_iteration) / "metadata.json").read_text()
+    )
+    assert best_meta["proposer_backend"] == "local"
+
+
+def test_ensemble_disabled_when_proposer_injected(tmp_path):
+    """An explicit single proposer wins over an ensemble config (back-compat)."""
+    ws = _setup_workspace(tmp_path)
+    config = ws.load_config()
+    config.search.max_iterations = 2
+    config.search.early_stop_patience = 10
+    config.proposer.ensemble = ["local", "api"]
+
+    orch = Orchestrator(
+        workspace=ws,
+        config=config,
+        proposer=MockProposer(),  # explicit → disables the bandit
+        evaluator=MockEvaluator(),
+    )
+    assert orch.bandit is None
+    result = orch.run()
+    assert result.best_score > 0.3
+
+
+def test_seed_makes_search_reproducible(tmp_path):
+    """Same seed + randomized strategy → identical parent-selection trajectory."""
+
+    def run_once(path):
+        ws = Workspace.init(path)
+        (ws.base_harness_dir / "harness.py").write_text("SCORE_HINT = 0.3\n")
+        config = ws.load_config()
+        config.search.max_iterations = 6
+        config.search.early_stop_patience = 20
+        config.search.parent_selection = "tournament"
+        config.search.seed = 42
+        orch = Orchestrator(
+            workspace=ws,
+            config=config,
+            proposer=MockProposer(),
+            evaluator=MockEvaluator(),
+        )
+        orch.run()
+        return [e.parent for e in orch.search_log.entries]
+
+    assert run_once(tmp_path / "a") == run_once(tmp_path / "b")
+
+
+class PerTaskEvaluator(BaseEvaluator):
+    """Evaluator that scores by task name and records which task lists it ran."""
+
+    def __init__(self, task_scores):
+        self.task_scores = task_scores
+        self.calls: list[list[str]] = []
+
+    def evaluate(self, candidate_dir, tasks):
+        from pathlib import Path
+
+        stems = [Path(t).stem for t in tasks]
+        self.calls.append(stems)
+        ts = {s: self.task_scores.get(s, 0.0) for s in stems}
+        overall = sum(ts.values()) / len(ts) if ts else 0.0
+        return EvalResult(overall_score=overall, task_scores=ts)
+
+
+def _cascade_config(ws, **overrides):
+    config = ws.load_config()
+    config.search.max_iterations = 2
+    config.search.early_stop_patience = 10
+    config.evaluator.tasks = ["t1.json", "t2.json", "t3.json", "t4.json"]
+    config.evaluator.cascade = True
+    config.evaluator.cascade_stage1 = 2
+    config.evaluator.cascade_threshold = 0.5
+    for k, v in overrides.items():
+        setattr(config.evaluator, k, v)
+    return config
+
+
+def test_cascade_gates_weak_candidate(tmp_path):
+    """A candidate failing stage 1 should never trigger stage-2 evaluation."""
+    ws = _setup_workspace(tmp_path)
+    config = _cascade_config(ws)
+    ev = PerTaskEvaluator({"t1": 0.3, "t2": 0.3, "t3": 0.9, "t4": 0.9})
+
+    Orchestrator(ws, config, proposer=MockProposer(), evaluator=ev).run()
+
+    # Base harness is scored in full (cascade never applies to it).
+    assert ev.calls[0] == ["t1", "t2", "t3", "t4"]
+    # Candidates fail the stage-1 gate, so the stage-2 call (t3, t4) never happens.
+    assert ["t1", "t2"] in ev.calls
+    assert ["t3", "t4"] not in ev.calls
+
+
+def test_cascade_runs_full_for_strong_candidate(tmp_path):
+    """A candidate clearing stage 1 should go on to run the remaining tasks."""
+    ws = _setup_workspace(tmp_path)
+    config = _cascade_config(ws)
+    ev = PerTaskEvaluator({"t1": 0.8, "t2": 0.8, "t3": 0.8, "t4": 0.8})
+
+    orch = Orchestrator(ws, config, proposer=MockProposer(), evaluator=ev)
+    orch.run()
+
+    assert ["t1", "t2"] in ev.calls  # stage 1
+    assert ["t3", "t4"] in ev.calls  # stage 2 ran too
+    cand = next(e for e in orch.search_log.entries if e.iteration == 1)
+    assert set(cand.task_scores) == {"t1", "t2", "t3", "t4"}
+
+
+def test_cascade_disabled_runs_full(tmp_path):
+    """With cascade off, every evaluation uses the full task list."""
+    ws = _setup_workspace(tmp_path)
+    config = _cascade_config(ws, cascade=False)
+    ev = PerTaskEvaluator({"t1": 0.3, "t2": 0.3, "t3": 0.9, "t4": 0.9})
+
+    Orchestrator(ws, config, proposer=MockProposer(), evaluator=ev).run()
+
+    assert ["t1", "t2"] not in ev.calls
+    assert all(call == ["t1", "t2", "t3", "t4"] for call in ev.calls)
+
+
+def test_cascade_base_always_full(tmp_path):
+    """The base harness is fully evaluated even when candidates are gated."""
+    ws = _setup_workspace(tmp_path)
+    config = _cascade_config(ws, cascade_threshold=0.99)  # gate every candidate
+    ev = PerTaskEvaluator({"t1": 0.5, "t2": 0.5, "t3": 0.5, "t4": 0.5})
+
+    Orchestrator(ws, config, proposer=MockProposer(), evaluator=ev).run()
+
+    assert ev.calls[0] == ["t1", "t2", "t3", "t4"]
+
+
 def test_orchestrator_error_recovery(tmp_path):
     """Orchestrator should skip failing iterations and continue."""
 
diff --git a/tests/test_search_log.py b/tests/test_search_log.py
index 454984a..872d110 100644
--- a/tests/test_search_log.py
+++ b/tests/test_search_log.py
@@ -59,3 +59,23 @@ def test_log_entry_roundtrip():
     assert restored.parent == 1
     assert restored.score == 0.72
     assert restored.task_scores == {"a": 0.8}
+
+
+def test_pareto_win_counts(tmp_path):
+    log = SearchLog(tmp_path / "search_log.jsonl")
+    log.append(0, None, 0.5, {"A": 0.5, "B": 0.5})
+    log.append(1, 0, 0.5, {"A": 0.9, "B": 0.1})  # wins task A
+    log.append(2, 0, 0.5, {"A": 0.1, "B": 0.9})  # wins task B
+
+    counts = log.pareto_win_counts()
+    # iter_0 wins nothing; iter_1 and iter_2 each win one task.
+    assert set(counts) == {1, 2}
+    assert counts[1] == 1
+    assert counts[2] == 1
+
+
+def test_pareto_win_counts_empty_without_task_scores(tmp_path):
+    log = SearchLog(tmp_path / "search_log.jsonl")
+    log.append(0, None, 0.3, {})
+    log.append(1, 0, 0.5, {})
+    assert log.pareto_win_counts() == {}
diff --git a/tests/test_workspace.py b/tests/test_workspace.py
index 75f8622..ee8878e 100644
--- a/tests/test_workspace.py
+++ b/tests/test_workspace.py
@@ -241,3 +241,11 @@ def test_apply_best(tmp_path):
     assert (target / "harness.py").read_text() == "# optimized\n"
     assert not (target / "score.json").exists()
     assert not (target / "traces").exists()
+
+
+def test_candidate_metadata(tmp_path):
+    ws = Workspace.init(tmp_path / "ws")
+    ws.store_iteration(0, 0.5, {"A": 0.5}, parent=None, metadata={"proposer_backend": "codex"})
+    assert ws.candidate_metadata(0)["proposer_backend"] == "codex"
+    # Missing candidate → empty dict (no crash).
+    assert ws.candidate_metadata(99) == {}