feat: borrow from the now-open Stanford Meta-Harness — proposer principles + held-out val/test (v0.2.4)#9

Merged: weijt606 merged 4 commits into main from v0.2.4 on May 26, 2026

@weijt606
Owner

Summary

Stanford's Meta-Harness framework is now open-sourced (MIT). This release borrows
two battle-tested ideas from it — re-implemented in our own code, no third-party
source bundled — and tightens the project's open-source declarations.

What's new

Proposer improvement principles

Every Proposer prompt/instruction (API, CLI, and the injected CLAUDE.md/AGENTS.md)
now carries shared directives re-authored from the official Meta-Harness reference
Skill:

  • change a real mechanism, not just constants;
  • don't overfit to or hardcode the eval set;
  • ground changes in trace evidence;
  • state a falsifiable hypothesis.

This raises candidate quality at the source and complements the post-hoc novelty
filter.
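As a rough illustration of how shared directives could be appended to every Proposer prompt, here is a minimal sketch. The names `PROPOSER_PRINCIPLES` and `with_principles`, the section header, and the exact wording of each directive are assumptions made for this example; they are not taken from the PolyHarness source.

```python
# Hypothetical sketch: names and wording are illustrative, not PolyHarness's API.
PROPOSER_PRINCIPLES = (
    "Change a real mechanism, not just constants.",
    "Do not overfit to or hardcode the eval set.",
    "Ground every change in trace evidence.",
    "State a falsifiable hypothesis for the change.",
)

# Assumed marker used to detect that the directives were already appended.
PRINCIPLES_HEADER = "## Improvement principles"


def with_principles(base_prompt: str, principles=PROPOSER_PRINCIPLES) -> str:
    """Return base_prompt with the shared directives appended exactly once."""
    if PRINCIPLES_HEADER in base_prompt:
        # Already injected: leave the prompt unchanged so repeated calls are safe.
        return base_prompt
    bullets = "\n".join(f"- {p}" for p in principles)
    return f"{base_prompt.rstrip()}\n\n{PRINCIPLES_HEADER}\n{bullets}\n"
```

Keeping the directives in one constant means the API prompt, the CLI prompt, and any injected instruction file could all draw on the same text, which is the property the description above calls "shared".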

Held-out val/test split (evaluator.eval_split, val_tasks, test_tasks)

Evolve the harness on val_tasks (selection, Pareto, and early-stop all use the
validation set), then score only the best candidate, once, on held-out test_tasks
at the end. The test score never drives selection — an honest, post-hoc number that
exposes harness overfitting to the eval set (the Meta-Harness val/test methodology).
Per-task mode only; off by default. Shown in the run summary and ph best,
persisted to summary/holdout_test.json.
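The selection-versus-reporting separation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_and_holdout`, the `score` callback, the mean-score aggregation, and the JSON fields are all hypothetical, and the real evaluator and file schema may differ. Only the `summary/holdout_test.json` path comes from the description.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Sequence


def select_and_holdout(
    candidates: Sequence[str],
    val_tasks: Sequence[str],
    test_tasks: Sequence[str],
    score: Callable[[str, str], float],  # hypothetical: score(candidate, task)
    out_dir: str,
) -> Dict[str, object]:
    """Pick the best candidate on val_tasks, then score it once on test_tasks."""
    if not val_tasks or not test_tasks:
        raise ValueError("both val_tasks and test_tasks must be non-empty")
    if set(val_tasks) & set(test_tasks):
        raise ValueError("val_tasks and test_tasks must not overlap")

    def mean_score(candidate: str, tasks: Sequence[str]) -> float:
        return sum(score(candidate, t) for t in tasks) / len(tasks)

    # Selection uses the validation set only.
    val_scores = {c: mean_score(c, val_tasks) for c in candidates}
    best = max(val_scores, key=val_scores.get)

    # The held-out test set is touched exactly once, after selection is final.
    result = {
        "best": best,
        "val_score": val_scores[best],
        "test_score": mean_score(best, test_tasks),
    }

    # Assumed schema; only the file location is taken from the PR description.
    path = Path(out_dir) / "summary" / "holdout_test.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return result
```

A large gap between `val_score` and `test_score` for the selected candidate is the overfitting signal the held-out number is meant to expose.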

Open-source / attribution

  • Backstory corrected: the framework is no longer "never open-sourced" — reframed
    PolyHarness's positioning and linked the official repo.
  • Acknowledgments section added (README / README_CN) crediting the open works
    PolyHarness borrows ideas from (Stanford Meta-Harness, GEPA, ShinkaEvolve,
    OpenEvolve, Darwin Gödel Machine), and stating explicitly that no third-party
    code is bundled: ideas are re-implemented and attributed inline.
  • CONTRIBUTING documents the attribution policy (borrow ideas, don't vendor code).
  • License line expanded (MIT, © 2026 weijt606).
  • Fixed a stale ph shell-hook install help string so it shows the current
    invocations (codex exec / opencode run).

Compatibility & safety

  • All new behavior is opt-in / default-off; no breaking changes to existing runs.
  • No new dependencies. The held-out eval reuses the existing subprocess evaluator — no
    new execution surface.

Testing

  • ruff check src/ tests/: clean
  • pytest tests/: 210 passed (+6: val/test-split + config + proposer-principles tests)
  • End-to-end smoke (local backend): init → run → best verified on v0.2.4.

weijt606 added 4 commits May 26, 2026 14:21
…0.2.4)

Borrow (re-author, MIT-compatible) the high-value directives from the official
Stanford Meta-Harness reference Skill and inject them into every Proposer
prompt/instruction (API, CLI, and the workspace CLAUDE.md/AGENTS.md): change a
real mechanism not just constants, don't overfit the eval set, ground changes in
trace evidence, state a falsifiable hypothesis. Complements the post-hoc novelty
filter by raising candidate quality at the source.

Also: the Stanford Meta-Harness framework is now open-sourced (MIT) — corrected
the now-stale 'never open-sourced' Backstory in README/README_CN and linked the
official repo. 207 tests, lint clean.

Borrow the Stanford Meta-Harness val/test methodology: evolve the harness on
val_tasks (selection, Pareto, early-stop all use val), then score ONLY the best
candidate once on held-out test_tasks at the end. The test score never drives
selection — an honest post-hoc number. Per-task mode only, off by default
(eval_split). Shown in the run summary + ph best, persisted to
summary/holdout_test.json. Folded into the 0.2.4 release alongside the proposer
improvement principles. 210 tests, lint clean.

…ons)

State explicitly that PolyHarness bundles NO third-party code — its techniques
are independently re-implemented from public papers/docs/MIT repos (Stanford
Meta-Harness, GEPA, ShinkaEvolve, OpenEvolve, Darwin Gödel Machine) and
attributed inline. Add an Acknowledgments section (README/README_CN), expand the
License line (MIT, (c) 2026 weijt606), and document the attribution policy in
CONTRIBUTING. Part of the 0.2.4 release.

Leftover from the v0.2.3 adapter refresh: the 'ph shell-hook install' docstring
still showed 'codex ...' / 'opencode -p ...'. Align with current invocations.
Part of 0.2.4.
@weijt606 merged commit bacc613 into main on May 26, 2026
3 checks passed
@weijt606 deleted the v0.2.4 branch on May 26, 2026 13:28