refactor(judge-batch): delegate build_batch_request to lib chokepoint (L334 part 2/2) #249

Merged
cipher813 merged 1 commit into main from refactor/judge-batch-uses-lib-chokepoint-l334 on May 28, 2026
@cipher813
Owner

Summary

evals/judge.py::build_batch_request now delegates payload construction to alpha_engine_lib.anthropic_payload.build_batches_request_params (L334 second-consumer chokepoint). Drops the inline {custom_id, params} dict construction; same wire shape, same behavior.

Lib pin v0.34.0 → v0.41.0 in lockstep across:

  • requirements.txt
  • Dockerfile (main research Lambda image)
  • Dockerfile.alerts (research-alerts Lambda image)
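A lockstep bump like this can be guarded by a small check. The following is a minimal sketch only: the pin format (a `@vX.Y.Z` suffix on a reference containing `alpha-engine-lib`), the example URL, and the helper names `find_pins` and `pins_in_lockstep` are assumptions for illustration, not something this repo ships.

```python
import re

# Assumption: each file pins the lib with a "...alpha-engine-lib...@vX.Y.Z" reference.
PIN_RE = re.compile(r"alpha[-_]engine[-_]lib\S*?@v(\d+\.\d+\.\d+)")


def find_pins(text: str) -> set[str]:
    """Return every lib version pinned in the given file contents."""
    return set(PIN_RE.findall(text))


def pins_in_lockstep(files: dict[str, str]) -> bool:
    """True when every file has a pin and all files agree on one version."""
    pins = {name: find_pins(text) for name, text in files.items()}
    all_versions = set().union(*pins.values())
    return len(all_versions) == 1 and all(pins.values())


if __name__ == "__main__":
    # Hypothetical file contents; the real pin lines may look different.
    line = "git+https://example.invalid/alpha-engine-lib.git@v0.41.0"
    files = {
        "requirements.txt": line,
        "Dockerfile": f"RUN pip install {line}",
        "Dockerfile.alerts": f"RUN pip install {line}",
    }
    print(pins_in_lockstep(files))
```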

Why

The chokepoint enforces the server-tool ⊥ assistant-prefill invariant on the embedded params dict — so a future RubricEval extension that adds a server-side tool (e.g. web_search for citation lookup) can't silently reach Anthropic's HTTP 400 the way morning-signal did in May.
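For reviewers unfamiliar with the invariant, here is a rough sketch of the kind of check it implies. This is not the lib's implementation: `build_batch_entry`, the helper names, and the prefix-based detection of server-side tools are all assumptions made for illustration. Only the `{custom_id, params}` entry shape comes from this PR.

```python
from typing import Any

# Assumption: server-side tools are recognised by a versioned "type" prefix.
SERVER_TOOL_PREFIXES = ("web_search", "web_fetch", "code_execution")


def has_server_tool(params: dict[str, Any]) -> bool:
    """True if any declared tool looks like a server-side tool."""
    return any(
        str(tool.get("type", "")).startswith(SERVER_TOOL_PREFIXES)
        for tool in params.get("tools", [])
    )


def has_assistant_prefill(params: dict[str, Any]) -> bool:
    """True if the final message is an assistant turn (a prefill)."""
    messages = params.get("messages", [])
    return bool(messages) and messages[-1].get("role") == "assistant"


def build_batch_entry(custom_id: str, params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the chokepoint: validate, then wrap."""
    if has_server_tool(params) and has_assistant_prefill(params):
        raise ValueError(
            f"{custom_id}: server tool combined with assistant prefill; "
            "failing locally instead of waiting for an HTTP 400"
        )
    return {"custom_id": custom_id, "params": params}
```

Failing at construction time means the bad combination surfaces in unit tests rather than as a rejected request after submission.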

ROADMAP L334 part 2/2 — consumer migration. Part 1 = alpha-engine-lib #85 (build_batches_request_params + v0.40.1 → v0.41.0).

⚠️ Merge-blocked

This PR is blocked on:

  1. alpha-engine-lib #85 merging
  2. The v0.41.0 git tag being cut on alpha-engine-lib

Once both land, this PR's CI will go green (currently fails on the unreachable lib pin).

Test plan

  • Full suite green locally with lib installed editable from local v0.41.0: 1609 passed
  • tests/test_eval_judge_batch.py (12 tests covering build_batch_request shape, error semantics, schema pin) all pass against the lib-routed implementation
  • CI green once lib v0.41.0 tag exists

🤖 Generated with Claude Code

… (L334)

`evals/judge.py::build_batch_request` now delegates payload construction
to `alpha_engine_lib.anthropic_payload.build_batches_request_params`
(L334 second-consumer chokepoint, shipped in lib v0.41.0). Drops the
inline `{custom_id, params}` dict construction; same wire shape, same
behavior.

The chokepoint enforces the server-tool ⊥ assistant-prefill invariant
on the embedded `params` dict — so a future RubricEval extension that
adds a server-side tool (e.g. `web_search` for citation lookup) can't
silently reach Anthropic's HTTP 400 the way morning-signal did in May.

Lib pin v0.34.0 → v0.41.0 in lockstep across:
- requirements.txt
- Dockerfile (main research Lambda image)
- Dockerfile.alerts (research-alerts Lambda image)

ROADMAP: **L334** part 2/2 — consumer migration. Part 1 = alpha-engine-lib
PR #85 (build_batches_request_params + v0.40.1 → v0.41.0).

**Merge-blocked on lib PR #85 landing + the v0.41.0 git tag.**

Suite: 1609 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit ecfc78a into main May 28, 2026
2 checks passed
@cipher813 cipher813 deleted the refactor/judge-batch-uses-lib-chokepoint-l334 branch May 28, 2026 16:49