bench: flip [plr] enabled = true (closes #74) #86
Merged
Conversation
Phase 2 (flip `[plr] enabled = false -> true`) was not executed this session. Same root cause as the parallel #73 BROAD blocker: bench_needle_1000.py does not honor ASK_PROXY=0, and the canonical wall-time estimate (_run_overnight_e4b.sh line 7) is ~5.25 h per run.

Phase 0 prep DID complete successfully for this phase:
- training/models/stacked_plr.joblib loads cleanly
- schema_version=1 (matches the MODEL_SCHEMA_VERSION expectation)
- label_set='t07', cos_threshold=0.7
- auc_mean=0.6313, auc_std reported
- classifier=GradientBoostingClassifier
- source_export=cwola_export_20260415_windowed.json
- trained_at=2026-04-22T07:23:03Z
- PLR retrain NOT needed; the artifact is the AUC=0.631 stacked head from the user's CWoLa Sprint 3 acceptance test.

helix.toml is unchanged at `[plr] enabled = false`. The branch is ready to receive the config flip + bench JSONs once the script constraint is resolved. See overnight_logs/BLOCKER_2026-05-12_wall-time.md for the resolution-path menu the user can pick from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
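The Phase 0 artifact checks above can be sketched as a small validator. This is a hypothetical illustration, not the session's actual check code: the metadata key names and the `MODEL_SCHEMA_VERSION` constant mirror the commit message, but the exact metadata layout inside `stacked_plr.joblib` is an assumption.

```python
MODEL_SCHEMA_VERSION = 1  # assumed module-level constant, per the commit message


def validate_plr_artifact(meta: dict) -> list[str]:
    """Return a list of failure strings; an empty list means Phase 0 passes."""
    failures = []
    if meta.get("schema_version") != MODEL_SCHEMA_VERSION:
        failures.append(f"schema_version={meta.get('schema_version')!r}, "
                        f"expected {MODEL_SCHEMA_VERSION}")
    if meta.get("label_set") != "t07":
        failures.append(f"label_set={meta.get('label_set')!r}, expected 't07'")
    if meta.get("auc_mean", 0.0) <= 0.55:  # the §C2 training-AUC gate
        failures.append(f"auc_mean={meta.get('auc_mean')} fails the > 0.55 gate")
    return failures


# The metadata values reported in this session pass cleanly:
meta = {"schema_version": 1, "label_set": "t07", "cos_threshold": 0.7,
        "auc_mean": 0.6313, "classifier": "GradientBoostingClassifier"}
print(validate_plr_artifact(meta))  # -> []
```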
PASS gate on /context/packet smoke bench (N=50, retrieval-only HTTP):
off-side leakage: 0/50 (gate = 0)
on-side presence: 100% (50/50) (gate >= 90%)
p95 latency: 2613ms -> 2224ms (delta -389ms, gate < +50ms)
p50 latency: 1165ms -> 1186ms (delta +21ms, no degradation)
The stacked PLR query-confidence head (STATISTICAL_FUSION.md §C3) now
attaches plr_confidence to /context/packet responses. Sample payload:
{"prob_B": 0.91, "logit": 2.34, "score_A": 0.09,
"high_risk": true, "artifact_label_set": "t07"}
Artifacts in this commit:
- helix.toml: [plr] enabled false -> true
- benchmarks/bench_plr_smoke.py: new HTTP smoke bench (50-100 lines, no
new deps, uses httpx + bench_needle_1000's harvester for a realistic
KV-needle query corpus). Has a --summarize sub-mode that emits the
gate verdict.
- training/models/stacked_plr.joblib + .sha256 (force-added; training/
is gitignored). Pre-trained query-quality head, schema v1, label_set
t07, training AUC 0.6314 > 0.55 §C2 gate.
- overnight_logs/plr_smoke_off_2026-05-12_1549.json (baseline)
- overnight_logs/plr_smoke_on_2026-05-12_1549.json (candidate)
- overnight_logs/plr_gate_2026-05-12_1549_report.md (full numbers,
method, provenance, helix.toml diff)
Why HTTP-only bench: _compute_plr_confidence (server.py:453) is only
called from the /context/packet route handler (server.py:1634).
benchmarks/bench_packet.py calls build_context_packet directly and so
does NOT exercise PLR. The smoke bench hits the endpoint over real HTTP
so the live_cfg.plr.enabled gate and the PLR closure both fire.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `helix.toml:320` flips `[plr] enabled` from `false` to `true` per #74's PLR gate spec ("train artifact + bench gate before flipping `[plr] enabled = true`").
- The pre-trained `stacked_plr.joblib` query-quality head (schema v1, label_set `t07`, training AUC 0.6314 > 0.55 §C2 gate) now attaches `plr_confidence` to every `/context/packet` response.
- `training/models/stacked_plr.joblib` (force-add, since `training/models` is gitignored).
- `benchmarks/bench_plr_smoke.py` HTTP smoke bench (50 queries; ~64 s/side wall time) plus a `--summarize` mode that emits the PASS/FAIL gate verdict from the two output JSONs.

Gate (all PASS)
| Gate | Threshold | Result |
| --- | --- | --- |
| off-side leakage | = 0 | 0/50 |
| on-side presence | >= 90% | 100% |
| p95 latency delta | < +50 ms | -389 ms |

PLR-on improves p95 by 389 ms in this run. Likely seed noise within N=50, but the requirement was "no degradation" and we are well past that.
Bench numbers
Sample `plr_confidence` payload:

{"prob_B": 0.9123, "logit": 2.3423, "score_A": 0.0877, "high_risk": true, "artifact_label_set": "t07"}

Why an HTTP-only smoke bench
`_compute_plr_confidence` lives at `helix_context/server.py:453` and is only called from the `/context/packet` route at `server.py:1634`. The in-tree `benchmarks/bench_packet.py` calls `build_context_packet` directly and so does NOT exercise PLR. The new smoke bench hits the endpoint over real HTTP so the `live_cfg.plr.enabled` gate and the PLR closure both fire.

Artifacts (in `overnight_logs/`, force-added since the dir is gitignored)
- `plr_gate_2026-05-12_1549_report.md` — full method + numbers + provenance + diff
- `plr_smoke_off_2026-05-12_1549.json` — baseline PLR-off output
- `plr_smoke_on_2026-05-12_1549.json` — candidate PLR-on output

Test plan
- `training/models/stacked_plr.joblib`, committed in this PR)
- `plr_confidence`, p95 captured
- `python benchmarks/bench_plr_smoke.py --summarize <off> <on>`
- `curl /context/packet`) confirms the `plr_confidence` block has the expected schema (`prob_B`, `logit`, `score_A`, `high_risk`, `artifact_label_set`)
- `ok_count = 50/50` (both)

Closes #74.
🤖 Generated with Claude Code