bench: flip [plr] enabled = true (closes #74) #86

Merged: mbachaud merged 2 commits into master from bench/plr-gate on May 12, 2026

@mbachaud
Owner

Summary

  • Flip helix.toml:320 [plr] enabled from false to true, per the PLR gate spec in #74 ("PLR gate: train artifact + bench gate before flipping [plr] enabled = true"). The pre-trained stacked_plr.joblib query-quality head (schema v1, label_set t07, training AUC 0.6314, above the 0.55 §C2 gate) now attaches plr_confidence to every /context/packet response.
  • Commits the artifact + sidecar at training/models/stacked_plr.joblib (force-add, since training/models is gitignored).
  • Ships a new benchmarks/bench_plr_smoke.py HTTP smoke bench (50 queries; ~64 s/side wall time) plus a --summarize mode that emits the PASS/FAIL gate verdict from the two output JSONs.
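
For reviewers who want to confirm that the committed artifact matches its sidecar, a minimal sketch follows. It assumes the sidecar is a text file whose first whitespace-separated token is the hex SHA-256 digest of the artifact (the common `sha256sum` layout); this PR does not spell out the sidecar format, so treat that as an assumption. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_matches(artifact: Path, sidecar: Path) -> bool:
    """Compare the artifact digest with the digest recorded in the sidecar.

    ASSUMPTION: the sidecar's first whitespace-separated token is the hex
    digest. Adjust the parsing if the real sidecar uses another layout.
    """
    tokens = sidecar.read_text(encoding="utf-8").split()
    return bool(tokens) and tokens[0].lower() == sha256_of(artifact)
```

Under that assumption, `sidecar_matches(Path("training/models/stacked_plr.joblib"), sidecar_path)` returns True when the recorded digest agrees with the file on disk, where `sidecar_path` points at the committed `.sha256` file.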

Gate (all PASS)

| Gate | Threshold | Result | Status |
| --- | --- | --- | --- |
| Off-side leakage | = 0 | 0/50 | PASS |
| On-side presence (packets-with-items) | >= 90% | 100% | PASS |
| p95 latency delta | < +50 ms | -389 ms | PASS |

PLR-on improves p95 by 389 ms in this run. This is likely run-to-run noise at N=50 rather than a real speedup, but the gate only requires a p95 delta below +50 ms, and this run clears it comfortably.
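
The three gate checks can be sketched as a small function. This is an illustration of the thresholds listed in this section, not the code in bench_plr_smoke.py; the `SideSummary` type and its field names are hypothetical, and the presence rate here uses all packets as the denominator, which is an assumption about how "packets-with-items" is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideSummary:
    """Hypothetical per-side summary (field names are illustrative)."""
    n: int             # number of packets returned
    plr_present: int   # packets carrying a plr_confidence block
    p95_ms: float      # 95th-percentile latency in milliseconds


def gate_verdict(off: SideSummary, on: SideSummary,
                 min_presence: float = 0.90,
                 max_p95_delta_ms: float = 50.0) -> dict:
    """Evaluate the three gates: leakage, presence, and p95 latency delta."""
    p95_delta = on.p95_ms - off.p95_ms
    checks = {
        # PLR off must never emit plr_confidence.
        "off_side_leakage": off.plr_present == 0,
        # PLR on must attach plr_confidence to at least 90% of packets.
        "on_side_presence": on.n > 0 and on.plr_present / on.n >= min_presence,
        # p95 may not regress by 50 ms or more.
        "p95_latency_delta": p95_delta < max_p95_delta_ms,
    }
    return {"checks": checks, "p95_delta_ms": p95_delta,
            "passed": all(checks.values())}
```

Feeding in this run's numbers (`SideSummary(50, 0, 2612.82)` for off, `SideSummary(50, 50, 2223.99)` for on) yields a passing verdict on all three checks.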

Bench numbers

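| Metric | PLR off | PLR on | Delta |
| --- | --- | --- | --- |
| n | 50 | 50 | - |
| p50 (ms) | 1164.81 | 1185.98 | +21.17 |
| p95 (ms) | 2612.82 | 2223.99 | -388.83 |
| plr_present rate | 0.0% | 100.0% | +100 pp |
| HTTP 200 (of 50) | 50 | 50 | flat |
| wall (s) | 64.33 | 62.17 | -2.16 |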
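
For context on how sensitive p95 is at this sample size, here is a minimal sketch of a nearest-rank percentile. The PR does not state which percentile definition the bench uses, so this is an assumption chosen for illustration; an interpolating definition would give slightly different values.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of
    the samples at or below it. ASSUMPTION: the bench may use another
    definition (for example linear interpolation)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Under this definition, with 50 samples the p95 is the 48th value in sorted order, so only the two slowest requests sit above it and a handful of slow requests can move the figure substantially.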

Sample plr_confidence payload:

{"prob_B": 0.9123, "logit": 2.3423, "score_A": 0.0877, "high_risk": true, "artifact_label_set": "t07"}
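
The sample values look internally consistent: score_A appears to equal 1 - prob_B, and logit appears to be the log-odds of prob_B. The quick check below verifies that arithmetic for this one payload only. It is an observation about the sample, not a statement of how _compute_plr_confidence derives the fields.

```python
import math

# Sample payload copied from the PR description.
sample = {"prob_B": 0.9123, "logit": 2.3423, "score_A": 0.0877,
          "high_risk": True, "artifact_label_set": "t07"}


def log_odds(p: float) -> float:
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Gap between the reported score_A and 1 - prob_B.
score_a_gap = abs(sample["score_A"] - (1.0 - sample["prob_B"]))
# Gap between the reported logit and log(prob_B / (1 - prob_B)).
logit_gap = abs(sample["logit"] - log_odds(sample["prob_B"]))

print(f"score_A gap: {score_a_gap:.2e}, logit gap: {logit_gap:.2e}")
```

Both gaps are small enough to be explained by the four-decimal rounding in the payload.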

Why an HTTP-only smoke bench

_compute_plr_confidence lives at helix_context/server.py:453 and is only called from the /context/packet route at server.py:1634. The in-tree benchmarks/bench_packet.py calls build_context_packet directly and so does NOT exercise PLR. The new smoke bench hits the endpoint over real HTTP so the live_cfg.plr.enabled gate and the PLR closure both fire.
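
A minimal sketch of the HTTP-level probe described above. It uses only the standard library (the real bench uses httpx) and rests on several assumptions: the request body shape `{"query": ...}` is hypothetical, plr_confidence is assumed to be a top-level key of the response JSON, and the function names are illustrative rather than those in bench_plr_smoke.py.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable


def post_json(url: str, payload: dict, timeout: float = 30.0) -> tuple[int, dict]:
    """POST a JSON body and return (status, parsed JSON response).

    urllib raises HTTPError for non-2xx responses, so callers that want to
    count failures should catch that exception.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def run_smoke(queries: Iterable[str], url: str,
              post: Callable[[str, dict], tuple[int, dict]] = post_json) -> list[dict]:
    """Send each query to the endpoint; record latency and PLR presence."""
    records = []
    for query in queries:
        start = time.perf_counter()
        # ASSUMPTION: hypothetical request shape; the real route may differ.
        status, body = post(url, {"query": query})
        latency_ms = (time.perf_counter() - start) * 1000.0
        records.append({
            "status": status,
            "latency_ms": latency_ms,
            # ASSUMPTION: plr_confidence is a top-level response key.
            "plr_present": "plr_confidence" in body,
        })
    return records
```

Because the request goes through the route handler rather than calling build_context_packet directly, a probe of this shape exercises the config gate. The `post` parameter exists so the loop can be tested with a stub instead of a live server.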

Artifacts (in overnight_logs/, force-added since dir is gitignored)

  • plr_gate_2026-05-12_1549_report.md — full method + numbers + provenance + diff
  • plr_smoke_off_2026-05-12_1549.json — baseline PLR-off output
  • plr_smoke_on_2026-05-12_1549.json — candidate PLR-on output

Test plan

  • Pre-trained PLR artifact reachable from server config (training/models/stacked_plr.joblib, committed in this PR)
  • Baseline smoke bench (PLR off): 0 PLR fields across 50 packets, p95 baseline captured
  • Candidate smoke bench (PLR on): 50/50 packets carry plr_confidence, p95 captured
  • PASS gate confirmed via python benchmarks/bench_plr_smoke.py --summarize <off> <on>
  • Manual probe (curl /context/packet) confirms plr_confidence block has expected schema (prob_B, logit, score_A, high_risk, artifact_label_set)
  • No errors on either side (ok_count = 50/50 both)
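
The manual schema probe in the test plan could be automated along these lines. This is a sketch that checks only for the five field names listed above; it assumes plr_confidence sits at the top level of the packet, and the helper name is hypothetical.

```python
# Field names taken from the test plan above.
EXPECTED_KEYS = {"prob_B", "logit", "score_A", "high_risk", "artifact_label_set"}


def missing_plr_keys(packet: dict) -> set:
    """Return the expected plr_confidence keys that are absent.

    ASSUMPTION: plr_confidence is a top-level key of the packet. An absent
    or non-dict block counts as missing every expected key.
    """
    block = packet.get("plr_confidence")
    if not isinstance(block, dict):
        return set(EXPECTED_KEYS)
    return EXPECTED_KEYS - block.keys()
```

An empty result means the block carries every expected field. The check deliberately ignores extra keys so that later schema additions do not fail the probe.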

Closes #74.

🤖 Generated with Claude Code

mbachaud and others added 2 commits May 12, 2026 13:35
Phase 2 (flip [plr] enabled = false -> true) was not executed this
session. Same root cause as the parallel #73 BROAD blocker:
bench_needle_1000.py does not honor ASK_PROXY=0, and the canonical
wall-time estimate (_run_overnight_e4b.sh line 7) is ~5.25h per run.

Phase 0 prep DID complete successfully for this phase:
  - training/models/stacked_plr.joblib loads cleanly
  - schema_version=1 (matches MODEL_SCHEMA_VERSION expectation)
  - label_set='t07', cos_threshold=0.7
  - auc_mean=0.6313, auc_std reported
  - classifier=GradientBoostingClassifier
  - source_export=cwola_export_20260415_windowed.json
  - trained_at=2026-04-22T07:23:03Z
  - PLR retrain NOT needed; artifact is the AUC=0.631 stacked head
    from the user's CWoLa Sprint 3 acceptance test.

helix.toml is unchanged at [plr] enabled = false. The branch is ready
to receive the config flip + bench JSONs once the script constraint
is resolved. See overnight_logs/BLOCKER_2026-05-12_wall-time.md for
the resolution-path menu the user can pick from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PASS gate on /context/packet smoke bench (N=50, retrieval-only HTTP):
  off-side leakage: 0/50      (gate = 0)
  on-side presence: 100% (50/50)  (gate >= 90%)
  p95 latency:      2613ms -> 2224ms (delta -389ms, gate < +50ms)
  p50 latency:      1165ms -> 1186ms (delta +21ms, no degradation)

The stacked PLR query-confidence head (STATISTICAL_FUSION.md §C3) now
attaches plr_confidence to /context/packet responses. Sample payload:

  {"prob_B": 0.91, "logit": 2.34, "score_A": 0.09,
   "high_risk": true, "artifact_label_set": "t07"}

Artifacts in this commit:

  - helix.toml: [plr] enabled false -> true
  - benchmarks/bench_plr_smoke.py: new HTTP smoke bench (50-100 lines, no
    new deps, uses httpx + bench_needle_1000's harvester for a realistic
    KV-needle query corpus). Has a --summarize sub-mode that emits the
    gate verdict.
  - training/models/stacked_plr.joblib + .sha256 (force-added; training/
    is gitignored). Pre-trained query-quality head, schema v1, label_set
    t07, training AUC 0.6314 > 0.55 §C2 gate.
  - overnight_logs/plr_smoke_off_2026-05-12_1549.json (baseline)
  - overnight_logs/plr_smoke_on_2026-05-12_1549.json  (candidate)
  - overnight_logs/plr_gate_2026-05-12_1549_report.md (full numbers,
    method, provenance, helix.toml diff)

Why HTTP-only bench: _compute_plr_confidence (server.py:453) is only
called from the /context/packet route handler (server.py:1634).
benchmarks/bench_packet.py calls build_context_packet directly and so
does NOT exercise PLR. The smoke bench hits the endpoint over real HTTP
so the live_cfg.plr.enabled gate and the PLR closure both fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mbachaud merged commit 0ebb7db into master on May 12, 2026
3 checks passed
mbachaud deleted the bench/plr-gate branch on May 12, 2026 22:58
Development

Successfully merging this pull request may close these issues.

PLR gate: train artifact + bench gate before flipping [plr] enabled = true
