
feat(contracts): inject imageQaSample hook into oc_assert runtime #1452

Merged
shaun0927 merged 1 commit into develop from feat/1432-runtime-hook
May 28, 2026

@shaun0927
Owner

Summary

Stacked on #1445. Completes the runtime wire-up promised in the original #1432-B brief that was deferred from that PR.

  • image_qa.ts exports a reusable runImageQaSampling(ctx, params) that does the sampling forwarding.
  • oc_assert.buildEvalContext now accepts the calling ToolContext and injects an imageQaSample closure delegating to runImageQaSampling.
  • oc_assert handler signature accepts the optional context parameter.
  • image_qa evaluator stamps details.error on infra-fault paths so isInconclusive translates them to verdict='inconclusive' (not fail). Runtime wiring problems are not contract failures.
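
The wiring described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the shapes of ToolContext, EvalContext, ImageQaParams and SamplingResult are assumptions made for the example, and requestClient is simplified to resolve directly to the answer string.

```typescript
// Illustrative stand-ins only; the real types in the repository may differ.
type SamplingResult =
  | { status: "ok"; answer: string }
  | { status: "unsupported_by_host" };

interface ToolContext {
  // Assumed flag: true when the client advertised the sampling capability.
  samplingSupported: boolean;
  // Assumed bridge back to the client; simplified to resolve to the answer text.
  requestClient?: (method: string, params: unknown) => Promise<string>;
}

interface ImageQaParams {
  question: string;
  screenshotBase64: string;
}

interface EvalContext {
  imageQaSample?: (params: ImageQaParams) => Promise<SamplingResult>;
}

// Forwards the question to the host, or reports that the host cannot sample.
async function runImageQaSampling(
  ctx: ToolContext,
  params: ImageQaParams,
): Promise<SamplingResult> {
  if (!ctx.samplingSupported || !ctx.requestClient) {
    return { status: "unsupported_by_host" };
  }
  try {
    const answer = await ctx.requestClient("sampling/createMessage", params);
    return { status: "ok", answer };
  } catch {
    // Transport errors are treated like a missing capability.
    return { status: "unsupported_by_host" };
  }
}

// The context is optional, so callers that pass nothing get no hook at all.
function buildEvalContext(context?: ToolContext): EvalContext {
  if (!context) return {};
  return {
    imageQaSample: (params) => runImageQaSampling(context, params),
  };
}
```

Keeping the context parameter optional means call sites that never pass a ToolContext keep working; they simply get an EvalContext without the hook.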

Why this is a fix not a feature

Without this, the image_qa DSL clause was dead code in production: every contract returned inconclusive with reason host_runtime_did_not_wire_imageQaSample. PR #1445 implicitly required this wire-up to keep its commitment.
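
A rough sketch of how an unwired hook can surface as inconclusive rather than fail once details.error is stamped. The LeafEvidence shape, the toVerdict helper, and the exact rule inside isInconclusive are assumptions for illustration; only the reason string and the verdict names come from this PR.

```typescript
// Hypothetical shapes; the repository's evidence and verdict types may differ.
type Verdict = "pass" | "fail" | "inconclusive";

interface LeafEvidence {
  passed: boolean;
  details: { reason?: string; error?: string };
}

// Assumed evaluator fallback when no imageQaSample hook was injected.
function evaluateWithoutHook(): LeafEvidence {
  const reason = "host_runtime_did_not_wire_imageQaSample";
  // Stamping error as well as reason marks this as an infrastructure fault.
  return { passed: false, details: { reason, error: reason } };
}

// Assumed rule: a failed leaf that carries details.error is not a contract failure.
function isInconclusive(evidence: LeafEvidence): boolean {
  return !evidence.passed && typeof evidence.details.error === "string";
}

function toVerdict(evidence: LeafEvidence): Verdict {
  if (evidence.passed) return "pass";
  return isInconclusive(evidence) ? "inconclusive" : "fail";
}
```

Under these assumptions, a genuine mismatch (passed false, no details.error) still maps to fail, while the unwired-hook case maps to inconclusive.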

SSOT (#1359) alignment

  • No server-side LLM. The closure only forwards via sampling/createMessage, falls back to unsupported_by_host.

Test plan

  • tests/contracts/oc-assert-image-qa-runtime.test.ts — 4/4 pass (pass on match, fail on mismatch, inconclusive without sampling cap, inconclusive without screenshot).
  • tests/contracts/image-qa.test.ts — 9/9 still pass.
  • tests/tools/image-qa.test.ts — 6/6 still pass.

Stacked on #1445 → #1441.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@shaun0927 shaun0927 force-pushed the feat/1432-assert-image-qa-clause branch from fb7546d to 7a20a21 Compare May 28, 2026 14:16
shaun0927 added a commit that referenced this pull request May 28, 2026
…1445)

Adds a host-mediated image_qa leaf assertion to the outcome-contract DSL: validator (ReDoS-guarded at parse time), evaluator (optional ctx.imageQaSample hook), evaluate dispatch, and oc_assert expectedActualFor handling. No server-side LLM; deterministic passed:false fallback when the hook/screenshot is absent (SSOT #1359 P2/P4). Build + 114 contract tests pass. Runtime wiring + verdict=inconclusive translation + defense-in-depth ReDoS guard on the live path land in the follow-up #1452.
@shaun0927 shaun0927 force-pushed the feat/1432-runtime-hook branch from 55af8f6 to 56161cd Compare May 28, 2026 16:36
@shaun0927 shaun0927 changed the base branch from feat/1432-assert-image-qa-clause to develop May 28, 2026 16:36
Completes the runtime wire-up promised in the original #1432-B brief
that was deferred from PR #1445. The DSL clause and evaluator landed
in Part 2, but EvalContext.imageQaSample was never populated, so every
image_qa contract evaluated to inconclusive in production.

Changes:
  - Extract the sampling forwarding from the image_qa tool handler
    into a reusable `runImageQaSampling(ctx, params)` export.
  - oc_assert.buildEvalContext now accepts the calling ToolContext
    and injects an imageQaSample closure that delegates to
    runImageQaSampling, threading the host's sampling capability +
    requestClient bridge straight through.
  - oc_assert handler signature accepts the optional context param.
  - The image_qa evaluator now stamps `details.error` (in addition to
    `details.reason`) on infra-fault paths so oc_assert's
    isInconclusive check translates them to verdict='inconclusive'
    instead of 'fail'. Runtime wiring problems are not contract
    failures.

SSOT (#1359) alignment: no server-side LLM ever — the closure only
forwards via sampling/createMessage, and falls back to
unsupported_by_host when the host lacks the capability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927 shaun0927 force-pushed the feat/1432-runtime-hook branch from 56161cd to 52863aa Compare May 28, 2026 16:39
@shaun0927 shaun0927 merged commit 27a0a88 into develop May 28, 2026
10 checks passed
@shaun0927
Owner Author

Analysis & merge summary (automated review)

Intent / direction. The runtime wire-up that activates the image_qa contract assertion from #1445. Correctly framed as a fix: without it, EvalContext.imageQaSample was never populated, so every image_qa contract evaluated to inconclusive in production. This injects the closure into oc_assert.buildEvalContext (forwarding to the host via runImageQaSampling → MCP sampling/createMessage) and stamps details.error on infra-fault paths so verdicts translate to inconclusive rather than fail.

#1359 (SSOT) alignment: consistent. The only outbound path is host-mediated MCP sampling: no fetch, no API key, no model SDK, no server-side judgment. When the client doesn't advertise sampling (or requestClient is absent), it returns unsupported_by_host and degrades deterministically to inconclusive. A transient sampling error maps to unsupported_by_host → inconclusive, never masking a genuine answer-vs-pattern mismatch (the regex path runs only on status:'ok'). Matches P2 (harness, not agent) and P4 (facts before decisions). Request shape (content:[{type:'image',data,mimeType},{type:'text'}], maxTokens) and response read (content.text) follow the MCP sampling/createMessage schema; 30s transport timeout, 512-token cap.
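
For orientation, the request shape and timeout mentioned above might look roughly like this. The helper names buildSamplingParams and withTimeout are hypothetical, the messages/role wrapper is an assumption based on the general form of sampling/createMessage params, and the content array simply mirrors the shape quoted in this review rather than a verified schema.

```typescript
// Values quoted in the review: 512-token cap and 30s transport timeout.
const MAX_TOKENS = 512;
const TRANSPORT_TIMEOUT_MS = 30_000;

// Hypothetical builder for the sampling request parameters.
function buildSamplingParams(question: string, imageBase64: string, mimeType: string) {
  return {
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "image" as const, data: imageBase64, mimeType },
          { type: "text" as const, text: question },
        ],
      },
    ],
    maxTokens: MAX_TOKENS,
  };
}

// Rejects if the wrapped promise does not settle within ms milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("sampling_timeout")), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

A caller would wrap the forwarded request as withTimeout(requestClient(...), TRANSPORT_TIMEOUT_MS) and treat a rejection like any other transient sampling error.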

Fixes folded in during review (2 independent code-review passes):

  • P1 (ReDoS, raised on feat(contracts): add image_qa assertion to the outcome contract DSL #1445): the regex path becomes reachable here, so the raw new RegExp(expected_pattern) was swapped for compileSafeRegex(...) — defense-in-depth against catastrophic backtracking on the host's answer string. Semantics unchanged for safe patterns; rejections flow to the existing invalid_expected_pattern evidence.
  • P2 (this PR): a non-text sampling content block previously yielded answer = '', which a permissive pattern (such as . or .*) would vacuously pass. Now guarded to return inconclusive instead, with a regression test.
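
The two guards above can be illustrated with a small sketch. compileSafeRegexSketch is a hypothetical stand-in and is not the repository's compileSafeRegex: its nested-quantifier heuristic and length limit are assumptions chosen for the example. The non_text_sampling_content reason string and the choice to report a rejected pattern as inconclusive are likewise assumptions.

```typescript
type Outcome =
  | { verdict: "pass" | "fail" }
  | { verdict: "inconclusive"; error: string };

// Hypothetical stand-in for a safe compiler: refuses very long patterns and
// quantified groups that themselves contain a quantifier, such as (a+)+.
function compileSafeRegexSketch(source: string): RegExp | undefined {
  if (source.length > 256) return undefined;
  if (/\([^)]*[+*][^)]*\)[+*{]/.test(source)) return undefined;
  try {
    return new RegExp(source);
  } catch {
    return undefined; // syntactically invalid pattern
  }
}

// Judges a sampling response block against the expected pattern.
function judge(
  content: { type: string; text?: string },
  expectedPattern: string,
): Outcome {
  // A non-text block must not be coerced to '' and matched against the pattern.
  if (content.type !== "text" || typeof content.text !== "string") {
    return { verdict: "inconclusive", error: "non_text_sampling_content" };
  }
  const pattern = compileSafeRegexSketch(expectedPattern);
  if (!pattern) {
    return { verdict: "inconclusive", error: "invalid_expected_pattern" };
  }
  return { verdict: pattern.test(content.text) ? "pass" : "fail" };
}
```

With this ordering, a permissive pattern can only pass when the host actually returned text, which is the behavior the regression test is described as covering.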

Changes to reach merge-readiness. Rebased onto develop to contain only the runtime-hook commit (the image_qa tool #1441 and DSL clause #1445 are already on develop); retargeted base from the feature branch to develop.

Verification. tsc build clean; 43 tests pass (oc-assert-image-qa-runtime 5 incl. the new regression case, image-qa contract 9, image-qa tool 6, oc-assert wiring suites). CI: ubuntu ×3, windows ×3, agent-success-mock green; macOS jobs were queue-delayed on GitHub's runners (platform-neutral change, no OS-specific code).

Merged to develop.
