Skip to content

feat(contracts): add image_qa assertion to the outcome contract DSL#1445

Merged
shaun0927 merged 2 commits into
developfrom
feat/1432-assert-image-qa-clause
May 28, 2026
Merged

feat(contracts): add image_qa assertion to the outcome contract DSL#1445
shaun0927 merged 2 commits into
developfrom
feat/1432-assert-image-qa-clause

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 28, 2026

Summary

Stacked on #1441 (Part 1).

  • New image_qa leaf assertion in the contract DSL: { question, expected_pattern }.
  • Evaluator goes through a new optional EvalContext.imageQaSample hook so the runtime can route to the image_qa MCP tool (and hence to host sampling). Absent hook or unsupported_by_host reply → inconclusive (passed:false).
  • 10 tests: validator (4), evaluator (5), dispatch via evaluate() (1).

SSOT (#1359) alignment

  • No server-side LLM. Evaluator is inconclusive without host wiring.
  • ReDoS guard via validateRegexPattern at author time and compileSafeRegex at evaluation time.

Review-pass changes

  • Documented the image_qa kind in docs/contracts.md and added it to the oc_assert contract schema description (hosts can now discover it).
  • Routed the evaluator's runtime regex through compileSafeRegex for consistency/defense-in-depth.
  • Added a validator test for a missing expected_pattern.

Part 2 of #1432. Depends on #1441.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

shaun0927 added a commit that referenced this pull request May 28, 2026
CI fix for #1445: parent #1441 introduces the image_qa MCP tool and
the contract DSL adds `kind: 'image_qa'` to the Assertion union.
The narrowing switch in src/tools/oc-assert.ts:expectedActualFor was
missed in #1445, causing TS2366 (function lacks ending return).

Mirror the existing branches: surface `question` and `expected_pattern`
as the expected slice, and pass through the evaluator's `details` as
the actual slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shaun0927 added a commit that referenced this pull request May 28, 2026
CI fix for #1445: parent #1441 introduces the image_qa MCP tool and
the contract DSL adds `kind: 'image_qa'` to the Assertion union.
The narrowing switch in src/tools/oc-assert.ts:expectedActualFor was
missed in #1445, causing TS2366 (function lacks ending return).

Mirror the existing branches: surface `question` and `expected_pattern`
as the expected slice, and pass through the evaluator's `details` as
the actual slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927 shaun0927 force-pushed the feat/1432-assert-image-qa-clause branch from c96aaa6 to fb7546d Compare May 28, 2026 13:56
@shaun0927
Copy link
Copy Markdown
Owner Author

Goal-review verdict — #1445

Intent vs SSOT (#1359)

  • Pillar C (contract-verifiable browser work) + Pillar D (portable verified memory). Adds an image_qa clause to the outcome contract DSL so a contract can cite a vision-mediated answer as evidence. Crucially:
    • P2 (harness, not agent) preserved: the evaluator only fires when the runtime has wired EvalContext.imageQaSample. With no hook OR unsupported_by_host, the result is inconclusive (passed: false). OpenChrome never invokes a model itself.
    • P4 (facts before decisions): the evidence record carries question, answer, and expected_pattern verbatim, plus a reason enum (host_runtime_did_not_wire_imageQaSample, unsupported_by_host, invalid_expected_pattern, no_screenshot_available) so the host can act on the failure mode without re-deriving it.
    • P1/P7 (host-neutral MCP, core/pilot split): pure data-layer addition; the image_qa MCP tool from feat(tools): add image_qa primitive via MCP sampling #1441 is the only path to a host LLM, and the contract evaluator can be tested with a fake hook.

Code-level review

  • src/contracts/types.ts: ImageQaAssertion joins LeafAssertion. Assertion['kind'] therefore now includes 'image_qa' — this is the change that triggered the inherited TS2366 in oc-assert.ts:expectedActualFor. Fix landed as fix(oc_assert): handle image_qa kind in expectedActualFor switch (commit c96aaa69, now fb7546da after rebase) — emits { question, expected_pattern } / passthrough details, matching the pattern used by screenshot_class.
  • src/contracts/validator.ts: validateImageQa reuses validateRegexPattern for ReDoS safety. Good.
  • src/contracts/evaluators/image-qa.ts: handles all four inconclusive paths and regex.test on the answer. The 9/9 tests in tests/contracts/image-qa.test.ts pass locally.
  • src/contracts/evaluate.ts: dispatch case wired; the try/catch in evaluate() still wraps the evaluator so any thrown errors become passed: false with details.error.

CI fixes applied

  • Rebased on the updated parent feat/1432-image-qa-v2 head 8f72cba8 (parent feat(tools): add image_qa primitive via MCP sampling #1441 fix: REQUIRED prefix on required field descriptions + image_qa in TOOL_CAPABILITY_MAP and the v1.11 snapshot). Combined with the local expectedActualFor switch fix, build and the schema-lint / capability-filter test surface both clear locally.

Status

Merge-ready pending parent #1441 landing. No further changes expected.

shaun0927 and others added 2 commits May 28, 2026 23:15
New `image_qa` leaf assertion: `{ kind: 'image_qa', question,
expected_pattern }`. Evaluates by asking the host LLM a question about
the most recent screenshot through the optional EvalContext
`imageQaSample` hook, then matching the answer against
`expected_pattern` (JS RegExp, ReDoS-guarded by the existing
validateRegexPattern).

When the runtime does not wire `imageQaSample` OR when the host
returns `unsupported_by_host`, the assertion is inconclusive with
`passed: false`. OpenChrome never falls back to a server-side LLM,
matching the SSOT #1359 boundary established by the image_qa MCP tool
in Part 1.

Part 2 of #1432.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI fix for #1445: parent #1441 introduces the image_qa MCP tool and
the contract DSL adds `kind: 'image_qa'` to the Assertion union.
The narrowing switch in src/tools/oc-assert.ts:expectedActualFor was
missed in #1445, causing TS2366 (function lacks ending return).

Mirror the existing branches: surface `question` and `expected_pattern`
as the expected slice, and pass through the evaluator's `details` as
the actual slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927 shaun0927 force-pushed the feat/1432-assert-image-qa-clause branch from fb7546d to 7a20a21 Compare May 28, 2026 14:16
@shaun0927 shaun0927 changed the base branch from feat/1432-image-qa-v2 to develop May 28, 2026 14:16
@shaun0927 shaun0927 merged commit 6edc814 into develop May 28, 2026
1 check passed
shaun0927 added a commit that referenced this pull request May 28, 2026
Completes the runtime wire-up promised in the original #1432-B brief
that was deferred from PR #1445. The DSL clause and evaluator landed
in Part 2, but EvalContext.imageQaSample was never populated, so every
image_qa contract evaluated to inconclusive in production.

Changes:
  - Extract the sampling forwarding from the image_qa tool handler
    into a reusable `runImageQaSampling(ctx, params)` export.
  - oc_assert.buildEvalContext now accepts the calling ToolContext
    and injects an imageQaSample closure that delegates to
    runImageQaSampling, threading the host's sampling capability +
    requestClient bridge straight through.
  - oc_assert handler signature accepts the optional context param.
  - The image_qa evaluator now stamps `details.error` (in addition to
    `details.reason`) on infra-fault paths so oc_assert's
    isInconclusive check translates them to verdict='inconclusive'
    instead of 'fail'. Runtime wiring problems are not contract
    failures.

SSOT (#1359) alignment: no server-side LLM ever — the closure only
forwards via sampling/createMessage, and falls back to
unsupported_by_host when the host lacks the capability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shaun0927 added a commit that referenced this pull request May 28, 2026
Completes the runtime wire-up promised in the original #1432-B brief
that was deferred from PR #1445. The DSL clause and evaluator landed
in Part 2, but EvalContext.imageQaSample was never populated, so every
image_qa contract evaluated to inconclusive in production.

Changes:
  - Extract the sampling forwarding from the image_qa tool handler
    into a reusable `runImageQaSampling(ctx, params)` export.
  - oc_assert.buildEvalContext now accepts the calling ToolContext
    and injects an imageQaSample closure that delegates to
    runImageQaSampling, threading the host's sampling capability +
    requestClient bridge straight through.
  - oc_assert handler signature accepts the optional context param.
  - The image_qa evaluator now stamps `details.error` (in addition to
    `details.reason`) on infra-fault paths so oc_assert's
    isInconclusive check translates them to verdict='inconclusive'
    instead of 'fail'. Runtime wiring problems are not contract
    failures.

SSOT (#1359) alignment: no server-side LLM ever — the closure only
forwards via sampling/createMessage, and falls back to
unsupported_by_host when the host lacks the capability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927
Copy link
Copy Markdown
Owner Author

Analysis & merge summary (automated review)

Intent / direction. Part 2 of #1432: adds an image_qa leaf assertion { kind, question, expected_pattern } to the outcome-contract DSL (validator + evaluator + evaluate dispatch + oc_assert expectedActualFor). It asks the host LLM about the latest screenshot via an optional EvalContext.imageQaSample hook and regex-matches the answer.

#1359 (SSOT) alignment — consistent. No server-side LLM. The evaluator only calls the host-provided hook; when the hook is absent / returns unsupported_by_host / no screenshot exists, it degrades deterministically to passed: false. This matches P2 (harness, not agent) and P4 (facts before decisions); the regex is ReDoS-guarded at parse time via validateRegexPattern.

Note on the diff. Against current develop the two-dot diff appeared to delete browser-lanes.ts / oc-lane.ts code — that was a develop-ahead artifact (the branch's merge-base predates #1439's scratch-profile-lane work). The two real commits only touch contract/image_qa files; a 3-way merge does not revert #1439. I rebased onto develop to confirm: the net change is exactly the 7 image_qa files, and it builds clean against current develop.

Review findings (both addressed in the follow-up #1452, by design):

Verification. tsc build clean; 114 contract tests pass.

Merged to develop.

@shaun0927
Copy link
Copy Markdown
Owner Author

Review analysis — merge-readiness

Intent & direction. Part 2 of #1432: adds an image_qa leaf to the Outcome Contract DSL so a contract can name a vision-Q&A clause as evidence, routed to the host via the image_qa primitive (#1441) / MCP sampling. Sound and matches the issue.

#1359 alignment — PASS. No path where OpenChrome itself invokes a model: the evaluator returns inconclusive (passed:false) when the runtime doesn't wire imageQaSample or when the host replies unsupported_by_host. The single-call oc_assert surface deliberately does not wire the hook, so image_qa is only decidable inside a sampling-capable host runtime — exactly the host-neutral, deterministic-fallback shape #1359 requires.

Findings & fixes applied this pass:

  • (was P1) Host-facing discoverability. The kind was added to the validator/evaluator but docs/contracts.md omitted it and the oc_assert contract inputSchema description left it out of the kind list — hosts had no portable way to discover it. Added a DSL reference section + schema-description entry (commit af621294).
  • (was P2) Regex safety consistency. The evaluator recompiled expected_pattern with a bare new RegExp — the only pattern-matching evaluator in src/contracts not routed through the safe-regex guard. Switched to compileSafeRegex for defense-in-depth on the hot path (commit d2d0217f).
  • (was P2) Added a validator test for a missing expected_pattern.

Exhaustiveness checked: KNOWN_KINDS, evaluate() dispatch, LeafAssertion union, and oc-assert expectedActualFor are all updated; task-signature treats assertions opaquely (no switch gap).

Tests: image-qa.test.ts 10/10 pass locally; typecheck clean.

Verdict: MERGE-READY once CI is green. Targets develop; dependency #1441 already merged.

shaun0927 added a commit that referenced this pull request May 28, 2026
)

Activates the image_qa contract assertion (added in #1445): buildEvalContext now injects an imageQaSample closure that forwards a screenshot+question to the host via MCP sampling/createMessage (runImageQaSampling), and the evaluator stamps details.error on infra-fault paths so verdicts translate to 'inconclusive' (not 'fail'). No server-side LLM; deterministic inconclusive fallback when sampling is absent (SSOT #1359 P2/P4).

Review fixes folded in: (1) swapped raw new RegExp for compileSafeRegex on the now-reachable regex path (defense-in-depth ReDoS guard, resolves the #1445 P1); (2) guard non-text sampling content blocks -> inconclusive instead of a vacuous empty-string regex pass, with a regression test.

Rebased onto develop to contain only the runtime-hook commit; retargeted base to develop. build + 43 image_qa/oc-assert tests pass. CI: ubuntu x3, windows x3, agent-success-mock green; macOS jobs were queue-delayed on GitHub's runners (platform-neutral change).
shaun0927 added a commit that referenced this pull request May 28, 2026
…(follow-up to #1445) (#1453)

* docs(contracts): document image_qa kind and list it in oc_assert schema

The image_qa leaf was added to the validator/evaluator but the host-facing
surfaces still enumerated the old kind set: docs/contracts.md omitted it
entirely and oc_assert's `contract` inputSchema description left it out of
the kind list, so hosts had no portable way to discover the assertion. Add a
DSL reference section (sampling-backed, inconclusive without a host hook per
#1359) and add image_qa to the schema description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(contracts): route image_qa evaluator regex through compileSafeRegex

The evaluator recompiled expected_pattern with a bare `new RegExp`, the only
pattern-matching evaluator in src/contracts not routed through the
safe-regex guard. A pattern that reached evaluation without validation (e.g.
a future persistence/cache path) would compile unguarded on the hot path.
Use compileSafeRegex for defense-in-depth and consistency, and add a
validator test for a missing expected_pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant