Skip to content

Route Codex Responses through local converter#62

Merged
danielwpz merged 4 commits into
mainfrom
fix/codex-responses-local-converter
May 29, 2026
Merged

Route Codex Responses through local converter#62
danielwpz merged 4 commits into
mainfrom
fix/codex-responses-local-converter

Conversation

@danielwpz
Copy link
Copy Markdown
Owner

@danielwpz danielwpz commented May 29, 2026

Problem

This PR started from a production failure in the openai-codex-responses path:

  • A scheduled task failed with Duplicate item found with id msg_1.
  • The failed request was not going through Pokoclaw's local Responses converter.
  • Instead, the Codex path was still using the bundled @mariozechner/pi-ai Codex converter, so cross-provider assistant history could still emit duplicate fallback response item ids.

The first fix routed Codex Responses through the local converter, which addressed the production duplicate-id failure. Review then exposed a larger design issue: the implementation had mixed Codex-specific backend behavior into the generic OpenAI upstream adapter. That made the code harder to reason about and left several edge cases fragile.

Root Cause

There are two different layers that looked similar but have different contracts:

  • Generic OpenAI-compatible Responses providers, which should stay inside src/agent/llm/upstream-openai.ts.
  • ChatGPT/Codex Responses transport, which uses a Codex-specific base URL, headers, credential shape, request parameters, service-tier semantics, and stream event names such as response.done.

The buggy version treated Codex as a small branch inside the generic OpenAI Responses path. That was the wrong boundary. It forced upstream-openai.ts to know about chatgpt-account-id, Codex account extraction, Codex reasoning effort clamping, Codex service-tier pricing, and Codex stream normalization. It also meant review fixes could easily become scattered conditionals rather than a coherent provider adapter.

Approach

This PR now routes openai-codex-responses through a dedicated Codex adapter in src/agent/llm/providers/codex/stream.ts.

The adapter owns the Codex-specific behavior end to end:

  • builds the Codex Responses request body with the local convertResponsesMessages converter
  • forwards max_output_tokens from the configured model budget
  • only sends tools, tool_choice, and parallel_tool_calls when visible tools are actually present
  • keeps Codex-only include, text, instructions, reasoning summary, and reasoning effort behavior in one place
  • builds ChatGPT/Codex headers including chatgpt-account-id
  • accepts a resolver-provided Codex account id before falling back to JWT claim extraction
  • normalizes Codex response.done / response.incomplete / response.completed events for the shared Responses stream processor
  • requires a known status on Codex response.done so malformed terminal events cannot silently become success
  • maps failed or cancelled terminal responses to response.failed so upstream error details are preserved
  • validates critical Codex stream event shapes before handing them to processResponsesStream
  • requires finalized message content and reasoning summary on response.output_item.done because the shared processor replaces the streamed block with the finalized item
  • validates shared fields such as string item.id even for unknown Codex output item types
  • preserves Codex service-tier pricing behavior, including priority/flex handling

src/agent/llm/upstream-openai.ts is now back to the generic OpenAI-compatible boundary. It dispatches Codex to the Codex adapter and no longer contains Codex-specific header, event, or request-shape logic.

The generic OpenAI Responses path also now keeps the requested service tier available through the stream processor and final usage normalization. If the upstream response reports default, omits service_tier, or otherwise lacks the effective requested tier, estimated pricing still uses the requested priority/flex tier that was actually sent.

Credential Design

codex-local credentials can store the account id separately from the access token. The previous implementation assumed every bearer token could be decoded as a JWT with a chatgpt_account_id claim. That is not a stable contract.

The resolver now returns a structured credential for Codex-local auth:

  • apiKey: the access token
  • accountId: the stored Codex account id when available

PiBridge forwards that account id through stream options, and the Codex adapter uses it for chatgpt-account-id. JWT extraction remains only as a fallback for configured API keys that still carry the claim.

Streaming And Non-Streaming Design

The production failure surfaced in streaming task execution, but leaving completeTurn / completeText on the bundled pi-ai Codex path would preserve a second, divergent Codex converter. That would be a latent bug.

For openai-codex-responses, non-streaming bridge calls now drain the same local streaming adapter and consume the final AssistantMessage. That keeps Codex conversion, request construction, and event processing on one path across task runs, normal chat turns, and compaction-style calls. Error-path coverage now includes the non-streaming bridge path as well, so a failed Codex terminal event must reject instead of resolving as a generic message.

Service-Tier Ownership

Service-tier handling now has explicit ownership:

  • Custom Responses adapters read options.serviceTier and set service_tier directly in the request params.
  • The Codex adapter applies Codex-specific pricing and does not need the generic payload mutation hook.
  • The legacy onPayload hook remains only as compatibility for streamSimple adapters that do not know about Pokoclaw's options.serviceTier extension. Its comment now documents that purpose.

This keeps the runtime contract visible without removing compatibility for adapters still delegated to pi-ai.

Why This Shape

The main architectural choice is to treat Codex as a provider-specific adapter, not as a feature flag inside generic OpenAI Responses.

That boundary is important because:

  • provider-specific secrets and headers stay in the provider layer
  • generic OpenAI-compatible code does not learn ChatGPT/Codex private protocol details
  • request-shape decisions are testable without conditional sprawl in shared code
  • stream event validation happens at the adapter boundary, before shared processing sees a typed ResponseStreamEvent
  • future Codex protocol changes have one owner module instead of scattered branches
  • both streaming and non-streaming bridge calls share the same Codex conversion path
  • common Responses input normalization lives in the shared Responses helper module instead of being duplicated per adapter

This is not only fixing the duplicate id symptom. It removes the design condition that let this class of bug survive in one path while being fixed in another.

Review Focus

Please review this PR at the architecture boundary level, not only at individual comment details:

  • whether providers/codex/stream.ts is the right ownership boundary for Codex transport behavior
  • whether upstream-openai.ts is now appropriately generic again
  • whether credential flow from CodexProviderApiKeyResolver through PiBridge into the adapter is explicit enough
  • whether the event validation boundary is strict enough without overfitting current Codex telemetry
  • whether routing non-streaming Codex calls through the streaming adapter is the right tradeoff for eliminating divergent converter paths
  • whether service-tier ownership is clear enough between custom adapters and streamSimple compatibility

Tests

Local verification:

  • pnpm preflight
  • Result: 154 test files passed, 1013 tests passed

Regression coverage added or updated:

  • duplicate fallback response item ids in cross-provider assistant history
  • non-streaming Codex bridge calls using the local Codex converter
  • non-streaming Codex failed terminal events rejecting with upstream details
  • resolver-provided Codex account id with an opaque bearer token
  • failed Codex response.done preserving upstream error details
  • Codex response.done requiring a known status
  • malformed critical Codex stream events failing at the adapter boundary
  • finalized Codex output item validation for message content and reasoning summary
  • unknown Codex output item validation for shared item id fields
  • representative Codex stream event normalization
  • post-terminal event consumption no longer returning early
  • Codex request body forwarding max_output_tokens
  • Codex request body omitting tool-choice fields when no tools are present
  • generic OpenAI Responses estimated pricing preserving the requested service tier when upstream reports default

Production Validation

Before this broader cleanup, the branch was temporarily deployed to the SSH host and the user confirmed the runtime test passed for the original scheduled-task failure. This PR now keeps that production fix while tightening the architecture and regression coverage around it.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

PR Review: Route Codex Responses through local converter

1. mapCodexResponsesEvents silently drops post-terminal events

The generator in mapCodexResponsesEvents (line 818) calls return immediately after yielding the normalized terminal event (response.done / response.incomplete / response.completed). If the Codex API ever emits events after the terminal event (telemetry, trailing metadata, etc.), they would be silently discarded with no warning. Practically this is likely fine since terminal events should be last, but the silent discard makes diagnostics harder if the API behavior changes.

Fix direction: At minimum, log a warning if more events arrive after the terminal event. Alternatively, let the loop finish naturally and only map the terminal event type — the stream will end when the upstream source is exhausted.

2. Codex event stream bypasses type safety end-to-end

mapCodexResponsesEvents casts every intermediate event as unknown as ResponseStreamEvent (line 847). processResponsesStream in pi-ai-openai-responses-shared.ts accesses fields like event.item, event.part, event.delta, event.response on that unverified type. If the Codex API's event shapes diverge from the standard OpenAI Responses API shapes that processResponsesStream expects, the mismatch would cause silent breakage (fields read as undefined, blocks skipped) rather than a clear error. The production validation covers the current API shape, but any upstream change on the Codex side will be caught only by user-visible breakage.

Fix direction: Add a Zod/discriminated-union validation at the boundary (inside mapCodexResponsesEvents) for the critical event types that processResponsesStream depends on: response.output_item.added, response.content_part.added, response.output_text.delta, response.output_item.done, and response.completed. This would turn silent breakage into a clear error message.

3. parallel_tool_calls: true and tool_choice: "auto" are set unconditionally for Codex

The diff adds tool_choice: "auto" and parallel_tool_calls: true to the Codex params literal (line 681-682), even when context.tools is empty. These parameters are harmless without tools but may cause unexpected behavior if the Codex API interprets tool_choice: "auto" differently than omitting the field entirely. The standard Responses API behavior is to not send tool_choice unless tools are present.

Fix direction: Guard these behind a context.tools check, consistent with how params.tools is conditionally set below.

4. Test coverage gap: event mapping not exercised

The test file exercises the HTTP request shape (service tier, duplicate IDs) but does not test mapCodexResponsesEvents at all — there are no tests that feed Codex-shaped SSE events into the stream and verify correct normalization. The event mapping is the riskiest part of this change since it involves type-level assumptions about the Codex API. Production validation covers the current state but provides no regression safety net.

Fix direction: Add a unit test for mapCodexResponsesEvents that streams a representative sequence of Codex events (at minimum: response.output_item.added, response.output_text.delta, response.output_item.done, and response.done) and asserts the normalized output matches what processResponsesStream expects.

5. Redundant include array renders the reasoning block conditional a no-op for Codex

The Codex initial params include include: ["reasoning.encrypted_content"] (line 679). Later, the reasoning block (line 716) guards params.include = ["reasoning.encrypted_content"] behind model.api !== "openai-codex-responses". So for Codex, include is already set and the reasoning block's guard correctly prevents a duplicate assignment. This works, but the two assignment sites (line 679 and line 717) create a non-obvious invariant — someone reading only the reasoning block might think include is conditionally set, not realizing it has already been set unconditionally for Codex above.

Fix direction: Collapse into a single assignment point. Set include only in the reasoning block for non-Codex, and only in the initial params for Codex, so the two paths don't overlap.

Summary

The change is functionally correct as validated in production. The main risks are around fragility of the Codex event mapping (no type validation, no test coverage) and a few clarity issues in the params construction. None are blocking for merge, but items 1-4 represent real operational risk if the Codex API changes.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7fd12aa0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agent/llm/upstream-openai.ts Outdated
): Record<string, string> {
const headers: Record<string, string> = {
"OpenAI-Beta": "responses=experimental",
"chatgpt-account-id": extractCodexAccountId(token),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow Codex account IDs stored outside the token

When codex-local returns a credential whose accessToken is opaque or simply lacks the chatgpt_account_id JWT claim, this header construction throws before any request is sent even though the Codex credential path already preserves accountId separately from auth.json/refreshOpenAICodexToken. Pass that stored account ID through the resolver/stream options, or fall back to it here, instead of requiring every bearer token to be self-decodable.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1069300.

CodexProviderApiKeyResolver now returns { apiKey, accountId } for codex-local, PiBridge forwards codexAccountId, and the Codex adapter uses that value before falling back to JWT claim extraction. Added regression coverage for an opaque bearer token with resolver-provided account id.

pnpm preflight passes locally.

Comment thread src/agent/llm/upstream-openai.ts Outdated
: undefined,
}
: event.response;
yield { ...event, type: "response.completed", response } as unknown as ResponseStreamEvent;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve failed response details when normalizing Codex done events

If the Codex stream ends with response.done whose response.status is failed or cancelled, this remapping turns it into response.completed, so processResponsesStream skips its response.failed branch and the outer stream later throws the generic An unknown error occurred instead of the upstream error/incomplete details. Map failed/cancelled final statuses to the failure path, or throw with the embedded details here, so production failures remain diagnosable and retry classification can use the real error.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1069300.

Codex terminal events now normalize response.done with failed or cancelled status to response.failed, so the shared Responses processor preserves upstream error and incomplete_details instead of collapsing to a generic unknown error. Added regression coverage for the failed response.done path.

pnpm preflight passes locally.

Comment thread src/agent/llm/upstream-openai.ts Outdated
Comment on lines +681 to +682
tool_choice: "auto",
parallel_tool_calls: true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forward the Codex output token cap

For openai-codex-responses models, the new local request shape never includes max_output_tokens, so any maxOutputTokens configured on the model is ignored on exactly the Codex path this commit reroutes; a low cap intended to control cost or latency will no longer be enforced while the non-Codex Responses branch still forwards model.maxTokens. Include the configured token budget in the Codex params, or map it to the Codex backend's equivalent field if it differs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1069300.

The Codex adapter now forwards max_output_tokens: model.maxTokens in buildOpenAICodexResponsesParams, and the request-shape regression test asserts the configured model budget is sent on the Codex path.

pnpm preflight passes locally.

@danielwpz
Copy link
Copy Markdown
Owner Author

@codex review
@claude review

Please review the current head 1069300 again, but start from the architecture and provider-boundary design rather than only the previous line-level comments.

The intended design is now:

  • src/agent/llm/upstream-openai.ts remains the generic OpenAI-compatible adapter.
  • src/agent/llm/providers/codex/stream.ts owns ChatGPT/Codex-specific transport behavior: request shape, headers, account id handling, service-tier behavior, and Codex SSE event normalization.
  • CodexProviderApiKeyResolver can return structured credentials { apiKey, accountId }; PiBridge forwards that account id explicitly into the Codex adapter.
  • openai-codex-responses non-streaming bridge calls drain the same local streaming adapter so Codex conversion does not diverge between streaming and complete paths.
  • Critical Codex stream events are validated at the adapter boundary before they are passed to the shared Responses stream processor.

Please focus the review on whether this boundary is right for Pokoclaw long-term:

  • Is Codex-specific protocol knowledge sufficiently isolated from the generic OpenAI-compatible path?
  • Is the credential flow explicit enough and safe enough for codex-local and configured API-key cases?
  • Is routing non-streaming Codex calls through the same adapter the right tradeoff to eliminate duplicate converter paths?
  • Is the adapter validation strict enough to catch upstream shape drift without overfitting incidental telemetry?
  • Are there any remaining hidden coupling points between runtime bridge logic, provider auth, and shared Responses processing?

The PR description has been rewritten with the root cause, design rationale, and validation details. Local pnpm preflight passes, and CI on 1069300 is green.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Architecture Review of 1069300

The boundary design is sound. Codex protocol knowledge is well-isolated in providers/codex/stream.ts, the credential flow is explicit, and routing non-streaming calls through the streaming adapter eliminates the dual-converter problem that created the original bug. Below are the merge-relevant findings, ordered by priority.


1. Non-streaming Codex error path is untested

consumeStreamToCompletion drains the stream iterator and calls stream.result(). The streaming error tests (failed response.done, malformed events) confirm that stream.result() rejects when the adapter pushes an error event, so consumeStreamToCompletion should also throw. But there is no test for completeTurn (or completeText) with a Codex error response. If pi-ai's stream.result() ever changes to resolve instead of reject for error-terminal streams, the non-streaming catch block would be bypassed and enrichCodexFetchFailure would not run.

Fix direction: Add a test case for bridge.completeTurn where the Codex SSE response contains a response.done with status "failed", asserting the call rejects (matching the existing streaming test at pi-bridge.codex-service-tier.test.ts:1104).

2. validateCodexOutputItemEvent returns void for unknown item types — silently passes unvalidated data

In mapCodexResponsesEvents, the "response.output_item.added" / "response.output_item.done" cases call validateCodexOutputItemEvent, which validates "message", "function_call", and "reasoning" item types, but silently returns for any other item type without even checking that item.id is a string. If Codex upstream introduces a new item type and ships a malformed event (e.g. id is a number), it passes straight to processResponsesStream.

Fix direction: At minimum, validate that unknown item types have a string id field. Better: add an explicit else branch that validates the subset of fields shared across all item types (id).

3. getRequestServiceTier casts SimpleStreamOptions to access a non-declared field

function getRequestServiceTier(
  options?: SimpleStreamOptions,
): ResponseCreateParamsStreaming["service_tier"] | undefined {
  return (options as { serviceTier?: ... } | undefined)?.serviceTier;
}

serviceTier is not on pi-ai's SimpleStreamOptions type. The field is reliably set at runtime by buildPiStreamOptions, so the cast is safe, but it leaves no type-level trace that upstream-openai.ts depends on this implicit contract. The same pattern appears in applyPayloadHook (casting options.onPayload).

Fix direction: Define a local ExtendedStreamOptions type alias (or reuse the Codex adapter's CodexResponsesStreamOptions) rather than casting, so the dependency is visible in the type system. At minimum, add a comment noting that the fields are set by buildPiStreamOptions.

4. normalizeResponsesInputRoles and isRecord are duplicated

Both upstream-openai.ts and providers/codex/stream.ts define identical copies of normalizeResponsesInputRoles and isRecord. This is a small amount of code, but it means the "developer → system" normalization rule has two homes. If the rule ever changes, one copy will be missed.

Fix direction: Move the shared helpers into pi-ai-openai-responses-shared.ts (which already vends convertResponsesMessages, convertResponsesTools, and processResponsesStream).

5. onPayload service-tier hook is now partially vestigial for the Responses paths

Before this PR, service_tier reached the request body only through the onPayload mutation hook in buildPiStreamOptions. After this PR, both buildOpenAICodexResponsesParams and buildOpenAIResponsesParams read options.serviceTier and set params.service_tier directly. The onPayload hook still runs but becomes a no-op for the Responses paths (since createServiceTierPayloadPatch skips if service_tier is already set). The hook is still needed for the Completions path via streamSimple, but the dual mechanism creates confusion about which layer "owns" service-tier injection.

This is not a bug — the two mechanisms compose safely — but it adds a moving part that a future maintainer might misinterpret.

Fix direction: Either remove the onPayload hook for the Responses paths (since params builders now set service_tier natively) or document that the hook is a cross-cutting safety net for adapters that don't read options.serviceTier.


What's solid

  • The response.doneresponse.failed mapping for Codex terminal error events preserves upstream error details and routes them through the shared error path correctly.
  • JWT account-id extraction is a clean fallback behind the resolver-provided codexAccountId, and an opaque token with no claim throws a clear error rather than silently omitting the header.
  • The event validation catches malformed delta strings, missing item records, and wrong-typed content arrays — the right things to validate without overfitting to incidental telemetry fields.
  • The CODEX_TOOL_CALL_PROVIDERS set is explicitly passed to convertResponsesMessages, so cross-provider assistant history gets correct fallback item ids independent of the generic OpenAI set.
  • The clampCodexReasoningEffort function correctly gates per-model effort clamping at the provider boundary.
  • Regression coverage is thorough: duplicate item ids, non-streaming bridge path, resolver-provided account id, failed terminal events, malformed events, post-terminal event consumption, and max_output_tokens.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1069300c18

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agent/llm/providers/codex/stream.ts Outdated
: type === "response.incomplete"
? "incomplete"
: undefined;
const status = normalizeCodexResponseStatus(response.status, type) ?? fallbackStatus;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require status on Codex done events

When a Codex terminal response.done event omits response.status due to upstream shape drift, this expression leaves status undefined and the adapter still emits response.completed; the shared processor then maps the missing status to a successful stop, so a malformed terminal event can silently complete instead of failing at the Codex boundary. Since response.done is the critical event that decides success vs failure, require a known status for that event rather than falling through with undefined.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 12e6010.

response.done now requires a known Codex response.status; a missing status fails at the adapter boundary instead of being normalized to a successful completion. Added a regression test in providers.codex-stream.test.ts for the missing-status terminal event.

pnpm preflight passes locally.

yield normalizeCodexTerminalEvent(event, type);
continue;
case "response.output_item.added":
case "response.output_item.done":
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate finalized Codex output items

Because response.output_item.done reuses the same loose validator as added, a done event for a message or reasoning item can omit the finalized content/summary and still reach the shared processor; that processor overwrites the accumulated streamed block from the done item (item.content ?? [] / item.summary ?? []), so a Codex shape change that sends deltas but not final content would silently erase the generated text or reasoning. Keep the permissive shape for added, but require the finalized fields on output_item.done before passing it on.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 12e6010.

The Codex adapter now keeps added permissive but treats response.output_item.done as finalized: message items must include content, reasoning items must include summary, and those arrays are validated before they reach the shared Responses processor. This prevents streamed content from being overwritten by an incomplete final item. Added focused regression tests for both message and reasoning done events.

pnpm preflight passes locally.

Comment thread src/agent/llm/upstream-openai.ts Outdated
Comment on lines +674 to +676
const requestServiceTier = getRequestServiceTier(options);
if (requestServiceTier) {
params.service_tier = requestServiceTier;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve requested tier for Responses pricing

When a normal openai-responses model is configured with serviceTier = "fast" or "flex", this now sends the requested service_tier, but the streaming path still only passes completedServiceTier into processResponsesStream; if the response reports default/null or omits service_tier, the shared processor's existing request-tier fallback receives undefined, so usage cost is recorded at base price even though the request used priority/flex. Keep the requested tier and merge it the same way the Codex adapter does before applying service-tier pricing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 12e6010.

The generic openai-responses stream path now keeps the request service tier separately from the response tier and resolves default/missing response tiers back to the requested tier before applying estimated pricing. Added a regression test where the request uses priority but the upstream completion reports service_tier: "default".

pnpm preflight passes locally.

@danielwpz
Copy link
Copy Markdown
Owner Author

@codex review
@claude review

Please review the latest head 12e6010 again.

I handled the new inline and overall comments with the same TDD flow: added failing regression coverage first, fixed the provider/bridge code, then ran full local pnpm preflight.

What changed since 1069300:

  • Codex response.done now requires a known response.status; missing status fails at the adapter boundary instead of silently becoming a successful completion.
  • Codex response.output_item.done now requires finalized message content and reasoning summary, because the shared Responses processor replaces streamed blocks from the finalized item.
  • Unknown Codex output item types still pass through for forward compatibility, but now validate shared fields, at minimum string item.id.
  • Non-streaming Codex bridge turns now have explicit failed-terminal-event coverage, matching the streaming error path.
  • Generic openai-responses estimated pricing now preserves the requested service tier when upstream reports default or omits service_tier.
  • upstream-openai.ts now uses a typed OpenAIResponsesAdapterOptions contract instead of hidden casts for serviceTier / onPayload.
  • normalizeResponsesInputRoles and isRecord moved into the shared Responses helper so the developer-to-system invariant has one home.
  • The service-tier payload hook is no longer attached to Codex; it is documented as compatibility for streamSimple adapters that do not read Pokoclaw options.serviceTier.

I also updated the PR description to include the latest root-cause/design notes, service-tier ownership, and the new regression coverage.

Local verification on 12e6010:

  • pnpm preflight
  • Result: 154 test files passed, 1013 tests passed

For this pass, please re-check both the specific fixes and the architecture boundary: Codex protocol behavior should stay in providers/codex/stream.ts, generic OpenAI Responses behavior should remain in upstream-openai.ts, and the bridge should only carry explicit runtime options between those layers.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danielwpz
Copy link
Copy Markdown
Owner Author

@claude review please

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Review

Overall: Clean, well-tested PR. No blocking issues. The architecture boundary is correctly drawn — Codex transport is now isolated in providers/codex/stream.ts, the generic OpenAI adapter is appropriately generic again, and the credential flow is explicit end-to-end. Test coverage is thorough across the scenarios described in the PR body.

1. isRecord semantics changed during the move (low risk, actually a fix)

In upstream-openai.ts, the original isRecord was:

return typeof value === "object" && value !== null;

In the shared module, it's now:

return typeof value === "object" && value !== null && !Array.isArray(value);

This is a correction — arrays should not have been treated as records. The existing callers already guard with array checks before calling isRecord, so there's no behavioral impact. Worth being aware of if it's used in future callers that expect the old loose behavior.

2. completeTurn / completeText: No explicit timeout difference for non-streaming Codex calls

Non-streaming Codex calls now drain a full SSE stream via consumeStreamToCompletion. The streaming adapter path includes the LlmStreamWatchdog timeout machinery, while completeSimple (the pi-ai path used for other providers) may not have the same watchdog behavior. This means non-streaming Codex calls now get stream-level timeouts that other non-streaming providers don't. Is that intentional? It seems reasonable for Codex, but worth confirming.

3. Service tier double-application is avoided, but the flow is subtle

Inside streamOpenAIResponsesWithUpstreamUsage, the processResponsesStream call applies service tier pricing during the response.completed event (via calculateCost + applyServiceTierPricing). Then the post-processing block replaces output.usage entirely via normalizeUsageFromOpenAICompatible and applies service tier pricing a second time only when costSource === "estimated". The first application is always overwritten, so there's no double-counting. But the two-step flow (calculate → replace → re-apply-if-estimated) is easy to misread. A brief comment above the post-processing block noting that output.usage is intentionally replaced would help future maintainers.

4. Test coverage is comprehensive

All scenarios from the PR description have corresponding tests:

  • Duplicate fallback item IDs in cross-provider history ✓
  • Non-streaming Codex bridge calls through local converter ✓
  • Codex response.done requiring known status ✓
  • Failed terminal events preserving upstream error details ✓
  • Resolver-provided account ID with opaque bearer tokens ✓
  • Malformed stream events rejected at adapter boundary ✓
  • Generic OpenAI estimated pricing preserving requested service tier ✓

Verdict

Ready to merge. The design addresses the root cause (divergent converter paths) rather than the symptom (duplicate item IDs), and the test coverage locks in the fix across both streaming and non-streaming paths.

@danielwpz
Copy link
Copy Markdown
Owner Author

@claude Addressed your follow-up notes in 00b87fc.

  • Confirmed the non-streaming Codex timeout behavior: completeTurn / completeText reuse the Codex streaming adapter only for converter/event-processing parity. They do not install LlmStreamWatchdog; they still rely on the caller-provided AbortSignal, matching the non-streaming execution model. I added a comment above consumeStreamToCompletion so this boundary is explicit.
  • Added a comment in streamOpenAIResponsesWithUpstreamUsage explaining that the upstream-compatible usage payload intentionally replaces usage calculated by the shared stream parser before service-tier pricing is re-applied for estimated costs.

No runtime behavior changed. Local pnpm preflight passes: 154 test files, 1013 tests.

Please review again.

@danielwpz danielwpz merged commit 5722347 into main May 29, 2026
3 checks passed
@danielwpz danielwpz deleted the fix/codex-responses-local-converter branch May 29, 2026 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant