F085: Unified Display-Event Abstraction Across All Agent Providers
Scope
In Scope
- Introduction of a DisplayEventSource interface contract replacing the LineExtractor function-field hook on cliProviderHooks.
- Definition of a discriminated DisplayEvent type with at least EventText and EventToolUse kinds.
- Implementation of DisplayEventSource for all 5 providers: Claude, Codex, Gemini, OpenCode (streaming), and OpenAICompatible (post-response translation).
- Single interfaces-layer renderer with default (text only) and verbose (text + tool markers) display modes.
- Preservation of DisplayOutput string aggregation on AgentResult/ConversationResult by aggregating text events.
- Placement of the DisplayEvent type in a dedicated infrastructure-adjacent package (location decided during implementation and documented).
Out of Scope
- EventReasoning and other extended event kinds (thinking blocks, rate-limit notices, cache-hit markers, agent hand-offs): the enum must accommodate them, but emission is deferred.
- Real-usage parsing via a UsageSource interface (observation 01), tracked as a parallel refactor that can ship independently.
- Renderer theming, colour customisation, or user-configurable marker formats.
- Persistence or replay of display events across sessions.
Deferred
| Item | Rationale | Follow-up |
|------|-----------|-----------|
| EventReasoning emission for Claude thinking / OpenCode reasoning / Gemini thoughts | Requires opt-in UX design; out of scope to keep this feature focused on tool-use parity | future |
| Tool-call id correlation (request → response) across turns | Needs a correlation model that spans conversation state; not required for the rendering surface this feature ships | future |
| UsageSource interface contract for per-turn token accounting | Parallel refactor with independent test surface and lifecycle; ships separately | future (observation 01) |
| Per-user verbose-mode configuration (CLI flag or workflow field) | Renderer supports modes internally; user-facing toggle is a follow-up once default shape is proven | future |
User Stories
US1: Consistent Tool Activity Across Providers (P1 - Must Have)
As a workflow author switching between agent providers,
I want tool-use activity to surface in the same categories and format regardless of which provider (claude, codex, gemini, opencode, openai_compatible) is selected,
So that swapping provider: in a workflow does not silently change what I see during execution.
Why this priority: Cross-provider parity is the whole point of the feature. Without it, users cannot reason about provider choice as a pure backend decision. This delivers the MVP value promised by F080/F081 (behavioural parity) extended to the display surface.
Acceptance Scenarios:
- Given a single-step workflow that invokes a tool (e.g. Read), When the workflow is run with verbose mode active against each of the 5 providers in turn, Then each run produces a tool marker for the same tool name in human-readable form, interleaved with assistant text in source order.
- Given a Claude assistant event containing interleaved text and tool_use blocks in one line, When the parser processes that line, Then the returned []DisplayEvent preserves the source order of the blocks.
- Given an openai_compatible response with choices[].message.tool_calls[], When the post-response translator runs, Then text and tool events are emitted through the same renderer and formatted identically to the streaming providers.
Independent Test: Unit test per provider replays a captured fixture (stream-json line or HTTP response) and asserts the produced []DisplayEvent sequence — kind, order, and truncated argument. Integration test runs the same workflow against each provider and diffs the rendered verbose output shape.
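The third scenario can be illustrated with a minimal Go sketch of a post-response translator. It assumes the standard chat-completions response shape (choices[].message.content plus choices[].message.tool_calls[].function.name). The names TranslateResponse and chatResponse are hypothetical, emitting text before tool calls within a choice is an assumption, and Arg derivation is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind EventKind
	Text string // set for EventText
	Name string // set for EventToolUse
	Arg  string // left empty in this sketch
}

// chatResponse mirrors only the fields this sketch reads.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content   string `json:"content"`
			ToolCalls []struct {
				Function struct {
					Name string `json:"name"`
				} `json:"function"`
			} `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// TranslateResponse (hypothetical name) turns one complete HTTP response body
// into display events in a single burst. Invalid JSON yields no events.
func TranslateResponse(body []byte) []DisplayEvent {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil
	}
	var out []DisplayEvent
	for _, c := range resp.Choices {
		// A null or empty content field produces no text event.
		if c.Message.Content != "" {
			out = append(out, DisplayEvent{Kind: EventText, Text: c.Message.Content})
		}
		for _, tc := range c.Message.ToolCalls {
			out = append(out, DisplayEvent{Kind: EventToolUse, Name: tc.Function.Name})
		}
	}
	return out
}

func main() {
	body := []byte(`{"choices":[{"message":{"content":"Reading.","tool_calls":[{"function":{"name":"Read","arguments":"{}"}}]}}]}`)
	for _, ev := range TranslateResponse(body) {
		fmt.Printf("%+v\n", ev)
	}
}
```

Because the events share the DisplayEvent shape used by the streaming providers, the same renderer can consume them; only the arrival cadence differs.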
US2: Default Behaviour Unchanged from F082 (P1 - Must Have)
As a user already relying on the current F082 filtered-text display,
I want the default (non-verbose) output to remain text-only and byte-equivalent to what parseStreamLine produces today,
So that adopting F085 does not force me to re-learn my output or break existing logs and scripts.
Why this priority: A silent behavioural regression in the default path would punish every existing workflow. Default-safety is essential for MVP acceptance.
Acceptance Scenarios:
- Given any of the 5 providers and verbose mode inactive, When a workflow runs, Then the user-visible output is equivalent to current F082 behaviour (text only, no tool markers).
- Given output_format: json is set, When a workflow runs against any provider, Then raw passthrough is unchanged and no event parsing is attempted.
- Given a captured pre-F085 run and a post-F085 run of the same workflow in default mode, When their aggregated DisplayOutput strings are compared, Then they are equal.
Independent Test: Golden-file test per provider comparing aggregated text-event output against the pre-F085 DisplayOutput baseline. A separate test asserts output_format: json bypasses ParseEvents entirely.
US3: Tool Marker Readability (P2 - Should Have)
As a user running a workflow in verbose mode,
I want tool markers to be concise and immediately recognisable (name + short truncated argument),
So that I can track agent activity without being drowned in JSON payloads.
Why this priority: Verbose mode only becomes useful once the markers are actually readable. Gracefully degrading unknown tool names is required so new or provider-specific tools don't render as empty or malformed lines.
Acceptance Scenarios:
- Given an EventToolUse for a well-known tool (Read, Write, Edit, Bash, Grep, Glob, Task), When the renderer formats it, Then the output shows the tool name and a pre-truncated argument (≤ 40 characters).
- Given an EventToolUse for an unknown tool name, When the renderer formats it, Then output shows the raw name with an empty or safely truncated argument and no crash.
- Given a Codex function_call with a JSON-encoded arguments string, When the mapper emits the event, Then the Arg field contains a short human-readable preview derived from a known field (file_path, cmd, etc.) rather than the raw JSON.
Independent Test: Renderer unit test iterates a fixed table of tool names and asserts the formatted marker. A separate test covers the unknown-tool and empty-arg fallbacks.
US4: Parser Layering Discipline (P2 - Should Have)
As a maintainer of the display pipeline,
I want parsers to return plain-string events with zero ANSI or formatting concerns,
So that rendering decisions (colour, prefix, alignment) live in one place and future UX changes do not require touching every provider.
Why this priority: Without this boundary, each provider's mapper accumulates its own cosmetic choices and the "single rendering policy" benefit collapses. Enforcing it during the initial migration avoids retrofitting later.
Acceptance Scenarios:
- Given any provider's DisplayEventSource.ParseEvents output, When the returned events are inspected, Then no ANSI escape sequences, prefixes, or renderer-specific glyphs are present in any field.
- Given the renderer is swapped or restyled, When the same captured events are re-rendered, Then only renderer code changes — parsers and fixtures remain untouched.
Independent Test: A static check (table test or linter) scans event field values for ANSI escape bytes and known prefix strings; any match fails the test.
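A table-test style check of this kind might look like the following sketch. FindStyling and the forbidden list are hypothetical. The ESC byte (0x1b) starts every ANSI escape sequence, while "[tool]" merely stands in for whatever prefix the real renderer ends up using.

```go
package main

import (
	"fmt"
	"strings"
)

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind EventKind
	Text string
	Name string
	Arg  string
}

// forbidden holds substrings that must never appear in parser output.
// "\x1b" is the ESC byte; "[tool]" is a placeholder renderer prefix.
var forbidden = []string{"\x1b", "[tool]"}

// FindStyling reports one violation per (field, forbidden substring) match.
func FindStyling(events []DisplayEvent) []string {
	var violations []string
	for i, ev := range events {
		for _, field := range []string{ev.Text, ev.Name, ev.Arg} {
			for _, bad := range forbidden {
				if strings.Contains(field, bad) {
					violations = append(violations,
						fmt.Sprintf("event %d contains %q", i, bad))
				}
			}
		}
	}
	return violations
}

func main() {
	events := []DisplayEvent{
		{Kind: EventText, Text: "plain text"},
		{Kind: EventText, Text: "\x1b[31mred\x1b[0m"},
	}
	for _, v := range FindStyling(events) {
		fmt.Println(v)
	}
}
```

A prefix match inside genuine model text would be a false positive, so in practice the check is best run against captured fixtures whose contents are known.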
US5: OpenAICompatible Documented Cadence (P3 - Nice to Have)
As a user of the openai_compatible provider,
I want it to be documented that display events arrive in a single post-response burst rather than streaming,
So that I understand why "live feedback" timing differs from streaming providers, despite the rendered shape being identical.
Why this priority: Functional correctness is already covered by US1. This is a pure-documentation clarification that avoids user confusion but does not block the feature.
Acceptance Scenarios:
- Given the openai_compatible provider, When a user reads the provider documentation, Then it clearly states that all display events are emitted at once after the HTTP response completes.
- Given a streaming provider's docs, When compared with openai_compatible, Then the cadence difference is explicit and visible.
Independent Test: Docs review against the provider reference pages; verify an explicit cadence note exists on openai_compatible and is absent (or stated as "streaming") on the other four.
Edge Cases
- What happens when a provider emits a line that contains neither text nor tool activity (keepalive, system event)? Parser returns an empty slice; renderer emits nothing.
- How does the system handle a malformed stream-json line (invalid JSON, truncated chunk)? Parser returns an empty slice and logs at debug level; the surrounding stream continues.
- What is the behavior when Claude's single assistant event interleaves three or more text/tool blocks? The returned slice must preserve all of them in source order.
- What happens when a tool-use input / arguments payload has no recognisable preview field? Arg falls back to an empty string; the renderer shows name only.
- How does the system handle providers without a DisplayEventSource implementation? baseCLIProvider.display is nil; the stream filter degrades to pass-through for that provider and no events are emitted.
- What is the behavior when output_format: json is combined with verbose mode? Raw passthrough wins; event parsing is not invoked.
- What happens when OpenCode emits a step-finish part after a tool call completes? The mapper does not emit a duplicate event; state transitions are observed but only one EventToolUse per tool part is produced.
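Several of these cases (non-display lines, malformed JSON, interleaved blocks) can be seen together in a parser sketch. The line shape used here, an assistant event whose message.content[] holds text and tool_use blocks, is an assumption taken from the persisted schema mentioned under Assumptions and would need validating against a live capture. claudeSource and claudeLine are hypothetical names; debug logging and Arg derivation are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind EventKind
	Text string
	Name string
	Arg  string // left empty in this sketch
}

// claudeLine models only the assumed fields this sketch reads.
type claudeLine struct {
	Type    string `json:"type"`
	Message struct {
		Content []struct {
			Type string `json:"type"`
			Text string `json:"text"`
			Name string `json:"name"`
		} `json:"content"`
	} `json:"message"`
}

type claudeSource struct{}

// ParseEvents returns an empty (nil) slice for malformed or non-assistant
// lines and otherwise emits one event per block, in source order.
func (claudeSource) ParseEvents(line []byte) []DisplayEvent {
	var l claudeLine
	if err := json.Unmarshal(line, &l); err != nil || l.Type != "assistant" {
		return nil
	}
	var out []DisplayEvent
	for _, block := range l.Message.Content {
		switch block.Type {
		case "text":
			out = append(out, DisplayEvent{Kind: EventText, Text: block.Text})
		case "tool_use":
			out = append(out, DisplayEvent{Kind: EventToolUse, Name: block.Name})
		}
	}
	return out
}

func main() {
	line := []byte(`{"type":"assistant","message":{"content":[` +
		`{"type":"text","text":"Let me check."},` +
		`{"type":"tool_use","name":"Read"},` +
		`{"type":"text","text":"Done."}]}}`)
	for _, ev := range (claudeSource{}).ParseEvents(line) {
		fmt.Printf("%+v\n", ev)
	}
}
```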
Requirements
Functional Requirements
- FR-001: System MUST expose a DisplayEventSource interface with a ParseEvents(line []byte) []DisplayEvent method, replacing the current LineExtractor function-field hook on cliProviderHooks.
- FR-002: System MUST define a DisplayEvent type with a discriminated Kind field supporting at minimum EventText and EventToolUse, shaped to accommodate future kinds without breaking consumers.
- FR-003: System MUST implement DisplayEventSource for each of the four CLI providers (Claude, Codex, Gemini, OpenCode) and an equivalent post-response translator for OpenAICompatible.
- FR-004: StreamFilterWriter and the OpenAICompatible post-response path MUST invoke ParseEvents on each line/response and forward the resulting []DisplayEvent to a single interfaces-layer renderer.
- FR-005: The renderer MUST support at least two display modes: default (text only, equivalent to current F082 behaviour) and verbose (text + tool markers interleaved in source order).
- FR-006: Users MUST be able to bypass event parsing entirely via output_format: json for raw passthrough, preserving current behaviour for every provider.
- FR-007: System MUST preserve aggregated DisplayOutput strings on AgentResult and ConversationResult by concatenating text-event payloads in emission order.
- FR-008: Parser implementations MUST return plain strings in all DisplayEvent fields, with no ANSI escape sequences, colour codes, or renderer-specific prefixes.
- FR-009: EventToolUse MUST carry a Name and a pre-truncated Arg (≤ 40 characters) and MAY carry a provider-supplied tool-call id for future correlation.
- FR-010: Parsers MUST preserve source order of events emitted from a single input line (notably Claude's interleaved text/tool_use blocks within one assistant event).
- FR-011: When a provider has no DisplayEventSource implementation configured, the stream filter MUST degrade to pass-through for that provider without error.
- FR-012: Each provider's mapper MUST be validated against a captured live --output-format stream-json (or HTTP response) fixture before the fixture is frozen as a regression baseline.
Non-Functional Requirements
- NFR-001: Switching provider: between any of the five providers MUST NOT change the categories of display events a user sees in verbose mode (text and tool-use at minimum), only their provider-specific contents.
- NFR-002: Default-mode aggregated output MUST be byte-equivalent to the pre-F085 F082 DisplayOutput for every provider, verified by golden-file comparison.
- NFR-003: Parser execution MUST NOT add user-perceptible latency; per-line parse cost stays within the same order of magnitude as current parseStreamLine.
- NFR-004: Renderer styling (ANSI, prefixes, glyphs) MUST NOT live in parser code; rendering concerns are confined to the interfaces layer.
- NFR-005: New providers added after this feature MUST be able to participate by implementing only DisplayEventSource, with no changes to the renderer or hook surface.
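How the stream filter might combine the JSON bypass (FR-006), the nil-source fallback (FR-011), and the single-interface extension point (NFR-005) is sketched below. handleLine and echoSource are hypothetical; the real StreamFilterWriter wiring is not described in this spec.

```go
package main

import "fmt"

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind EventKind
	Text string
	Name string
	Arg  string
}

type DisplayEventSource interface {
	ParseEvents(line []byte) []DisplayEvent
}

// handleLine decides, per line, between raw passthrough and event parsing.
// Passthrough wins when raw JSON output is requested or no source is set.
func handleLine(src DisplayEventSource, rawJSON bool, line []byte) (passthrough []byte, events []DisplayEvent) {
	if rawJSON || src == nil {
		return line, nil
	}
	return nil, src.ParseEvents(line)
}

// echoSource is a toy provider: a new provider only has to satisfy
// DisplayEventSource to take part in the pipeline.
type echoSource struct{}

func (echoSource) ParseEvents(line []byte) []DisplayEvent {
	return []DisplayEvent{{Kind: EventText, Text: string(line)}}
}

func main() {
	raw, evs := handleLine(nil, false, []byte("no source configured"))
	fmt.Printf("passthrough=%q events=%d\n", raw, len(evs))

	raw, evs = handleLine(echoSource{}, false, []byte("parsed"))
	fmt.Printf("passthrough=%q events=%d\n", raw, len(evs))
}
```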
Success Criteria
- SC-001: A user switching provider: across all 5 providers in a single-step workflow observes identical categories of verbose output (text + tool markers) with consistent marker formatting, verified by integration test.
- SC-002: Aggregated default-mode DisplayOutput matches pre-F085 output byte-for-byte across all 5 providers, verified by 5 golden-file regression tests.
- SC-003: Unit tests for each of the 5 providers replay a captured fixture and assert the exact []DisplayEvent sequence produced, including kind, order, and truncated argument.
- SC-004: The renderer formats all 7 common tool names (Read, Write, Edit, Bash, Grep, Glob, Task) with a recognisable marker and degrades gracefully on unknown names, verified by a renderer unit test.
- SC-005: Adding a new event kind to the enum requires zero changes to existing parser contracts (interface signature unchanged) and zero changes to the renderer's default-mode output.
Key Entities
| Entity | Description | Key Attributes |
|--------|-------------|----------------|
| DisplayEvent | Discriminated intermediate representation of anything a provider can surface to the user | Kind (EventText, EventToolUse, ...), Text (for text events), Name + Arg + optional tool-call id (for tool-use events) |
| DisplayEventSource | Interface contract implemented by each provider to translate wire-format lines into an ordered event stream | ParseEvents(line []byte) []DisplayEvent |
| Renderer | Single interfaces-layer component owning all display styling decisions (colour, prefix, alignment) and mode selection | display mode (default or verbose), formatting policy per EventKind, tool-name marker table |
| baseCLIProvider (extended) | Composes an optional DisplayEventSource alongside existing hooks; delegates parsing to it when non-nil | display DisplayEventSource field (nil-safe) |
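The first two entities could be declared roughly as follows. This is a sketch under assumptions rather than the final shape: the CallID field name and the AggregateText helper are hypothetical, and joining text payloads without a separator is one reading of "concatenating" in FR-007.

```go
package main

import (
	"fmt"
	"strings"
)

// EventKind discriminates DisplayEvent variants. Further kinds (for example
// a future EventReasoning) can be appended without touching the interface.
type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

// DisplayEvent is plain data; no field carries styling.
type DisplayEvent struct {
	Kind   EventKind
	Text   string // set for EventText
	Name   string // set for EventToolUse
	Arg    string // set for EventToolUse, pre-truncated by the parser
	CallID string // optional provider-supplied tool-call id (hypothetical name)
}

// DisplayEventSource is the per-provider contract from FR-001.
type DisplayEventSource interface {
	ParseEvents(line []byte) []DisplayEvent
}

// AggregateText concatenates text-event payloads in emission order, which is
// how DisplayOutput could be rebuilt from an event stream.
func AggregateText(events []DisplayEvent) string {
	var b strings.Builder
	for _, ev := range events {
		if ev.Kind == EventText {
			b.WriteString(ev.Text)
		}
	}
	return b.String()
}

func main() {
	events := []DisplayEvent{
		{Kind: EventText, Text: "Reading the file. "},
		{Kind: EventToolUse, Name: "Read", Arg: "main.go"},
		{Kind: EventText, Text: "Done."},
	}
	fmt.Println(AggregateText(events))
}
```

Keeping the tool-use fields on the same struct as Text trades some type safety for a flat value that is easy to compare in fixture-based tests.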
Assumptions
- Persisted session schemas (Claude message.content[], Codex response_item, Gemini messages[].toolCalls[], OpenCode part/*.json) are close to but not identical to live stream-json output; live capture is required per provider before freezing fixtures.
- OpenAICompatible tool-call translation via choices[].message.tool_calls[] is a one-shot post-response operation; users accept the cadence difference from streaming providers.
- A 40-character truncation limit on EventToolUse.Arg is sufficient for readable markers without burying output in payload dumps; this can be tuned later without breaking the contract.
- The set of "well-known" tool names for renderer formatting is small and stable enough that a hand-maintained table is acceptable; unknown tools degrade gracefully rather than failing.
- The DisplayEvent type can live in a dedicated infrastructure-adjacent package without being pulled into internal/domain/; final placement is decided during implementation.
- The UsageSource refactor (observation 01) can ship before, after, or independently of this feature without cross-blocking.
Metadata
- Status: backlog
- Version: v0.8.0
- Priority: high
- Estimation: L
Dependencies
- Blocked by: none
- Unblocks: future EventReasoning / thinking-block surfacing feature; future user-facing verbose-mode toggle; parallel UsageSource refactor (coordinated, not blocking)
Clarifications
Section populated during clarify step with resolved ambiguities.
Notes
- Builds directly on F082 (StreamFilterWriter, LineExtractor, DisplayOutput) and F080 (baseCLIProvider, cliProviderHooks). The hook is internal, so the breaking infrastructure-layer migration is acceptable, but every provider must be migrated atomically.
- Package placement decision (internal/infrastructure/agents/display vs a neutral package) must be documented inline during implementation; pulling streaming concerns into internal/domain/ is rejected.
- Live-format validation is a prerequisite for each provider's fixture: persisted storage serves as a schema reference, not as the final regression baseline.
- Parallel refactor for UsageSource (observation 01) shares the "move cliProviderHooks toward pure composition" direction; coordinating the two migrations is recommended but not blocking.
- Claude thinking, OpenCode reasoning, and Gemini thoughts are the same conceptual signal under three names; the enum must accommodate a future EventReasoning without reshaping, even though emission is out of scope.
- Renderer owns all ANSI/colour/prefix decisions — parsers emit plain strings only. Enforce via static check.