F085: Unified Display-Event Abstraction Across All Agent Providers #317

# F085: Unified Display-Event Abstraction Across All Agent Providers

## Scope

### In Scope

- Introduction of a `DisplayEventSource` interface contract replacing the `LineExtractor` function-field hook on `cliProviderHooks`.
- Definition of a discriminated `DisplayEvent` type with at least `EventText` and `EventToolUse` kinds.
- Implementation of `DisplayEventSource` for all 5 providers: Claude, Codex, Gemini, OpenCode (streaming), and OpenAICompatible (post-response translation).
- Single interfaces-layer renderer with default (text only) and verbose (text + tool markers) display modes.
- Preservation of `DisplayOutput` string aggregation on `AgentResult`/`ConversationResult` by aggregating text events.
- Placement of `DisplayEvent` type in a dedicated infrastructure-adjacent package (location decided during implementation and documented).

### Out of Scope

- `EventReasoning` and other extended event kinds (thinking blocks, rate-limit notices, cache-hit markers, agent hand-offs): the event-kind enum must accommodate them, but emission is deferred.
- Real-usage parsing via `UsageSource` interface (observation 01) — tracked as a parallel refactor that can ship independently.
- Renderer theming, colour customisation, or user-configurable marker formats.
- Persistence or replay of display events across sessions.

### Deferred

| Item | Rationale | Follow-up |
|------|-----------|-----------|
| `EventReasoning` emission for Claude `thinking` / OpenCode `reasoning` / Gemini `thoughts` | Requires opt-in UX design; out of scope to keep this feature focused on tool-use parity | future |
| Tool-call id correlation (request → response) across turns | Needs a correlation model that spans conversation state; not required for the rendering surface this feature ships | future |
| `UsageSource` interface contract for per-turn token accounting | Parallel refactor with independent test surface and lifecycle; ships separately | future (observation 01) |
| Per-user verbose-mode configuration (CLI flag or workflow field) | Renderer supports modes internally; user-facing toggle is a follow-up once default shape is proven | future |

---

## User Stories

### US1: Consistent Tool Activity Across Providers (P1 - Must Have)

**As a** workflow author switching between agent providers,
**I want** tool-use activity to surface in the same categories and format regardless of which provider (`claude`, `codex`, `gemini`, `opencode`, `openai_compatible`) is selected,
**So that** swapping `provider:` in a workflow does not silently change what I see during execution.

**Why this priority**: Cross-provider parity is the whole point of the feature. Without it, users cannot treat provider choice as a pure backend decision. This story extends the behavioural parity promised by F080/F081 to the display surface and delivers the feature's MVP value.

**Acceptance Scenarios:**
1. **Given** a single-step workflow that invokes a tool (e.g. `Read`), **When** the workflow is run with verbose mode active against each of the 5 providers in turn, **Then** each run produces a tool marker for the same tool name in human-readable form, interleaved with assistant text in source order.
2. **Given** a Claude `assistant` event containing interleaved `text` and `tool_use` blocks in one line, **When** the parser processes that line, **Then** the returned `[]DisplayEvent` preserves the source order of the blocks.
3. **Given** an `openai_compatible` response with `choices[].message.tool_calls[]`, **When** the post-response translator runs, **Then** text and tool events are emitted through the same renderer and formatted identically to the streaming providers.

**Independent Test:** Unit test per provider replays a captured fixture (stream-json line or HTTP response) and asserts the produced `[]DisplayEvent` sequence — kind, order, and truncated argument. Integration test runs the same workflow against each provider and diffs the rendered verbose output shape.
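A minimal sketch of a Claude `DisplayEventSource`, assuming the Claude Code stream-json `assistant` line carries `message.content[]` blocks of type `text` and `tool_use` (with `id`, `name`, `input`). The names `claudeSource`, `previewArg`, `truncate` and the preview-key list are hypothetical, and the shape still has to be confirmed against a live capture per FR-012. Walking the content array in order is what preserves interleaving.

```go
package main

import "encoding/json"

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind   EventKind
	Text   string
	Name   string
	Arg    string
	CallID string
}

// claudeLine models only the fields this sketch reads from an assumed
// Claude `assistant` stream-json line; it is not an exhaustive schema.
type claudeLine struct {
	Type    string `json:"type"`
	Message struct {
		Content []struct {
			Type  string                     `json:"type"`
			Text  string                     `json:"text"`
			ID    string                     `json:"id"`
			Name  string                     `json:"name"`
			Input map[string]json.RawMessage `json:"input"`
		} `json:"content"`
	} `json:"message"`
}

type claudeSource struct{}

// ParseEvents walks content blocks in array order, so interleaved text and
// tool_use blocks are returned in source order (FR-010). Malformed JSON and
// non-assistant lines yield an empty slice.
func (claudeSource) ParseEvents(line []byte) []DisplayEvent {
	var l claudeLine
	if err := json.Unmarshal(line, &l); err != nil || l.Type != "assistant" {
		return nil
	}
	var out []DisplayEvent
	for _, b := range l.Message.Content {
		switch b.Type {
		case "text":
			out = append(out, DisplayEvent{Kind: EventText, Text: b.Text})
		case "tool_use":
			out = append(out, DisplayEvent{
				Kind:   EventToolUse,
				Name:   b.Name,
				Arg:    previewArg(b.Input),
				CallID: b.ID,
			})
		}
	}
	return out
}

// previewArg returns the first recognised string field, truncated to 40
// runes; the key list is a hypothetical starting table.
func previewArg(input map[string]json.RawMessage) string {
	for _, k := range []string{"file_path", "command", "pattern"} {
		var s string
		if raw, ok := input[k]; ok && json.Unmarshal(raw, &s) == nil {
			return truncate(s, 40)
		}
	}
	return ""
}

// truncate cuts on rune boundaries so multi-byte characters are never split.
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}
```

The hard cut leaves any ellipsis to the renderer, since parsers emit no glyphs.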

### US2: Default Behaviour Unchanged from F082 (P1 - Must Have)

**As a** user already relying on the current F082 filtered-text display,
**I want** the default (non-verbose) output to remain text-only and byte-equivalent to what `parseStreamLine` produces today,
**So that** adopting F085 does not force me to re-learn my output or break existing logs and scripts.

**Why this priority**: A silent behavioural regression in the default path would punish every existing workflow. Default-safety is essential for MVP acceptance.

**Acceptance Scenarios:**
1. **Given** any of the 5 providers and verbose mode inactive, **When** a workflow runs, **Then** the user-visible output is equivalent to current F082 behaviour (text only, no tool markers).
2. **Given** `output_format: json` is set, **When** a workflow runs against any provider, **Then** raw passthrough is unchanged and no event parsing is attempted.
3. **Given** a captured pre-F085 run and a post-F085 run of the same workflow in default mode, **When** their aggregated `DisplayOutput` strings are compared, **Then** they are equal.

**Independent Test:** Golden-file test per provider comparing aggregated text-event output against the pre-F085 `DisplayOutput` baseline. A separate test asserts `output_format: json` bypasses `ParseEvents` entirely.
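One way the single renderer could cover both modes while keeping default output identical to the aggregated `DisplayOutput` is sketched below. `Mode`, `Render`, `AggregateText` and the `[Name] arg` marker format are illustrative assumptions, not a settled design.

```go
package main

import "strings"

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind EventKind
	Text string // EventText payload
	Name string // EventToolUse tool name
	Arg  string // EventToolUse pre-truncated preview
}

type Mode int

const (
	ModeDefault Mode = iota // text only (F082 behaviour)
	ModeVerbose             // text plus tool markers
)

// Render writes events in order. Unknown kinds are skipped in every mode,
// so adding a kind later leaves default output unchanged (SC-005).
func Render(events []DisplayEvent, mode Mode) string {
	var b strings.Builder
	for _, e := range events {
		switch e.Kind {
		case EventText:
			b.WriteString(e.Text)
		case EventToolUse:
			if mode == ModeVerbose {
				b.WriteString(toolMarker(e))
			}
		}
	}
	return b.String()
}

// toolMarker uses a hypothetical "[Name] arg" format on its own line and
// falls back to the name alone when Arg is empty.
func toolMarker(e DisplayEvent) string {
	if e.Arg == "" {
		return "\n[" + e.Name + "]\n"
	}
	return "\n[" + e.Name + "] " + e.Arg + "\n"
}

// AggregateText builds DisplayOutput by concatenating text payloads in
// emission order (FR-007).
func AggregateText(events []DisplayEvent) string {
	var b strings.Builder
	for _, e := range events {
		if e.Kind == EventText {
			b.WriteString(e.Text)
		}
	}
	return b.String()
}
```

Because default mode writes exactly the text payloads, the golden-file test can compare `Render(events, ModeDefault)` and `AggregateText(events)` against the same baseline.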

### US3: Tool Marker Readability (P2 - Should Have)

**As a** user running a workflow in verbose mode,
**I want** tool markers to be concise and immediately recognisable (name + short truncated argument),
**So that** I can track agent activity without being drowned in JSON payloads.

**Why this priority**: Verbose mode only becomes useful once the markers are readable. Unknown tool names must degrade gracefully so that new or provider-specific tools never render as empty or malformed lines.

**Acceptance Scenarios:**
1. **Given** an `EventToolUse` for a well-known tool (`Read`, `Write`, `Edit`, `Bash`, `Grep`, `Glob`, `Task`), **When** the renderer formats it, **Then** the output shows the tool name and a pre-truncated argument (≤ 40 characters).
2. **Given** an `EventToolUse` for an unknown tool name, **When** the renderer formats it, **Then** the output shows the raw name with an empty or safely truncated argument, without crashing.
3. **Given** a Codex `function_call` with a JSON-encoded `arguments` string, **When** the mapper emits the event, **Then** the `Arg` field contains a short human-readable preview derived from a known field (`file_path`, `cmd`, etc.) rather than the raw JSON.

**Independent Test:** Renderer unit test iterates a fixed table of tool names and asserts the formatted marker. A separate test covers the unknown-tool and empty-arg fallbacks.

### US4: Parser Layering Discipline (P2 - Should Have)

**As a** maintainer of the display pipeline,
**I want** parsers to return plain-string events with zero ANSI or formatting concerns,
**So that** rendering decisions (colour, prefix, alignment) live in one place and future UX changes do not require touching every provider.

**Why this priority**: Without this boundary, each provider's mapper accumulates its own cosmetic choices and the "single rendering policy" benefit collapses. Enforcing it during the initial migration avoids retrofitting later.

**Acceptance Scenarios:**
1. **Given** any provider's `DisplayEventSource.ParseEvents` output, **When** the returned events are inspected, **Then** no ANSI escape sequences, prefixes, or renderer-specific glyphs are present in any field.
2. **Given** the renderer is swapped or restyled, **When** the same captured events are re-rendered, **Then** only renderer code changes — parsers and fixtures remain untouched.

**Independent Test:** A static check (table test or linter) scans event field values for ANSI escape bytes and known prefix strings; any match fails the test.
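The static check could be a helper that every provider's fixture test calls on its parsed events. This sketch flags any field containing the ESC byte (0x1b), which begins the 7-bit ANSI escape sequences terminals commonly use, or starting with a prefix from a hypothetical deny-list; `CheckPlain` and `forbiddenPrefixes` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

type DisplayEvent struct {
	Kind   int
	Text   string
	Name   string
	Arg    string
	CallID string
}

// forbiddenPrefixes is a hypothetical list of renderer-owned prefixes.
var forbiddenPrefixes = []string{"[tool] ", "\u25b6 "}

// CheckPlain returns an error naming the first offending field, checking
// fields in a fixed order so failures are deterministic.
func CheckPlain(events []DisplayEvent) error {
	for i, e := range events {
		fields := []struct{ name, value string }{
			{"Text", e.Text}, {"Name", e.Name}, {"Arg", e.Arg}, {"CallID", e.CallID},
		}
		for _, f := range fields {
			if strings.ContainsRune(f.value, '\x1b') {
				return fmt.Errorf("event %d: field %s contains an ANSI escape byte", i, f.name)
			}
			for _, p := range forbiddenPrefixes {
				if strings.HasPrefix(f.value, p) {
					return fmt.Errorf("event %d: field %s starts with renderer prefix %q", i, f.name, p)
				}
			}
		}
	}
	return nil
}
```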

### US5: OpenAICompatible Documented Cadence (P3 - Nice to Have)

**As a** user of the `openai_compatible` provider,
**I want** it to be documented that display events arrive in a single post-response burst rather than streaming,
**So that** I understand why "live feedback" timing differs from streaming providers even though the rendered shape is identical.

**Why this priority**: Functional correctness is already covered by US1. This is a pure-documentation clarification that avoids user confusion but does not block the feature.

**Acceptance Scenarios:**
1. **Given** the `openai_compatible` provider, **When** a user reads the provider documentation, **Then** it clearly states that all display events are emitted at once after the HTTP response completes.
2. **Given** a streaming provider's docs, **When** compared with `openai_compatible`, **Then** the cadence difference is explicit and visible.

**Independent Test:** Docs review against the provider reference pages; verify an explicit cadence note exists on `openai_compatible` and is absent (or stated as "streaming") on the other four.
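A sketch of the post-response translator, assuming the standard Chat Completions response shape (`choices[].message.content` plus `tool_calls[].function.name` / `.arguments`), that only the first choice is displayed, and that text is emitted before tool calls because the message carries no interleaving order. `TranslateResponse` and `argPreview` are hypothetical names. Every event comes from a single call after the body is complete, which is the burst cadence the provider docs should describe.

```go
package main

import "encoding/json"

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind   EventKind
	Text   string
	Name   string
	Arg    string
	CallID string
}

type chatResponse struct {
	Choices []struct {
		Message struct {
			Content   string `json:"content"` // JSON null leaves this ""
			ToolCalls []struct {
				ID       string `json:"id"`
				Function struct {
					Name      string `json:"name"`
					Arguments string `json:"arguments"`
				} `json:"function"`
			} `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// TranslateResponse converts a complete response body into events in one burst.
func TranslateResponse(body []byte) []DisplayEvent {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil || len(resp.Choices) == 0 {
		return nil
	}
	msg := resp.Choices[0].Message
	var out []DisplayEvent
	if msg.Content != "" {
		out = append(out, DisplayEvent{Kind: EventText, Text: msg.Content})
	}
	for _, tc := range msg.ToolCalls {
		out = append(out, DisplayEvent{
			Kind:   EventToolUse,
			Name:   tc.Function.Name,
			Arg:    argPreview(tc.Function.Arguments),
			CallID: tc.ID,
		})
	}
	return out
}

// argPreview picks a hypothetical known field and truncates it to 40 runes.
func argPreview(arguments string) string {
	var fields map[string]any
	if json.Unmarshal([]byte(arguments), &fields) != nil {
		return ""
	}
	for _, k := range []string{"file_path", "path", "command"} {
		if s, ok := fields[k].(string); ok {
			r := []rune(s)
			if len(r) > 40 {
				r = r[:40]
			}
			return string(r)
		}
	}
	return ""
}
```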

### Edge Cases

- What happens when a provider emits a line that contains neither text nor tool activity (keepalive, system event)? Parser returns an empty slice; renderer emits nothing.
- How does the system handle a malformed stream-json line (invalid JSON, truncated chunk)? Parser returns an empty slice and logs at debug level; the surrounding stream continues.
- What is the behavior when Claude's single `assistant` event interleaves three or more text/tool blocks? The returned slice must preserve all of them in source order.
- What happens when a tool-use `input` / `arguments` payload has no recognisable preview field? `Arg` falls back to an empty string; the renderer shows name only.
- How does the system handle providers without a `DisplayEventSource` implementation? `baseCLIProvider.display` is nil; the stream filter degrades to pass-through for that provider and no events are emitted.
- What is the behavior when `output_format: json` is combined with verbose mode? Raw passthrough wins; event parsing is not invoked.
- What happens when OpenCode emits a `step-finish` part after a tool call completes? The mapper does not emit a duplicate event; state transitions are observed but only one `EventToolUse` per tool part is produced.
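The OpenCode duplicate-suppression rule can be expressed as a stateful wrapper around any source, keyed on the tool-call id. This sketch deliberately does not model OpenCode's wire format; it assumes only that repeated announcements of one tool part share a `CallID`. `dedupToolUse` and the `SourceFunc` adapter are hypothetical.

```go
package main

type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
)

type DisplayEvent struct {
	Kind   EventKind
	Text   string
	Name   string
	Arg    string
	CallID string
}

type DisplayEventSource interface {
	ParseEvents(line []byte) []DisplayEvent
}

// SourceFunc adapts a plain function to DisplayEventSource.
type SourceFunc func(line []byte) []DisplayEvent

func (f SourceFunc) ParseEvents(line []byte) []DisplayEvent { return f(line) }

// dedupToolUse drops repeated EventToolUse events for a CallID it has
// already emitted, e.g. when a tool part is re-announced on each state
// transition. Events without a CallID always pass through.
type dedupToolUse struct {
	inner DisplayEventSource
	seen  map[string]bool
}

func newDedupToolUse(inner DisplayEventSource) *dedupToolUse {
	return &dedupToolUse{inner: inner, seen: make(map[string]bool)}
}

func (d *dedupToolUse) ParseEvents(line []byte) []DisplayEvent {
	events := d.inner.ParseEvents(line)
	out := make([]DisplayEvent, 0, len(events))
	for _, e := range events {
		if e.Kind == EventToolUse && e.CallID != "" {
			if d.seen[e.CallID] {
				continue
			}
			d.seen[e.CallID] = true
		}
		out = append(out, e)
	}
	return out
}

var _ DisplayEventSource = (*dedupToolUse)(nil)
```

Because the wrapper holds state, one instance would be created per stream rather than shared across runs.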

---

## Requirements

### Functional Requirements

- **FR-001**: System MUST expose a `DisplayEventSource` interface with a `ParseEvents(line []byte) []DisplayEvent` method, replacing the current `LineExtractor` function-field hook on `cliProviderHooks`.
- **FR-002**: System MUST define a `DisplayEvent` type with a discriminated `Kind` field supporting at minimum `EventText` and `EventToolUse`, shaped to accommodate future kinds without breaking consumers.
- **FR-003**: System MUST implement `DisplayEventSource` for each of the four CLI providers (Claude, Codex, Gemini, OpenCode) and an equivalent post-response translator for OpenAICompatible.
- **FR-004**: `StreamFilterWriter` and the OpenAICompatible post-response path MUST invoke `ParseEvents` on each line/response and forward the resulting `[]DisplayEvent` to a single interfaces-layer renderer.
- **FR-005**: The renderer MUST support at least two display modes: default (text only, equivalent to current F082 behaviour) and verbose (text + tool markers interleaved in source order).
- **FR-006**: Users MUST be able to bypass event parsing entirely via `output_format: json` for raw passthrough, preserving current behaviour for every provider.
- **FR-007**: System MUST preserve aggregated `DisplayOutput` strings on `AgentResult` and `ConversationResult` by concatenating text-event payloads in emission order.
- **FR-008**: Parser implementations MUST return plain strings in all `DisplayEvent` fields, with no ANSI escape sequences, colour codes, or renderer-specific prefixes.
- **FR-009**: `EventToolUse` MUST carry a `Name` and a pre-truncated `Arg` (≤ 40 characters) and MAY carry a provider-supplied tool-call id for future correlation.
- **FR-010**: Parsers MUST preserve source order of events emitted from a single input line (notably Claude's interleaved text/tool_use blocks within one `assistant` event).
- **FR-011**: When a provider has no `DisplayEventSource` implementation configured, the stream filter MUST degrade to pass-through for that provider without error.
- **FR-012**: Each provider's mapper MUST be validated against a captured live `--output-format stream-json` (or HTTP response) fixture before the fixture is frozen as a regression baseline.

### Non-Functional Requirements

- **NFR-001**: Switching `provider:` between any of the five providers MUST NOT change the categories of display events a user sees in verbose mode (text and tool-use at minimum), only their provider-specific contents.
- **NFR-002**: Default-mode aggregated output MUST be byte-equivalent to the pre-F085 F082 `DisplayOutput` for every provider, verified by golden-file comparison.
- **NFR-003**: Parser execution MUST NOT add user-perceptible latency; per-line parse cost stays within the same order of magnitude as current `parseStreamLine`.
- **NFR-004**: Renderer styling (ANSI, prefixes, glyphs) MUST NOT live in parser code; rendering concerns are confined to the interfaces layer.
- **NFR-005**: New providers added after this feature MUST be able to participate by implementing only `DisplayEventSource`, with no changes to the renderer or hook surface.

---

## Success Criteria

- **SC-001**: A user switching `provider:` across all 5 providers in a single-step workflow observes identical categories of verbose output (text + tool markers) with consistent marker formatting, verified by integration test.
- **SC-002**: Aggregated default-mode `DisplayOutput` matches pre-F085 output byte-for-byte across all 5 providers, verified by 5 golden-file regression tests.
- **SC-003**: Unit tests for each of the 5 providers replay a captured fixture and assert the exact `[]DisplayEvent` sequence produced, including kind, order, and truncated argument.
- **SC-004**: The renderer formats all 7 common tool names (`Read`, `Write`, `Edit`, `Bash`, `Grep`, `Glob`, `Task`) with a recognisable marker and degrades gracefully on unknown names, verified by a renderer unit test.
- **SC-005**: Adding a new event kind to the enum requires zero changes to existing parser contracts (interface signature unchanged) and zero changes to the renderer's default-mode output.

---

## Key Entities

| Entity | Description | Key Attributes |
|--------|-------------|----------------|
| DisplayEvent | Discriminated intermediate representation of anything a provider can surface to the user | `Kind` (EventText \| EventToolUse \| ...), `Text` (for text events), `Name` + `Arg` + optional tool-call id (for tool-use events) |
| DisplayEventSource | Interface contract implemented by each provider to translate wire-format lines into an ordered event stream | `ParseEvents(line []byte) []DisplayEvent` |
| Renderer | Single interfaces-layer component owning all display styling decisions (colour, prefix, alignment) and mode selection | display mode (default \| verbose), formatting policy per `EventKind`, tool-name marker table |
| baseCLIProvider (extended) | Composes an optional `DisplayEventSource` alongside existing hooks; delegates parsing to it when non-nil | `display DisplayEventSource` field (nil-safe) |
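The entities above might translate to Go roughly as follows. Only `DisplayEvent`, `DisplayEventSource`, `ParseEvents`, `EventText`, `EventToolUse` and the `display` field come from this spec; `CallID`, `parse` and `textOnlySource` are illustrative assumptions. Appending new `iota` constants (for example a future `EventReasoning`) leaves the interface signature unchanged, which is the property SC-005 relies on.

```go
package main

// EventKind discriminates DisplayEvent variants. New kinds are appended at
// the end so existing values keep their meaning.
type EventKind int

const (
	EventText    EventKind = iota // assistant prose
	EventToolUse                  // a tool invocation
)

// DisplayEvent is the provider-neutral intermediate representation. Only
// the fields relevant to Kind are populated, and all values are plain
// strings with no ANSI codes or renderer prefixes.
type DisplayEvent struct {
	Kind   EventKind
	Text   string // EventText payload
	Name   string // EventToolUse tool name
	Arg    string // EventToolUse preview, pre-truncated to <= 40 characters
	CallID string // optional provider-supplied tool-call id (assumed name)
}

// DisplayEventSource translates one wire-format line into ordered events.
type DisplayEventSource interface {
	ParseEvents(line []byte) []DisplayEvent
}

// baseCLIProvider composes an optional source alongside existing hooks.
type baseCLIProvider struct {
	display DisplayEventSource // nil means pass-through
}

// parse reports ok=false when no source is configured, so callers can fall
// back to pass-through without treating it as an error.
func (b *baseCLIProvider) parse(line []byte) (events []DisplayEvent, ok bool) {
	if b.display == nil {
		return nil, false
	}
	return b.display.ParseEvents(line), true
}

// textOnlySource is a hypothetical trivial source used for illustration.
type textOnlySource struct{}

func (textOnlySource) ParseEvents(line []byte) []DisplayEvent {
	if len(line) == 0 {
		return nil
	}
	return []DisplayEvent{{Kind: EventText, Text: string(line)}}
}

// Compile-time check that the illustrative source satisfies the contract.
var _ DisplayEventSource = textOnlySource{}
```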

---

## Assumptions

- Persisted session schemas (Claude `message.content[]`, Codex `response_item`, Gemini `messages[].toolCalls[]`, OpenCode `part/*.json`) are close to but not identical to live stream-json output; live capture is required per provider before freezing fixtures.
- OpenAICompatible tool-call translation via `choices[].message.tool_calls[]` is a one-shot post-response operation; users accept the cadence difference from streaming providers.
- A 40-character truncation limit on `EventToolUse.Arg` is sufficient for readable markers without burying output in payload dumps; this can be tuned later without breaking the contract.
- The set of "well-known" tool names for renderer formatting is small and stable enough that a hand-maintained table is acceptable; unknown tools degrade gracefully rather than failing.
- The `DisplayEvent` type can live in a dedicated infrastructure-adjacent package without being pulled into `internal/domain/`; final placement is decided during implementation.
- The `UsageSource` refactor (observation 01) can ship before, after, or independently of this feature without cross-blocking.

---

## Metadata

- **Status**: backlog
- **Version**: v0.8.0
- **Priority**: high
- **Estimation**: L

## Dependencies

- **Blocked by**: none
- **Unblocks**: future `EventReasoning` / thinking-block surfacing feature; future user-facing verbose-mode toggle; parallel `UsageSource` refactor (coordinated, not blocking)

## Clarifications

_Section populated during clarify step with resolved ambiguities._

## Notes

- Builds directly on F082 (`StreamFilterWriter`, `LineExtractor`, `DisplayOutput`) and F080 (`baseCLIProvider`, `cliProviderHooks`). Because the hook is internal, a breaking infrastructure-layer migration is acceptable, but every provider must be migrated atomically.
- Package placement decision (`internal/infrastructure/agents/display` vs a neutral package) must be documented inline during implementation; pulling streaming concerns into `internal/domain/` is rejected.
- Live-format validation is a prerequisite for each provider's fixture: persisted storage serves as a schema reference, not as the final regression baseline.
- Parallel refactor for `UsageSource` (observation 01) shares the "move `cliProviderHooks` toward pure composition" direction; coordinating the two migrations is recommended but not blocking.
- Claude `thinking`, OpenCode `reasoning`, and Gemini `thoughts` are the same conceptual signal under three names; the enum must accommodate a future `EventReasoning` without reshaping, even though emission is out of scope.
- Renderer owns all ANSI/colour/prefix decisions — parsers emit plain strings only. Enforce via static check.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

F085: Unified Display-Event Abstraction Across All Agent Providers #317

F085: Unified Display-Event Abstraction Across All Agent Providers

Scope

In Scope

Out of Scope

Deferred

User Stories

US1: Consistent Tool Activity Across Providers (P1 - Must Have)

US2: Default Behaviour Unchanged from F082 (P1 - Must Have)

US3: Tool Marker Readability (P2 - Should Have)

US4: Parser Layering Discipline (P2 - Should Have)

US5: OpenAICompatible Documented Cadence (P3 - Nice to Have)

Edge Cases

Requirements

Functional Requirements

Non-Functional Requirements

Success Criteria

Key Entities

Assumptions

Metadata

Dependencies

Clarifications

Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Item	Rationale	Follow-up
`EventReasoning` emission for Claude `thinking` / OpenCode `reasoning` / Gemini `thoughts`	Requires opt-in UX design; out of scope to keep this feature focused on tool-use parity	future
Tool-call id correlation (request → response) across turns	Needs a correlation model that spans conversation state; not required for the rendering surface this feature ships	future
`UsageSource` interface contract for per-turn token accounting	Parallel refactor with independent test surface and lifecycle; ships separately	future (observation 01)
Per-user verbose-mode configuration (CLI flag or workflow field)	Renderer supports modes internally; user-facing toggle is a follow-up once default shape is proven	future

Entity	Description	Key Attributes
DisplayEvent	Discriminated intermediate representation of anything a provider can surface to the user	`Kind` (EventText \| EventToolUse \| ...), `Text` (for text events), `Name` + `Arg` + optional tool-call id (for tool-use events)
DisplayEventSource	Interface contract implemented by each provider to translate wire-format lines into an ordered event stream	`ParseEvents(line []byte) []DisplayEvent`
Renderer	Single interfaces-layer component owning all display styling decisions (colour, prefix, alignment) and mode selection	display mode (default \| verbose), formatting policy per `EventKind`, tool-name marker table
baseCLIProvider (extended)	Composes an optional `DisplayEventSource` alongside existing hooks; delegates parsing to it when non-nil	`display DisplayEventSource` field (nil-safe)

Uh oh!

F085: Unified Display-Event Abstraction Across All Agent Providers #317

Description

F085: Unified Display-Event Abstraction Across All Agent Providers

Scope

In Scope

Out of Scope

Deferred

User Stories

US1: Consistent Tool Activity Across Providers (P1 - Must Have)

US2: Default Behaviour Unchanged from F082 (P1 - Must Have)

US3: Tool Marker Readability (P2 - Should Have)

US4: Parser Layering Discipline (P2 - Should Have)

US5: OpenAICompatible Documented Cadence (P3 - Nice to Have)

Edge Cases

Requirements

Functional Requirements

Non-Functional Requirements

Success Criteria

Key Entities

Assumptions

Metadata

Dependencies

Clarifications

Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions