Streaming LLM Response Support #4

@bdsaglam

Problem

The current implementation waits for the complete LLM response before returning anything. Streaming support would let users see output as it is generated, which matters most for long-running generations.

Challenges

A Temporal activity returns a single result when it completes, so streaming requires a different pattern:

  • Signals: Push tokens to the workflow as they arrive
  • Queries: Poll the workflow for partial results
  • Heartbeats: Include partial content in the activity's heartbeat details

Proposed Approach

Investigate Temporal patterns for streaming:

  1. Activity with Signals

    • The activity streams tokens from the LLM and sends them to the workflow as signals
    • The workflow accumulates the tokens and can expose the partial result via a query
  2. Event-based Pattern

    • Similar to Pydantic AI's _call_event_stream_handler_activity
    • Buffer events in the activity and periodically flush them to the workflow
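
To make the two options above more concrete, here is a minimal sketch of the buffering logic they share, written in plain Python with no Temporal dependency. All names (`TokenBatcher`, `StreamState`, `send`, `max_batch`) are hypothetical and not taken from this project or from Pydantic AI. The assumption is that `send` would be replaced by whatever delivers a batch to the workflow (for example, a signal call), and that `partial_text` is what a workflow query handler would return.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TokenBatcher:
    """Activity side: buffers tokens and flushes them in batches via `send`.

    `send` is a stand-in for the call that delivers a batch to the
    workflow (an assumption of this sketch, not a real SDK call).
    """

    send: Callable[[List[str]], None]
    max_batch: int = 20
    _buffer: List[str] = field(default_factory=list, init=False)

    def add(self, token: str) -> None:
        self._buffer.append(token)
        # Flush as soon as the buffer reaches the batch size.
        if len(self._buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        self.send(self._buffer)
        self._buffer = []


@dataclass
class StreamState:
    """Workflow side: accumulates batches and exposes the partial result."""

    chunks: List[str] = field(default_factory=list)
    batches_received: int = 0

    def on_batch(self, tokens: List[str]) -> None:
        # Would be the body of a signal handler in a real workflow.
        self.chunks.extend(tokens)
        self.batches_received += 1

    def partial_text(self) -> str:
        # Would be the body of a query handler in a real workflow.
        return "".join(self.chunks)


state = StreamState()
batcher = TokenBatcher(send=state.on_batch, max_batch=3)
for tok in ["Hel", "lo", ",", " wor", "ld"]:
    batcher.add(tok)
batcher.flush()  # deliver the trailing partial batch
```

With `max_batch=1` this behaves like the per-token signal variant; larger values trade latency for fewer deliveries. A time-based flush could be added the same way, but is left out to keep the sketch short.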

Research Needed

  • How does Pydantic AI handle streaming in its Temporal integration?
  • What's the overhead of signals for high-frequency token streaming?
  • Can we maintain durability while streaming?
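
As a starting point for the overhead question, a rough back-of-the-envelope helper is sketched below. It only counts signals; it does not measure latency or server load. The function names and the example numbers (2000 tokens, 50 tokens per second, batches of 25) are illustrative assumptions, not measurements. Since each signal is recorded as an event in the workflow's history, the signal count is a reasonable first proxy for cost.

```python
import math


def signals_needed(total_tokens: int, batch_size: int) -> int:
    """Number of signals required to deliver `total_tokens` in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(total_tokens / batch_size)


def signals_per_second(tokens_per_second: float, batch_size: int) -> float:
    """Average signal rate for a steady token stream."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return tokens_per_second / batch_size


# Illustrative comparison for a 2000-token response at 50 tokens/s.
per_token = signals_needed(2000, 1)
batched = signals_needed(2000, 25)
rate_batched = signals_per_second(50, 25)
```

In this example, per-token signaling needs 2000 signals, while batches of 25 need 80 at an average of 2 signals per second.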

Priority

Medium

References

  • Pydantic AI streaming: _agent.py event stream handler
  • Temporal signals documentation

Labels

enhancement