
Conversation

@JoaoDiasAbly
Contributor

Description

Context: https://ably.atlassian.net/browse/AIT-238

  • when publishing or appending without waiting for an ack from the previous operation (pipelining), an intermediary publish/append can fail, creating a gap in the stream
  • when this happens, we should "patch" the stream by:
    • message-per-response: publish the complete message with a different event type, which should replace all tokens published before for that response
      • some services already do this, so it's a 1:1 mapping
      • others don't, and the agent would need to buffer the response tokens on its side and patch the stream with the concatenation if publishing fails (see the sketch after this list)
    • message-per-token: patch the message with an edit operation containing the complete response (same concerns as above regarding how to implement and which services already do something similar)
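
As a rough illustration of the patching idea — a sketch, not the documented API: the channel name, event types, and responseId correlation below are assumptions — the agent can buffer tokens while pipelining publishes, then patch the stream with the concatenation if any publish was rejected:

```typescript
import * as Ably from 'ably';

const realtime = new Ably.Realtime({ key: process.env.ABLY_API_KEY! });
const channel = realtime.channels.get('ai:response'); // hypothetical channel name

async function streamResponse(responseId: string, tokens: AsyncIterable<string>) {
  const buffered: string[] = [];            // agent-side buffer of the full response
  const pending: Array<Promise<void>> = []; // pipelined publishes, not awaited per token

  for await (const token of tokens) {
    buffered.push(token);
    // Pipelining: don't await, so throughput isn't capped by round-trip time.
    pending.push(channel.publish('token', { responseId, token }));
  }

  const outcomes = await Promise.allSettled(pending);
  if (outcomes.some((o) => o.status === 'rejected')) {
    // A gap may exist in the stream: patch it by publishing the complete
    // response under a distinct event type ('response.complete' is made up
    // here) that subscribers treat as replacing all prior tokens for this
    // responseId.
    await channel.publish('response.complete', {
      responseId,
      text: buffered.join(''),
    });
  }
}
```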

Overall, the questions I think need to be answered in these docs sections are:

  • What happens if a publish/append fails?
  • How to detect failures when you're not awaiting?
  • How to handle/retry failed operations?
  • What error scenarios to expect (rate limits, size limits, connection issues)?

GregHolmes and others added 30 commits January 15, 2026 10:17
Link to the pending `/ai-transport` overview page.
Add intro describing the pattern, its properties, and use cases.
Includes continuous token streams, correlating tokens for distinct
responses, and explicit start/end events.
Splits each token streaming approach into distinct patterns and shows
both the publish and subscribe side behaviour alongside one another.
Includes hydration with rewind and hydration with persisted history +
untilAttach. Describes the pattern for handling in-progress live
responses with complete responses loaded from the database.
Add doc explaining streaming tokens with appendMessage and update
compaction allowing message-per-response history.
Unifies the token streaming nav after rebase.
Refines the intro copy in message-per-response to have structural
similarity with the message-per-token page.
Refine the Publishing section of the message-per-response docs.

- Include anchor tags on title
- Describe the `serial` identifier
- Align with stream pattern used in message-per-token docs
- Remove duplicate example
Refine the Subscribing section of the message-per-response docs.

- Add anchor tag to heading
- Describes each action upfront
- Uses RANDOM_CHANNEL_NAME
Refine the rewind section of the message-per-response docs.

- Include description of allowed rewind parameters
- Tweak copy
Refines the history section for the message-per-response docs.

- Adds anchor to heading
- Uses RANDOM_CHANNEL_NAME
- Use message serial in code snippet instead of ID
- Tweaks copy
Fix the hydration of in-progress responses via rewind by using the responseId in the extras to correlate messages with completed responses loaded from the database.
Fix the hydration of in-progress responses using history by obtaining
the timestamp of the last completed response loaded from the database
and paginating history forwards from that point.
Removes the headers/metadata section, as this covers the specific
semantics of extras.headers handling with appends, which is better
addressed by the (upcoming) message append pub/sub docs. Instead, a
callout is used to describe header mixin semantics in the appropriate
place insofar as it relates to the discussion at hand.
Update the token streaming with message per token docs to include a
callout describing resume behaviour in case of transient disconnection.
Fix the message per token docs headers to include anchors and align with
naming in the message per response page.
Adds an overview page for a Sessions & Identity section which describes the channel-oriented session model and its benefits over the traditional connection-oriented model.

Describes how identity relates to session management and how this works in the context of channel-oriented sessions.

Shows how to use identified clients to assign a trusted identity to users and obtain this identity from the agent side.

Shows how to use Ably capabilities to control which operations
authenticated users can perform on which channels.

Shows how to use authenticated user claims to associate a role or other attribute with a user.

Updates the docs to describe how to handle authentication, capabilities, identity and roles/attributes for agents separately from end users.

Describes how to use presence to mark users and agents as online/offline. Includes description of synthetic leaves in the event of abrupt disconnection.

Describe how to subscribe to presence to see who is online, and take action when a user is offline across all devices.

Add docs for resuming user and agent sessions, linking to hydration patterns for different token streaming approaches for user resumes and describing agent resume behaviour with message catch up.
Adds a guide for using the OpenAI SDK to consume streaming events from
the Responses API and publish them over Ably using the message per token
pattern.
- Uses a further-reading callout instead of note
- Removes repeated code initialising Ably client (OpenAI client already
  instantiated)
Adds an anchor tag to the "Client hydration" heading
Similar to the OpenAI message per token guide, but using the message
per response pattern with appends.
Documents patterns for exposing reasoning output from models along with
final output.
Overview page for token streaming in AI Transport

---------

Co-authored-by: matt423 <matthew.a423@gmail.com>
Co-authored-by: Fiona Corden <fiona.corden@ably.com>
Co-authored-by: Paddy Byers <paddy.byers@gmail.com>
GregHolmes and others added 17 commits January 15, 2026 10:22
Document how to implement human oversight of AI agent actions using
Ably channels and capabilities for authorization workflows.
Document how users send prompts to AI agents over Ably channels,
including identified clients, message correlation, and handling
concurrent prompts.
Co-authored-by: Paddy Byers <paddy.byers@gmail.com>
Details the message-per-response pattern using Ably `appendMessage` for Anthropic SDK.
Adds a page to the Messaging section that describes sending tool calls
and results to users over channels. Indicates ability to build
generative user interfaces or implement human in the loop workflows.
coderabbitai bot commented Jan 16, 2026

Review skipped: auto reviews are disabled on this repository. To trigger a single review, invoke the @coderabbitai review command.

Member

@paddybyers paddybyers left a comment


Suggested clarification


### Handling append failures <a id="append-failures"/>

When appending without awaiting, it is possible for an intermediate append to fail while subsequent appends succeed. This creates a gap in the streamed response. For example, if a rate limit is exceeded, a single append may be rejected while the following tokens continue to be accepted.

Suggested change
When appending without awaiting, it is possible for an intermediate append to fail while subsequent appends succeed. This creates a gap in the streamed response. For example, if a rate limit is exceeded, a single append may be rejected while the following tokens continue to be accepted.
The examples above append successive tokens to a response message by pipelining the append operations - that is, the agent will publish an append operation without waiting for prior operations to complete. This is necessary in order to avoid the append rate being capped by the round-trip time from the agent to the Ably endpoint. However, this means that the agent does not await the outcome of each append operation, and that can result in the agent continuing to submit append operations after an earlier operation has failed. For example, if a rate limit is exceeded, a single append may be rejected while the following tokens continue to be accepted.
The agent needs to obtain the outcome of each append operation, and take corrective action in the event that any operation failed. A simple but effective approach is to ensure that, if streaming of a response fails for any reason, the message is updated with the final complete response text once it is available. This means that although the streaming experience is disrupted in the case of failure, there is no consistency problem with the final result once the response completes.
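
A sketch of that corrective action under the message-per-response pattern might look like the following; `createResponseMessage`, `appendToMessage`, and `replaceMessage` are hypothetical helpers standing in for the create/append/update operations these docs describe, not verified SDK calls:

```typescript
import * as Ably from 'ably';

// Hypothetical helpers standing in for the create/append/update operations
// described in these docs; their signatures are assumptions, not a verified API.
declare function createResponseMessage(channel: Ably.RealtimeChannel): Promise<string>; // returns the message serial
declare function appendToMessage(channel: Ably.RealtimeChannel, serial: string, token: string): Promise<void>;
declare function replaceMessage(channel: Ably.RealtimeChannel, serial: string, text: string): Promise<void>;

async function streamWithFallback(channel: Ably.RealtimeChannel, tokens: AsyncIterable<string>) {
  const parts: string[] = [];
  const pending: Array<Promise<void>> = [];

  const serial = await createResponseMessage(channel);

  for await (const token of tokens) {
    parts.push(token);
    pending.push(appendToMessage(channel, serial, token)); // pipelined, not awaited
  }

  const outcomes = await Promise.allSettled(pending);
  if (outcomes.some((o) => o.status === 'rejected')) {
    // Streaming was disrupted; restore consistency by updating the message
    // with the final complete response text once it is available.
    await replaceMessage(channel, serial, parts.join(''));
  }
}
```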


When appending without awaiting, it is possible for an intermediate append to fail while subsequent appends succeed. This creates a gap in the streamed response. For example, if a rate limit is exceeded, a single append may be rejected while the following tokens continue to be accepted.

To detect failures, keep a reference to each append operation and check for rejections after the stream completes:
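
For instance, a minimal sketch of that detection step — using `channel.publish` from ably-js as a stand-in for the append call, with an illustrative event name and payload shape:

```typescript
import * as Ably from 'ably';

async function detectAppendFailures(
  channel: Ably.RealtimeChannel,
  tokenStream: AsyncIterable<string>
): Promise<void> {
  const pending: Array<Promise<void>> = [];

  for await (const token of tokenStream) {
    // Not awaited: operations are pipelined; keep a reference to each outcome.
    pending.push(channel.publish('token', { token }));
  }

  // After the stream completes, inspect the settled outcomes for rejections.
  const outcomes = await Promise.allSettled(pending);
  const failures = outcomes.filter(
    (o): o is PromiseRejectedResult => o.status === 'rejected'
  );
  if (failures.length > 0) {
    console.warn(`${failures.length} operation(s) failed:`, failures[0].reason);
    // Take corrective action, e.g. patch the stream with the complete response.
  }
}
```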

Suggested change
To detect failures, keep a reference to each append operation and check for rejections after the stream completes:
To detect append failures, keep a reference to each append operation and check for rejections after the stream completes:


### Handling publish failures <a id="publish-failures"/>

When publishing without awaiting, it is possible for an intermediate publish to fail while subsequent publishes succeed. This creates a gap in the streamed response. For example, if a rate limit is exceeded, a single token may be rejected while the following tokens continue to be accepted.
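
One way to notice such a failure without awaiting each publish — a sketch with illustrative names, not the documented approach — is to attach a rejection handler to every pipelined publish so the first error is recorded as soon as it happens:

```typescript
import * as Ably from 'ably';

// Sketch: record the first rejection from pipelined publishes so the agent
// can react promptly, rather than only checking after the stream completes.
// The 'token' event name and payload shape are illustrative assumptions.
class PipelinedPublisher {
  private firstError: unknown = null;

  constructor(private readonly channel: Ably.RealtimeChannel) {}

  publishToken(responseId: string, token: string): void {
    // Fire and forget, but capture any rejection for later inspection.
    this.channel.publish('token', { responseId, token }).catch((err) => {
      this.firstError ??= err; // later tokens may still be accepted
    });
  }

  get failed(): boolean {
    return this.firstError !== null;
  }
}
```

The agent can check `failed` between tokens or once the stream ends, and publish the complete response as a replacement if any token was rejected.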

I suggest updating this text in a similar way to that proposed above.

@mschristensen mschristensen force-pushed the AIT-129-AIT-Docs-release-branch branch from 20df5cb to 01ab0f8 on January 16, 2026 16:50
Base automatically changed from AIT-129-AIT-Docs-release-branch to main January 16, 2026 17:48
