Send empty message right after first token generation (continuous batching)#4020
#4020 · dkalinowski wants to merge 3 commits into main
Conversation
Pull request overview
This PR implements support for sending an empty control message immediately after the first token generation in continuous batching scenarios. This addresses the case where the first token generation iteration produces no visible text output: clients receive an early signal that generation has started, and the chunk includes the assistant role as required by the OpenAI streaming specification.
Changes:
- Added `loopIteration` counter to track streaming iterations in `GenAiServableExecutionContext`
- Implemented logic to send a control chunk when the first iteration produces empty text
- Added `serializeStreamingFirstTokenControlChunk()` method to create properly formatted first-chunk responses for both chat completions and completions endpoints
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/llm/servable.hpp | Adds loopIteration field to track which streaming iteration is currently being processed |
| src/llm/servable.cpp | Implements logic to send control chunk on first iteration when text is empty, increments loop counter |
| src/llm/apis/openai_completions.hpp | Declares new method for serializing the first token control chunk |
| src/llm/apis/openai_completions.cpp | Implements serialization of control chunk with role field and null content for OpenAI spec compliance |
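For context, the OpenAI streaming format delivers the assistant role in the first `chat.completion.chunk` delta. A first control chunk along the lines this PR describes (role set, content null) might look like the following; the `model` and `created` values are illustrative, not taken from the PR diff:

```json
{
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "example-model",
  "choices": [
    {
      "index": 0,
      "delta": { "role": "assistant", "content": null },
      "finish_reason": null
    }
  ]
}
```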
```cpp
    return buffer.GetString();
}

std::string OpenAIChatCompletionsHandler::serializeStreamingFirstTokenControlChunk() {
```
I don't understand the context of `FirstTokenControlChunk`. What is the "control" aspect here?
```cpp
    choice.SetObject();

    choice.AddMember("index", 0, allocator);
    if (endpoint == Endpoint::CHAT_COMPLETIONS) {
```
I think we could document this behavior, maybe in the API reference, so it's clear that we send that empty response (and only for CB pipelines, right?).
```cpp
    std::shared_ptr<ov::genai::TextStreamer> textStreamer;
    bool sendLoopbackSignal = false;
    std::string lastStreamerCallbackOutput;
    size_t loopIteration = 0;
```
This name does not explain the purpose to me. Also, couldn't this be a bool like `decodingPhase`? Or even an enum like `RequestProcessingPhase::PREFILL` / `RequestProcessingPhase::DECODE`, starting with prefill and switching to decode after the first read finishes.
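The reviewer's suggestion could be sketched as below. All names here (`RequestProcessingPhase`, `finishFirstRead`, the context struct) are hypothetical illustrations of the proposal, not code from the PR:

```cpp
#include <cassert>

// Sketch of replacing the loopIteration counter with an explicit phase:
// the request starts in PREFILL and flips to DECODE after the first read.
enum class RequestProcessingPhase { PREFILL, DECODE };

struct GenAiServableExecutionContextSketch {
    RequestProcessingPhase phase = RequestProcessingPhase::PREFILL;

    // Returns true exactly once, on the first read; a caller could use this
    // to decide whether the empty first-token control chunk should be sent.
    bool finishFirstRead() {
        if (phase == RequestProcessingPhase::PREFILL) {
            phase = RequestProcessingPhase::DECODE;
            return true;
        }
        return false;
    }
};
```

Compared with a raw `size_t` counter, the enum makes the two states self-describing and rules out meaningless values such as `loopIteration == 7` when only "first iteration or not" matters.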
```cpp
// Reusable helper: asserts that a streaming chat completion chunk is the
// initial empty message with role:assistant and content:null.
inline void assertInitialStreamChatCompletionChunk(const std::string& response, const std::string& expectedModel) {
```
How about a test for the completions endpoint?
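A companion helper for the completions endpoint could follow the same pattern. The sketch below uses plain substring checks rather than the repo's JSON parsing, and the expected field names (`object`, `text`) follow the general OpenAI completions streaming format; they are assumptions, not copied from this PR:

```cpp
#include <cassert>
#include <string>

// Hypothetical counterpart to assertInitialStreamChatCompletionChunk for the
// completions endpoint. Field names are assumptions based on the OpenAI
// completions streaming format, not taken from the PR diff.
inline void assertInitialStreamCompletionChunk(const std::string& response) {
    // Completions chunks carry a "text" field instead of a chat "delta".
    assert(response.find("\"object\":\"text_completion\"") != std::string::npos);
    assert(response.find("\"text\":\"\"") != std::string::npos);
    // The chat-specific role field should not appear on this endpoint.
    assert(response.find("\"role\"") == std::string::npos);
}
```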
```cpp
    assertInitialStreamChatCompletionChunk(response, params.modelName);
    return;
}
replyCounter++;
```
🛠 Summary
CVS-181341
CVS-177373