Bug Report
Package: azure-ai-agentserver-langgraph
Version: 1.0.0b8
Summary
When a LangGraph interrupt is pending (e.g. a HITL form or confirmation is awaiting a response) and the client sends a regular human message instead of the expected function_call_output response, _convert_request_input_with_history takes the checkpoint path, which skips _filter_incomplete_tool_calls. LangGraph then merges the new HumanMessage into the checkpoint state, producing a message sequence like:
[SystemMessage, HumanMessage, AIMessage(tool_calls=[...]), HumanMessage]
OpenAI rejects this with HTTP 400:
An assistant message with 'tool_calls' must be followed by tool messages responding
to each 'tool_call_id'. The following tool_call_ids did not have response messages:
call_F4JleIVATKlDQ0CGvrDOf9T5
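To make the failure concrete, here is a minimal stand-alone sketch (plain dicts instead of LangChain message classes; all names here are illustrative, not from the SDK) that detects the unanswered tool call in such a sequence:

```python
def unanswered_tool_call_ids(messages):
    """Collect tool_call ids on assistant messages that have no matching tool response."""
    issued, answered = set(), set()
    for m in messages:
        if m.get("role") == "assistant":
            issued.update(tc["id"] for tc in m.get("tool_calls", []))
        elif m.get("role") == "tool":
            answered.add(m["tool_call_id"])
    return issued - answered

history = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Open the form"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_abc", "type": "function"}]},
    {"role": "user", "content": "Actually, something else"},  # sent while the interrupt is pending
]

print(unanswered_tool_call_ids(history))  # {'call_abc'}
```

Any non-empty result means the list will be rejected by OpenAI with the 400 above.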
Root Cause
In response_api_default_converter.py, _convert_request_input_with_history applies _filter_incomplete_tool_calls only on the no-checkpoint path (when historical items are fetched from AIProjectClient):
```python
# No checkpoint path — filter IS applied ✓
messages = self._filter_incomplete_tool_calls(messages)

# Checkpoint path — filter is NOT applied ✗
has_checkpoint = prev_state is not None and prev_state.values is not None and len(prev_state.values) > 0
if has_checkpoint:
    return current_input  # ← returns bare HumanMessage, no filtering
```

The checkpoint path returns `current_input` (the new `HumanMessage`), trusting the checkpointed state to be clean. However, when an interrupt is pending, the checkpoint contains an `AIMessage` with `tool_calls` and no corresponding `ToolMessage`; merging a new `HumanMessage` onto that state creates an invalid sequence.
Steps to Reproduce
1. Build a LangGraph agent that uses `interrupt()` inside a tool (e.g. to show a form).
2. Send a message that triggers the tool → the agent returns a HITL interrupt.
3. Without responding to the form, send a new human message (e.g. the user types something new).
4. `HumanInTheLoopJsonHelper._validate_input_format` logs `"Invalid interrupt input item type: None, expected FUNCTION_CALL_OUTPUT"` and returns `None`.
5. `_convert_request_input_with_history` sees `has_checkpoint=True` and returns only the new `HumanMessage`.
6. LangGraph merges `[..., AIMessage(tool_calls=[...]), HumanMessage]` → OpenAI 400.
Evidence from Application Logs (App Insights)
```
11:16:56 FunctionCallArgumentEventGenerator did not process message: Interrupt(value={'action': 'form', ...})
11:17:05 Checkpoint found for conversation conv_8bdcdbb1a8e1dfe8009..., using existing state
11:17:05 Invalid interrupt input item type: None, expected FUNCTION_CALL_OUTPUT.
11:17:05 Retrieved interrupt from state, validating and converting human feedback.
→ graph invoked with HumanMessage merged into interrupted checkpoint
→ OpenAI 400: tool_call_id call_F4JleIVATKlDQ0CGvrDOf9T5 has no response
```
Expected Behaviour
When a non-HITL message is received while an interrupt is pending, _convert_request_input_with_history should either:
- **Option A (minimal fix):** apply `_filter_incomplete_tool_calls` to the merged message list before returning, even on the checkpoint path.
- **Option B (explicit handling):** when `has_interrupt(state)=True` but `validate_and_convert_human_feedback` returns `None`, return an error to the client (e.g. a 400 or a synthetic agent message) rather than proceeding with the corrupt message list.
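A rough sketch of the Option B shape, for illustration only. The helper names `has_interrupt` and `validate_and_convert_human_feedback` are taken from the behaviour described above, but the scaffolding here is hypothetical (they are passed in as callables so the sketch is self-contained):

```python
def convert_input_with_history(current_input, state,
                               has_interrupt, validate_and_convert_human_feedback):
    """Hypothetical guard: reject non-HITL input while an interrupt is pending."""
    if has_interrupt(state):
        feedback = validate_and_convert_human_feedback(current_input)
        if feedback is None:
            # Surface the problem instead of merging a bare HumanMessage
            # into an interrupted checkpoint.
            raise ValueError(
                "An interrupt is pending; expected a function_call_output "
                "response, not a plain human message."
            )
        return feedback
    return current_input

# Toy usage with stub helpers:
state = {"interrupt": True}
try:
    convert_input_with_history(
        {"role": "user", "content": "hi"},
        state,
        has_interrupt=lambda s: s.get("interrupt", False),
        validate_and_convert_human_feedback=lambda x: None,  # validation fails
    )
except ValueError as e:
    print("rejected:", e)
```

Either option would keep an invalid `tool_calls` sequence from ever reaching OpenAI.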
Workaround
We are currently working around this in our own llm_call node by filtering before every LLM invocation:
```python
messages = _filter_incomplete_tool_call_sequences(messages)
response = llm.invoke(messages)
```

where `_filter_incomplete_tool_call_sequences` mirrors the logic of the SDK's `_filter_incomplete_tool_calls`.
Environment
- azure-ai-agentserver-langgraph==1.0.0b8
- azure-ai-agentserver-core==1.0.0b8
- langgraph==0.3.18
- langchain-openai==0.3.9
- Azure AI Foundry hosted agent (Linux, multi-instance, `MemorySaver` checkpointer)