Fix tool call ordering in transcript and add timestamps (#54) #55

Merged
richardkiene merged 1 commit into main from fix/tool-call-ordering-54 on Jan 23, 2026

@richardkiene richardkiene commented Jan 23, 2026

Problem

The judge LLM was incorrectly detecting data fabrication because the conversation transcript showed the agent's text response BEFORE the tool call results.

What the judge saw (incorrect ordering):

[ASSISTANT]: The revenue is $1.5M based on the data...
  -> Tool call: get_revenue()
     Result: {"revenue": 1500000}

The judge thought: "The agent stated $1.5M before the tool returned that value - fabrication!"

Solution

  1. Add timestamps to ToolCall model (called_at, responded_at)
  2. Update ADK agent to capture precise timing of tool calls
  3. Reorder transcript to show tool calls before response text
  4. Add timestamps to all transcript entries
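
As a rough illustration of step 2, timing can be captured by reading a clock immediately before and after the tool runs. This sketch is generic Python and does not use any ADK API; how the PR actually hooks into the ADK agent is not shown here. The names `call_with_timestamps`, `clock`, and the `get_revenue` stub are hypothetical.

```python
import time
from typing import Any, Callable


def call_with_timestamps(
    tool: Callable[..., Any],
    clock: Callable[[], float] = time.time,
    **kwargs: Any,
) -> dict[str, Any]:
    """Run a tool and record when it was called and when it responded."""
    called_at = clock()  # read just before invoking the tool
    result = tool(**kwargs)
    responded_at = clock()  # read as soon as the result is available
    return {
        "name": tool.__name__,
        "args": kwargs,
        "result": result,
        "called_at": called_at,
        "responded_at": responded_at,
    }


def get_revenue() -> dict[str, int]:
    # Hypothetical tool matching the example in this description.
    return {"revenue": 1500000}


record = call_with_timestamps(get_revenue)
```

Passing the clock as a parameter is only a convenience for testing; a real integration would likely rely on whatever callbacks or events the agent framework exposes.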

New format (correct ordering with timestamps):

[ASSISTANT] @ 10:30:45.123:
  -> Tool called @ 10:30:45.100: get_revenue({})
     Result @ 10:30:45.200: {"revenue": 1500000}
  Response: The revenue is $1.5M based on the data...

Now it's unambiguous that:

  • Tool was called at 10:30:45.100
  • Tool responded at 10:30:45.200
  • Agent synthesized response using those results
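
The new format could be produced by a renderer along these lines. This is a minimal sketch, not the code from this PR: the dataclass is a simplified stand-in for the project's `ToolCall` model, `fmt_ts` and `render_assistant_turn` are hypothetical names, timestamps are assumed to be epoch seconds rendered in UTC, and for brevity both timestamps are assumed to be present.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    # Simplified stand-in for the project's ToolCall model (assumed fields).
    name: str
    args: dict[str, Any]
    result: Any
    called_at: float
    responded_at: float


def fmt_ts(ts: float) -> str:
    """Format epoch seconds as HH:MM:SS.mmm in UTC."""
    secs, ms = divmod(round(ts * 1000), 1000)
    return time.strftime("%H:%M:%S", time.gmtime(secs)) + f".{ms:03d}"


def render_assistant_turn(text: str, tool_calls: list[ToolCall], at: float) -> str:
    """Render one assistant turn with tool calls listed before the response."""
    lines = [f"[ASSISTANT] @ {fmt_ts(at)}:"]
    for call in tool_calls:
        args = json.dumps(call.args)
        lines.append(f"  -> Tool called @ {fmt_ts(call.called_at)}: {call.name}({args})")
        lines.append(f"     Result @ {fmt_ts(call.responded_at)}: {json.dumps(call.result)}")
    # The response text comes last, so a reader sees the results it relies on first.
    lines.append(f"  Response: {text}")
    return "\n".join(lines)


turn = render_assistant_turn(
    "The revenue is $1.5M based on the data...",
    [ToolCall("get_revenue", {}, {"revenue": 1500000}, 37845.100, 37845.200)],
    at=37845.123,
)
print(turn)
```

The only ordering decision is that the `Response:` line is appended after the loop over tool calls, which mirrors the order in which the agent actually received the data.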

Backwards Compatibility

  • New called_at and responded_at fields are optional (float | None = None)
  • Existing stored results load successfully (Pydantic uses None defaults)
  • Timestamp formatter handles None by displaying "N/A"
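
The compatibility behavior can be illustrated with a small sketch. It uses a standard-library dataclass in place of the Pydantic model so that it runs without dependencies; the stored record layout, the `format_timestamp` name, and the UTC rendering are assumptions rather than the project's actual code.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class ToolCall:
    # Dataclass stand-in for the Pydantic model described above.
    name: str
    result: str
    called_at: float | None = None  # new optional field
    responded_at: float | None = None  # new optional field


def format_timestamp(ts: float | None) -> str:
    """Return HH:MM:SS.mmm (UTC) for epoch seconds, or "N/A" when missing."""
    if ts is None:
        return "N/A"
    secs, ms = divmod(round(ts * 1000), 1000)
    return time.strftime("%H:%M:%S", time.gmtime(secs)) + f".{ms:03d}"


# A record stored before this change has no timestamp keys (hypothetical shape).
legacy_record = {"name": "get_revenue", "result": '{"revenue": 1500000}'}
legacy_call = ToolCall(**legacy_record)

print(format_timestamp(legacy_call.called_at))
print(format_timestamp(37845.1))
```

Because the new fields default to `None`, an older record constructs without error, and the formatter falls back to "N/A" instead of raising.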

Test plan

  • All 255 unit tests pass
  • Ruff linting passes
  • Mypy type checking passes
  • Backwards compatibility verified - won't break report generation

Closes #54

The judge was incorrectly detecting data fabrication because the
transcript showed agent response text before tool call results.

Changes:
- Add called_at and responded_at timestamps to ToolCall model
- Update ADK agent to capture tool call timestamps
- Reorder transcript to show tool calls before response text
- Add timestamps to all transcript entries for clarity

New transcript format:
  [ASSISTANT] @ 10:30:45.123:
    -> Tool called @ 10:30:45.100: get_revenue({})
       Result @ 10:30:45.200: {"revenue": 1500000}
    Response: The revenue is $1.5M based on the data...

This makes it unambiguous that the agent received tool results
before generating its response.
@richardkiene richardkiene merged commit fb5ba36 into main Jan 23, 2026
3 checks passed
@richardkiene richardkiene deleted the fix/tool-call-ordering-54 branch January 23, 2026 22:14

Development

Successfully merging this pull request may close these issues.

Judge sees tool results after agent response text, causing false fabrication detection