Fix tool call ordering in transcript and add timestamps (#54) by richardkiene · Pull Request #55 · Liquescent-Development/mcprobe

richardkiene · 2026-01-23T21:54:59Z

Problem

The judge LLM was incorrectly detecting data fabrication because the conversation transcript showed the agent's text response BEFORE the tool call results.

What the judge saw (incorrect ordering):

[ASSISTANT]: The revenue is $1.5M based on the data...
  -> Tool call: get_revenue()
     Result: {"revenue": 1500000}

The judge thought: "The agent stated $1.5M before the tool returned that value - fabrication!"

Solution

Add timestamps to ToolCall model (called_at, responded_at)
Update ADK agent to capture precise timing of tool calls
Reorder transcript to show tool calls before response text
Add timestamps to all transcript entries

New format (correct ordering with timestamps):

[ASSISTANT] @ 10:30:45.123:
  -> Tool called @ 10:30:45.100: get_revenue({})
     Result @ 10:30:45.200: {"revenue": 1500000}
  Response: The revenue is $1.5M based on the data...

Now it's unambiguous that:

Tool was called at 10:30:45.100
Tool responded at 10:30:45.200
Agent synthesized response using those results

Backwards Compatibility

New called_at and responded_at fields are optional (float | None = None)
Existing stored results load successfully (Pydantic uses None defaults)
Timestamp formatter handles None by displaying "N/A"

Test plan

All 255 unit tests pass
Ruff linting passes
Mypy type checking passes
Backwards compatibility verified - won't break report generation

Closes #54

The judge was incorrectly detecting data fabrication because the transcript showed agent response text before tool call results. Changes: - Add called_at and responded_at timestamps to ToolCall model - Update ADK agent to capture tool call timestamps - Reorder transcript to show tool calls before response text - Add timestamps to all transcript entries for clarity New transcript format: [ASSISTANT] @ 10:30:45.123: -> Tool called @ 10:30:45.100: get_revenue({}) Result @ 10:30:45.200: {"revenue": 1500000} Response: The revenue is $1.5M based on the data... This makes it unambiguous that the agent received tool results before generating its response.

richardkiene merged commit fb5ba36 into main Jan 23, 2026
3 checks passed

richardkiene deleted the fix/tool-call-ordering-54 branch January 23, 2026 22:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix tool call ordering in transcript and add timestamps (#54)#55

Fix tool call ordering in transcript and add timestamps (#54)#55
richardkiene merged 1 commit into
mainfrom
fix/tool-call-ordering-54

richardkiene commented Jan 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

richardkiene commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Backwards Compatibility

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

richardkiene commented Jan 23, 2026 •

edited

Loading