AgentDojo: Snowl Monitor Trial detail trace rendering #10

@CosmosYi

Summary

When running real AgentDojo banking cases through Snowl with a ReAct-style agent and an OpenAI-compatible model provider, the run can complete successfully, but the web monitor trial detail panel may show incomplete agent steps.

In particular, some steps render as:

Thought: ...
Action: -
Observation: -

even though the underlying runtime events contain useful model output, tool calls, tool results, or JSON fallback actions.

Problem

The trial detail UI appears to miss several valid trace shapes:

  • Important fields may be nested under payload.step, payload.model_input, payload.model_output, payload.mode, or payload.direction.
  • Some providers expose reasoning text as reasoning_content instead of normal message content.
  • JSON fallback outputs such as tool calls or final answers are not consistently parsed into action/observation rows.
  • Non-JSON fallback violations are not clearly shown as format retry rows, so they look like empty steps.
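
As a rough illustration of the first two points, here is a minimal sketch of a nested-field accessor and a thought-text helper. The names readField, thoughtText, and NESTED_CONTAINERS are hypothetical and not part of Snowl; the choice of step, model_output, and model_input as the containers to search is an assumption based on the field names listed above.

```typescript
// Hypothetical helpers; the real Snowl event schema may differ.
type Json = Record<string, unknown>;

// Assumed containers that may wrap the fields the UI needs.
const NESTED_CONTAINERS = ["step", "model_output", "model_input"];

function isRecord(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Look for `key` on the payload itself first, then inside each known container.
function readField(payload: unknown, key: string): unknown {
  if (!isRecord(payload)) return undefined;
  if (payload[key] !== undefined) return payload[key];
  for (const container of NESTED_CONTAINERS) {
    const nested = payload[container];
    if (isRecord(nested) && nested[key] !== undefined) return nested[key];
  }
  return undefined;
}

// Prefer reasoning_content, then content. Blank strings count as missing here,
// which is slightly stricter than a plain `??` fallback.
function thoughtText(message: unknown): string | undefined {
  if (!isRecord(message)) return undefined;
  for (const key of ["reasoning_content", "content"]) {
    const value = message[key];
    if (typeof value === "string" && value.trim() !== "") return value;
  }
  return undefined;
}
```

With this shape, readField(payload, "action") finds the value whether it sits at the top level or under payload.step, and thoughtText still returns content when a provider sends an empty reasoning_content.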

Example JSON fallback model output:

{
  "action": "tool_call",
  "tool_name": "get_most_recent_transactions",
  "arguments": {"n": 50}
}

Expected UI rendering:

Action
tool_call get_most_recent_transactions({"n": 50})

Observation
tool_result: [...]

For a final answer:

{
  "action": "final",
  "answer": "Your total spending in March 2022 was $1050.00."
}

Expected UI rendering:

Action
final answer

Observation
Your total spending in March 2022 was $1050.00.
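
The two renderings above, plus the non-JSON case, could be produced by a single classifier along these lines. This is a sketch only: parseFallbackOutput, ActionRow, and retryRow are hypothetical names, and the "unrecognized JSON shape" label is my own assumption for valid JSON that is neither a tool_call nor a final.

```typescript
// Hypothetical row type for the trial detail panel.
type ActionRow =
  | { kind: "tool_call"; action: string; toolName: string; args: unknown }
  | { kind: "final"; action: string; observation: string }
  | { kind: "format_retry"; action: string; observation: string };

// Build an explicit retry row so the step never renders as an empty "-".
function retryRow(raw: string, reason: string): ActionRow {
  return { kind: "format_retry", action: `format_retry (${reason})`, observation: raw };
}

function parseFallbackOutput(raw: string): ActionRow {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return retryRow(raw, "model returned non-JSON text");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return retryRow(raw, "unrecognized JSON shape");
  }
  const obj = parsed as Record<string, unknown>;
  const action = obj["action"];
  const toolName = obj["tool_name"];
  const answer = obj["answer"];

  if (action === "tool_call" && typeof toolName === "string") {
    const args = obj["arguments"] ?? {};
    // JSON.stringify emits compact JSON, so whitespace may differ from the raw output.
    return { kind: "tool_call", action: `tool_call ${toolName}(${JSON.stringify(args)})`, toolName, args };
  }
  if (action === "final" && typeof answer === "string") {
    return { kind: "final", action: "final answer", observation: answer };
  }
  return retryRow(raw, "unrecognized JSON shape");
}
```

For a tool_call row, the observation is left to be filled in from the matching tool result event rather than from the model output itself.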

Local fix tested

Locally, I fixed the trial detail rendering by:

  • Reading nested event payload fields through helper accessors.
  • Using message.reasoning_content ?? message.content for thought text.
  • Parsing JSON fallback tool_call outputs.
  • Parsing JSON fallback final outputs.
  • Rendering non-JSON fallback output as format_retry (model returned non-JSON text).
  • Aligning observations with the corresponding action row.
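
For the last point, the alignment can be sketched roughly as follows. TraceEvent, StepRow, and alignRows are hypothetical names, and the sketch assumes events arrive in order and that an observation belongs to the most recent action that does not yet have one; the actual fix may pair them differently (for example by a call id).

```typescript
// Hypothetical, already-normalized events in arrival order.
type TraceEvent =
  | { type: "action"; text: string }
  | { type: "observation"; text: string };

interface StepRow {
  action: string;
  observation: string;
}

// Placeholder the panel shows for a missing cell.
const EMPTY = "-";

function alignRows(events: TraceEvent[]): StepRow[] {
  const rows: StepRow[] = [];
  for (const event of events) {
    const last = rows[rows.length - 1];
    if (event.type === "action") {
      // Every action starts a new row with an empty observation.
      rows.push({ action: event.text, observation: EMPTY });
    } else if (last !== undefined && last.observation === EMPTY) {
      // Attach the observation to the latest row that is still waiting for one.
      last.observation = event.text;
    } else {
      // Orphan observation: keep it visible on its own row instead of dropping it.
      rows.push({ action: EMPTY, observation: event.text });
    }
  }
  return rows;
}
```

Keeping orphan observations on their own row is a deliberate choice in this sketch, so a missing action shows up as a visible gap rather than silently losing the tool result.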

After this, the same AgentDojo banking cases showed complete, inspectable trajectories.

Suggested tests

  • Trial detail renders action/observation from nested payload.step.
  • Trial detail uses reasoning_content when available.
  • Trial detail parses JSON fallback tool_call.
  • Trial detail parses JSON fallback final.
  • Trial detail shows non-JSON fallback output as an explicit format retry event.
