Improve HTML report rendering, tool call display, and fix empty schemas (#48, #49, #50) #51

Merged: richardkiene merged 2 commits into main from feature/report-improvements-48-49 on Jan 23, 2026

richardkiene commented on Jan 23, 2026

Summary

  • Render Judge LLM reasoning and Synthetic User messages as Markdown using the existing marked.js library
  • Categorize tool calls into Required, Optional, and Unexpected sections with clear visual indicators
  • Fix MCP server run_scenario returning empty tool schemas

Changes

Markdown Rendering (#48)

  • Changed Judge LLM reasoning from plain text <p> tags to <div class="markdown-content"> for proper markdown rendering
  • Changed Synthetic User (role="user") turns to use markdown rendering, the same way agent-under-test output is already rendered
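
For reviewers, a minimal sketch of the wrapper change described above. The function name `render_markdown_block` is hypothetical, and escaping the raw text before handing it to marked.js in the browser is an assumption about how the report template works, not something confirmed by this PR:

```python
import html


def render_markdown_block(text: str) -> str:
    """Wrap raw markdown in the container that client-side rendering targets.

    Assumption: the report page runs marked.js over every element with the
    "markdown-content" class, so the server only escapes the source text.
    """
    # Escape so that stray "<" or "&" in the message cannot break the report.
    return f'<div class="markdown-content">{html.escape(text)}</div>'


if __name__ == "__main__":
    print(render_markdown_block("**Verdict:** pass, because x < y"))
```

The only part taken from the PR is the switch from `<p>` to `<div class="markdown-content">`; everything else is illustrative.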

Tool Call Categorization (#49)

  • Rewrote _build_tool_calls_html to categorize tools by:
    • Required - tools that must be called (shows pass/fail status and missing tools)
    • Optional - tools from the scenario's optional_tools list
    • Unexpected - any other tools (highlighted with yellow background)
  • Added optional_tools to the judgment result's tool_usage_dict so the report has access to it
  • Added CSS styling for tool categories with visual differentiation
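
The categorization rule above can be sketched roughly as follows. This is not the actual `_build_tool_calls_html` code; `ToolCategories` and `categorize_tool_calls` are hypothetical names, and the inputs are assumed to be plain lists of tool names:

```python
from dataclasses import dataclass, field


@dataclass
class ToolCategories:
    """Hypothetical container for the three report sections."""

    required_called: list[str] = field(default_factory=list)
    required_missing: list[str] = field(default_factory=list)
    optional: list[str] = field(default_factory=list)
    unexpected: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The Required section passes only when no required tool is missing.
        return not self.required_missing


def categorize_tool_calls(
    called: list[str], required: list[str], optional: list[str]
) -> ToolCategories:
    """Sort each distinct called tool into Required, Optional, or Unexpected."""
    result = ToolCategories()
    seen: set[str] = set()
    for name in called:
        if name in seen:
            continue  # report each tool once, in first-call order
        seen.add(name)
        if name in required:
            result.required_called.append(name)
        elif name in optional:
            result.optional.append(name)
        else:
            result.unexpected.append(name)
    # Required tools that never appeared in the call list are reported as missing.
    result.required_missing = [name for name in required if name not in seen]
    return result


if __name__ == "__main__":
    cats = categorize_tool_calls(
        called=["search", "log", "delete", "search"],
        required=["search", "fetch"],
        optional=["log"],
    )
    print(cats, cats.passed)
```

Checking `required` before `optional` means a tool listed in both is treated as required; that ordering is a choice made for this sketch.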

MCP Server Tool Schema Fix (#50)

  • Added _extract_tool_schemas helper function that:
    1. First tries to extract schemas from mcp_server config (most reliable)
    2. Falls back to agent.get_available_tools() for ADK agents without mcp_server config
  • This fixes reports generated via MCP server showing empty {} schemas

Test plan

  • All 255 unit tests pass
  • Ruff linting passes
  • Mypy type checking passes
  • Manual verification with ADK agent - tool schemas now display correctly

Closes #48
Closes #49
Closes #50

- Render Judge LLM reasoning and Synthetic User messages as Markdown
  using the existing marked.js library (same as agent under test output)
- Categorize tool calls into Required, Optional, and Unexpected sections
- Show missing required tools with clear visual indicators
- Add optional_tools to judgment results for proper categorization
- Style improvements for tool category sections with pass/fail states

Extract MCP tool schemas from the server config instead of only from
the agent. SimpleLLMAgent.get_available_tools() returns an empty list,
which caused reports generated via MCP server to show empty schemas.

- Add _extract_tool_schemas helper that tries mcp_server config first
- Fall back to agent.get_available_tools() for ADK agents without
  mcp_server config
richardkiene merged commit d0aa1a8 into main on Jan 23, 2026
3 checks passed
richardkiene deleted the feature/report-improvements-48-49 branch on January 23, 2026 01:22