
update: ai-run-summary #123

Merged
ppetrovicTT merged 12 commits into tenstorrent:main from ipastalTT:ipastalTT/update-ai-run-summary
Jun 17, 2026


@ipastalTT ipastalTT commented May 6, 2026


Scope

update ai-run-summary module

What Changed

  • Included optional commit SHAs for tt-metal, tt-inference-server and vLLM.
  • Added a successful jobs tab.

Testing

Ran the tt-shield nightly workflow from this branch with a specified run ID. This skips building images, downloads the artifacts from that run, re-uploads them, and runs ai-run-summary.

New Look

AI Run Summary

Run: 26156616304 · Date: 2026-05-20

TT-Metal: c5beb19 · tt-inference-server: 18ec154 · vLLM: bfce692

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across nearly all hardware configurations.

Job Status Overview

| Status | Count | Distribution |
|--------|-------|--------------|
| 🟣 INFRA_FAILURE | 8 | ██░░░░░░░░░░░░░░░░░░ 10% |
| 🔴 CRASHED | 34 | █████████░░░░░░░░░░░ 44% |
| 🔴 TIMEOUT | 1 | ░░░░░░░░░░░░░░░░░░░░ 1% |
| 🟠 TESTS_FAILED | 11 | ███░░░░░░░░░░░░░░░░░ 14% |
| 🟡 EVALS_BELOW_TARGET | 3 | █░░░░░░░░░░░░░░░░░░░ 4% |
| 🟢 SUCCESS | 20 | █████░░░░░░░░░░░░░░░ 26% |

Failure Category Distribution

| Category | Jobs | Distribution | Subcategories |
|----------|------|--------------|---------------|
| tt-metal | 21 | ████░░░░░░░░ 37% | fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2 |
| vllm | 9 | ██░░░░░░░░░░ 16% | engine 7 · config 2 |
| infra | 8 | ██░░░░░░░░░░ 14% | no_logs 8 |
| app | 8 | ██░░░░░░░░░░ 14% | api 6 · server 2 |
| model | 6 | █░░░░░░░░░░░ 11% | accuracy 5 · load 1 |
| hw | 3 | █░░░░░░░░░░░ 5% | fabric 3 |
| runtime | 2 | ░░░░░░░░░░░░ 4% | exception 2 |

Dominant Failure Pattern

TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing _processor_factory attribute, empty model name assertions) are causing cascading server unavailability and downstream eval/benchmark timeouts.

Failed Job Details (57)

| Job | Run | Status | Category | Root Cause |
|-----|-----|--------|----------|------------|
| 76637859441 | Wan2.2-I2V-A14B-Diffusers (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859444 | Motif-Image-6B-Preview (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859522 | Wan2.2-T2V-A14B-Diffusers (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859526 | bge-large-en-v1.5 (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859552 | bge-m3 (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859558 | whisper-large-v3 (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859586 | stable-diffusion-xl-base-1.0 (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859591 | stable-diffusion-3.5-large (galaxy) | 🟣 INFRA_FAILURE | infra:no_logs | Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work… |
| 76637859567 | mochi-1-preview (t3k) | 🔴 TIMEOUT | app:api | The /tt-liveness endpoint on the inference server is returning HTTP 405 (Method Not Allowe… |
| 76637859587 | whisper-large-v3 (galaxy) | 🔴 CRASHED | app:server | The inference server's /tt-liveness endpoint is returning HTTP 405 Method Not Allowed in r… |
| 76636001501 | Llama-3.1-8B-Instruct (t3k) | 🔴 CRASHED | hw:fabric | During benchmark run 8/17 (isl-2048, osl-128, max-concurrency=32, 128 prompts), the Engine… |
| 76636001557 | Llama-3.2-3B-Instruct (t3k) | 🔴 CRASHED | hw:fabric | During benchmark run 12/17 (ISL=8192, max-concurrency=15), the EngineCore_DP0 process cras… |
| 76636001596 | Qwen3-32B (t3k) | 🔴 CRASHED | hw:fabric | The EngineCore (EngineCore_DP0) crashed with a fatal RuntimeError 'Timeout waiting for Eth… |
| 76637859432 | Motif-Image-6B-Preview (p300x2) | 🔴 CRASHED | model:load | The Motif pipeline creation failed with a KeyError because the mesh device shape (2, 2) is… |
| 76636001502 | Llama-3.1-8B-Instruct (galaxy) | 🔴 CRASHED | runtime:exception | EngineCore_DP0 crashed with an IndexError in `sampling_module.seed_manager.apply_slot_rema… |
| 76637859576 | stable-diffusion-xl-base-1.0 (galaxy) | 🔴 CRASHED | runtime:exception | The SDXL model warmup failed with AssertionError because TT_METAL_CORE_GRID_OVERRIDE_TODEP… |
| 76637859450 | Wan2.2-I2V-A14B-Diffusers (galaxy) | 🔴 CRASHED | tt-metal:device | The inference server crashed during device initialization because the configured MeshShape… |
| 76637859521 | Wan2.2-T2V-A14B-Diffusers (galaxy) | 🔴 CRASHED | tt-metal:device | The inference server crashed during MeshDevice initialization because the configured mesh … |
| 76636001472 | Llama-3.1-8B-Instruct (p100) | 🔴 CRASHED | tt-metal:dispatch | The TT-Metal device timed out during dispatch command queue initialization while the vLLM … |
| 76636001575 | gpt-oss-20b (t3k) | 🔴 CRASHED | tt-metal:dispatch | The TT-Metal dispatch layer timed out during model warmup/prefill in the EngineCore_DP0 pr… |
| 76636001618 | gpt-oss-120b (t3k) | 🔴 CRASHED | tt-metal:dispatch | During vLLM EngineCore (EngineCore_DP0) warmup/prefill of gpt-oss-120b on T3K, a dispatch … |
| 76636001453 | DeepSeek-R1-0528 (quad_galaxy) | 🔴 CRASHED | tt-metal:fabric | Multi-host fabric topology mapping failed during MeshDevice initialization: ranks 1, 2, an… |
| 76636001485 | Llama-3.1-8B-Instruct (p150x4) | 🔴 CRASHED | tt-metal:fabric | The TT-Metal fabric topology mapper failed to map the logical mesh graph to the discovered… |
| 76636001572 | Llama-3.3-70B-Instruct (p150x4) | 🔴 CRASHED | tt-metal:fabric | The vLLM EngineCore (EngineCore_DP0) crashed during MeshDevice initialization because the … |
| 76636001587 | Llama-3.3-70B-Instruct (galaxy) | 🔴 CRASHED | tt-metal:fabric | The TT-Metal fabric topology mapper failed to map the logical mesh graph to the physical t… |
| 76636001630 | Qwen3-32B (galaxy) | 🔴 CRASHED | tt-metal:fabric | The TT-Metal fabric topology mapper failed to map the logical mesh graph to the physical t… |
| 76637859448 | FLUX.1-dev (p150x4) | 🔴 CRASHED | tt-metal:fabric | TT_FATAL in fabric topology mapper: the logical mesh graph (MGD) with 1 node could not be … |
| 76637859453 | FLUX.1-schnell (p150x4) | 🔴 CRASHED | tt-metal:fabric | TT_FATAL in fabric topology mapper: the logical mesh graph (MGD) could not be mapped to th… |
| 76637859559 | mochi-1-preview (galaxy) | 🔴 CRASHED | tt-metal:fabric | TT_FATAL in tt-metal fabric topology mapper: the logical mesh graph (MGD) could not be map… |
| 76637859561 | mochi-1-preview (p150x4) | 🔴 CRASHED | tt-metal:fabric | TT_FATAL in tt-metal fabric topology mapper: the logical mesh graph (MGD) could not be map… |
| 76636001522 | Qwen2.5-72B-Instruct (galaxy) | 🔴 CRASHED | tt-metal:memory | During vLLM engine core initialization (EngineCore_DP0), a TT_FATAL out-of-memory error oc… |
| 76636001537 | Qwen2.5-72B-Instruct (t3k) | 🔴 CRASHED | tt-metal:memory | During vLLM EngineCore startup, a DRAM buffer allocation of 134217728 B (128 MB) failed in… |
| 76636001589 | Llama-3.2-3B-Instruct (n150) | 🔴 CRASHED | tt-metal:memory | During a prefill operation with 32 concurrent requests, the TT-Metal DRAM allocator ran ou… |
| 76636001552 | Llama-3.3-70B-Instruct (t3k) | 🔴 CRASHED | tt-metal:trace | The trace region allocated for MeshDevice 0 is only 30000000B (30MB) but the prefill trace… |
| 76636001554 | Llama-3.2-3B-Instruct (n300) | 🔴 CRASHED | tt-metal:trace | The trace buffer required for prefill capture (53100544B ≈ 50.6MB) exceeds the allocated t… |
| 76636001559 | QwQ-32B (galaxy) | 🔴 CRASHED | tt-metal:trace | The vLLM engine (EngineCore_DP0) crashed during model warmup because the trace buffer requ… |
| 76636001594 | Qwen3-8B (t3k) | 🔴 CRASHED | tt-metal:trace | The trace buffer required for prefill capture (51101696B) exceeds the allocated trace regi… |
| 76636001538 | Mistral-Small-3.1-24B-Instruct-2503 (t3k) | 🔴 CRASHED | vllm:config | The vLLM server failed to start because 'mistralai/Mistral-Small-3.1-24B-Instruct-2503' ha… |
| 76636001459 | Llama-3.2-11B-Vision-Instruct (t3k) | 🔴 CRASHED | vllm:engine | The vLLM server crashed during startup because the Llama-3.2-11B-Vision-Instruct model cla… |
| 76636001483 | Llama-3.2-11B-Vision-Instruct (n300) | 🔴 CRASHED | vllm:engine | The vLLM server failed to start because the Llama-3.2-11B-Vision-Instruct model class is m… |
| 76636001525 | Mistral-7B-Instruct-v0.3 (n150) | 🔴 CRASHED | vllm:engine | The vLLM EngineCore failed to start because the model name specified in the vLLM configura… |
| 76636001541 | Mistral-7B-Instruct-v0.3 (n300) | 🔴 CRASHED | vllm:engine | The vLLM EngineCore failed to start because an assertion in generator_vllm.py detected a m… |
| 76636001566 | Llama-3.2-90B-Vision-Instruct (t3k) | 🔴 CRASHED | vllm:engine | The vLLM server crashed during startup because the Llama-3.2-90B-Vision-Instruct model cla… |
| 76637859436 | Motif-Image-6B-Preview (t3k) | 🟠 TESTS_FAILED | app:api | The /tt-liveness endpoint on the inference server is returning HTTP 405 Method Not Allowed… |
| 76637859451 | Motif-Image-6B-Preview (p150x8) | 🟠 TESTS_FAILED | app:api | The /tt-liveness endpoint on the inference server is returning HTTP 405 Method Not Allowed… |
| 76637859463 | FLUX.1-schnell (t3k) | 🟠 TESTS_FAILED | app:api | The inference server is running and reachable on port 8000, but the /tt-liveness endpoin… |
| 76637859532 | Wan2.2-T2V-A14B-Diffusers (p150x8) | 🟠 TESTS_FAILED | app:api | The inference server's /tt-liveness endpoint is returning HTTP 405 (Method Not Allowed) … |
| 76637859581 | stable-diffusion-3.5-large (t3k) | 🟠 TESTS_FAILED | app:api | The inference server's /tt-liveness endpoint is returning HTTP 405 (Method Not Allowed) fo… |
| 76637859506 | bge-m3 (t3k) | 🟠 TESTS_FAILED | app:server | The BGEM3Runner class is missing the _run_async method, causing all embedding requests… |
| 76636001492 | Llama-3.1-8B-Instruct (p300x2) | 🟠 TESTS_FAILED | model:accuracy | The vLLM server running Llama-3.1-8B-Instruct on p300x2 is not honoring the seed parameter… |
| 76636001619 | Qwen3-32B (p300x2) | 🟠 TESTS_FAILED | model:accuracy | The model Qwen3-32B produces non-deterministic outputs when top_k=1 is set, with two ident… |
| 76636001636 | Llama-3.3-70B-Instruct (p150x8) | 🟠 TESTS_FAILED | model:accuracy | The presence_penalty parameter failed to increase token diversity in the 'repeat_trap' tes… |
| 76636001672 | Llama-3.3-70B-Instruct (p300x2) | 🟠 TESTS_FAILED | model:accuracy | The presence_penalty parameter (value 1.2) had no measurable effect on output length for t… |
| 76636001623 | Qwen2.5-VL-7B-Instruct (n300) | 🟠 TESTS_FAILED | vllm:config | The vLLM server for Qwen2.5-VL-7B-Instruct is configured to accept at most 1 image per pro… |
| 76636001547 | Llama-3.2-1B-Instruct (t3k) | 🟡 EVALS_BELOW_TARGET | model:accuracy | Llama-3.2-1B-Instruct on t3k failed acceptance criteria due to eval accuracy scores signif… |
| 76636001634 | Qwen2.5-VL-3B-Instruct (n300) | 🟡 EVALS_BELOW_TARGET | vllm:engine | The vLLM server enforces a limit of at most 1 image per prompt for Qwen2.5-VL-3B-Instruct,… |
| 76636001638 | Qwen2.5-VL-3B-Instruct (n150) | 🟡 EVALS_BELOW_TARGET | vllm:engine | The vLLM server enforces a limit of at most 1 image per prompt for Qwen2.5-VL-3B-Instruct,… |

Successful Models (20)

| Job | Run | Status |
|-----|-----|--------|
| 76637859434 | FLUX.1-dev (p150x8) | 🟢 SUCCESS |
| 76637859437 | FLUX.1-dev (p300x2) | 🟢 SUCCESS |
| 76637859455 | FLUX.1-dev (t3k) | 🟢 SUCCESS |
| 76637859461 | FLUX.1-schnell (p150x8) | 🟢 SUCCESS |
| 76637859433 | FLUX.1-schnell (p300x2) | 🟢 SUCCESS |
| 76637859512 | mochi-1-preview (galaxy) | 🟢 SUCCESS |
| 76637859569 | mochi-1-preview (p150x8) | 🟢 SUCCESS |
| 76637859583 | mochi-1-preview (p300x2) | 🟢 SUCCESS |
| 76636001579 | Qwen2.5-VL-32B-Instruct (t3k) | 🟢 SUCCESS |
| 76636001610 | Qwen2.5-VL-72B-Instruct (t3k) | 🟢 SUCCESS |
| 76636001613 | Qwen3-VL-32B-Instruct (t3k) | 🟢 SUCCESS |
| 76637859566 | stable-diffusion-xl-base-1.0 (p150x8) | 🟢 SUCCESS |
| 76637859575 | stable-diffusion-xl-base-1.0 (p300x2) | 🟢 SUCCESS |
| 76637859509 | Wan2.2-I2V-A14B-Diffusers (p150x4) | 🟢 SUCCESS |
| 76637859503 | Wan2.2-I2V-A14B-Diffusers (p150x8) | 🟢 SUCCESS |
| 76637859596 | Wan2.2-I2V-A14B-Diffusers (p300x2) | 🟢 SUCCESS |
| 76637859467 | Wan2.2-I2V-A14B-Diffusers (t3k) | 🟢 SUCCESS |
| 76637859553 | Wan2.2-T2V-A14B-Diffusers (p150x4) | 🟢 SUCCESS |
| 76637859536 | Wan2.2-T2V-A14B-Diffusers (p300x2) | 🟢 SUCCESS |
| 76637859488 | Wan2.2-T2V-A14B-Diffusers (t3k) | 🟢 SUCCESS |

Run Summary Stats

| Metric | Value |
|--------|-------|
| Total jobs | 77 |
| Failed | 57 |
| Passed | 20 |
| LLM model | anthropic/claude-sonnet-4-6 |
| Tokens | 3676 + 198 |
| LLM time | 6807ms |
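
The distribution bars in the sample report above can be reproduced with a few lines of Python. This is a minimal sketch, not the tool's implementation: the function name `distribution_bar` is hypothetical, and nearest-cell rounding is an assumption that happens to match every row shown in the report (20 cells for job statuses, 12 for failure categories).

```python
def distribution_bar(count: int, total: int, width: int = 20) -> str:
    """Return a fixed-width bar followed by a whole-number percentage."""
    fraction = count / total if total else 0.0
    filled = round(fraction * width)  # assumption: round to the nearest cell
    return f"{'█' * filled}{'░' * (width - filled)} {round(fraction * 100)}%"


# Counts copied from the Job Status Overview in the sample report.
status_counts = {
    "INFRA_FAILURE": 8,
    "CRASHED": 34,
    "TIMEOUT": 1,
    "TESTS_FAILED": 11,
    "EVALS_BELOW_TARGET": 3,
    "SUCCESS": 20,
}
total_jobs = sum(status_counts.values())

for status, count in status_counts.items():
    print(f"{status:<20} {count:>3} {distribution_bar(count, total_jobs)}")
```

Note that a status with a single job out of 77 rounds down to an empty bar, which is why the TIMEOUT row in the report shows no filled cells.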

Copilot AI review requested due to automatic review settings May 6, 2026 08:45
@ipastalTT ipastalTT requested a review from a team as a code owner May 6, 2026 08:45

Copilot AI left a comment


Pull request overview

Updates the ai-run-summary tool to enrich run reports with component commit SHAs and per-model expandable details by ingesting markdown sidecar artifacts.

Changes:

  • Add support for loading a *.md “sidecar” summary alongside each ai_job_summary_*.json artifact and rendering it in the run report.
  • Add commit SHA reporting for TT-Metal / tt-inference-server / vLLM (explicit CLI/action inputs with optional auto-fetch via gh).
  • Replace the previous failed-job details table with an alphabetical, expandable “Model Details” section for all jobs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_parse.py | Adds coverage for loading/omitting the markdown sidecar summary. |
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_format.py | Updates formatting expectations for commit SHAs + new “Model Details” output. |
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_commits.py | Adds tests for resolving commit SHAs from gh job logs. |
| .github/actions/ai_summary/tool/ai_run_summary/parse.py | Loads *.md sidecar content into parsed summaries. |
| .github/actions/ai_summary/tool/ai_run_summary/models.py | Extends ParsedJobSummary with a markdown field. |
| .github/actions/ai_summary/tool/ai_run_summary/format.py | Adds commit SHA header rendering and new expandable per-model details blocks. |
| .github/actions/ai_summary/tool/ai_run_summary/commits.py | Implements gh-based resolve-shas job discovery + SHA extraction. |
| .github/actions/ai_summary/tool/ai_run_summary/cli.py | Wires commit SHA inputs/auto-fetch and passes all summaries into the formatter. |
| .github/actions/ai_summary/run/action.yml | Adds action inputs/env wiring for explicit component commit SHAs. |


md_path = file_path.with_suffix(".md")
try:
    markdown = md_path.read_text() if md_path.exists() else ""
except OSError:
Comment on lines +178 to +196
emoji = STATUS_EMOJI.get(job.status, "")
model = _extract_run_label(job) or "\u2014"
url = _job_url(job, run_url)
label = job.job_id or job.source_file.stem.removeprefix("ai_job_summary_")
job_link = f'<a href="{url}">{label}</a>' if url else label

parts = [f"<strong>{model}</strong>", f"{emoji} {job.status}", job_link]

if not compact:
    category = job.category or "UNKNOWN"
    parts.append(f"<code>{category}</code>")

if show_root_cause:
    root_cause = job.root_cause or ""
    if len(root_cause) > _ROOT_CAUSE_COL_MAX:
        root_cause = root_cause[:_ROOT_CAUSE_COL_MAX] + "\u2026"
    parts.append(root_cause)

summary_line = " &nbsp;|&nbsp; ".join(parts)
import markdown

from .models import ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI
from .models import NON_FAILURE_STATUSES, ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI
vllm_commit: Optional vLLM commit SHA.
inference_server_commit: Optional tt-inference-server commit SHA.
all_summaries: All parsed job summaries (success + failure). When provided,
a Model Details section is appended after the failed job details.
Comment on lines +63 to +64
if not count_output.strip() or int(count_output.strip() or "0") < 100:
    break
Comment on lines +86 to +96
# Extract "Full sha: <40-char-hex>" lines in order of appearance.
# The resolve-shas workflow resolves tt-metal, inference-server, vllm, in that order.
shas: list[str] = []
for line in logs.splitlines():
    if "Full sha:" in line:
        parts = line.split("Full sha:")
        if len(parts) == 2:
            sha = parts[1].strip()
            if sha and all(c in "0123456789abcdefABCDEF" for c in sha):
                shas.append(sha)

Comment on lines +325 to +327
print(f"Fetching commit SHAs for run {run_id_for_commits}...", file=sys.stderr)
repo = os.environ.get("GITHUB_REPOSITORY", "tenstorrent/tt-shield")
commits = fetch_run_commits(int(run_id_for_commits), repo=repo)
Comment on lines 140 to 149
env:
  TT_CHAT_API_KEY: ${{ inputs.api-key }}
  TT_CHAT_URL: ${{ inputs.api-url }}
  CONFIG: ${{ inputs.config }}
  EXPECTED_JOBS: ${{ inputs.expected-jobs }}
  RUN_RESULT: ${{ inputs.run-result }}
  TT_METAL_COMMIT: ${{ inputs.tt-metal-commit }}
  VLLM_COMMIT: ${{ inputs.vllm-commit }}
  INFERENCE_SERVER_COMMIT: ${{ inputs.inference-server-commit }}
run: |
@ppetrovicTT ppetrovicTT self-requested a review May 12, 2026 10:44

@ppetrovicTT ppetrovicTT left a comment


Hi @ipastalTT ! I'm glad you like the tool and want to improve it!

Several points:

  1. Let's not invoke the gh CLI on the currently ongoing run. It is prone to race conditions, can be hard to debug, and can introduce bugs. The workflow should provide all the data to the action.

  2. The failed jobs were sorted in two passes: first by severity (infra, crash, failed tests, failed evals), and then by category. That way it is easier to spot issues that span different models. Since these are more impactful, they should be easier to spot and act upon.

  3. I like your idea of having collapsible descriptions within this summary. But we have a 1 MB size limit, so I decided to keep them separate, with only the root cause shown for easier comparison. Either way, ai-run-summary uses the .json file created by ai-job-summary. If you need only a certain field, let's make it a separate field in the .json so that it's easier to extract, rather than stripping the .md files.

Thanks!
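
The two-pass ordering described in point 2 can be expressed as a single sort on a `(severity, category)` key. The sketch below is illustrative only: `FailedJob` and `sort_failed_jobs` are hypothetical names, and the severity ranks are inferred from the comment and from the sample report, where the TIMEOUT row sits among the CRASHED rows.

```python
from dataclasses import dataclass

# Assumed severity ranks (lower sorts first). TIMEOUT shares a rank with
# CRASHED because the sample report interleaves the two by category.
SEVERITY_RANK = {
    "INFRA_FAILURE": 0,
    "CRASHED": 1,
    "TIMEOUT": 1,
    "TESTS_FAILED": 2,
    "EVALS_BELOW_TARGET": 3,
}


@dataclass
class FailedJob:
    """Hypothetical stand-in for the tool's parsed job summary."""
    job_id: str
    status: str
    category: str


def sort_failed_jobs(jobs: list[FailedJob]) -> list[FailedJob]:
    """Order by severity first, then by category; unknown statuses go last."""
    unknown = len(SEVERITY_RANK)
    return sorted(jobs, key=lambda j: (SEVERITY_RANK.get(j.status, unknown), j.category))


jobs = [
    FailedJob("76636001547", "EVALS_BELOW_TARGET", "model:accuracy"),
    FailedJob("76636001501", "CRASHED", "hw:fabric"),
    FailedJob("76637859567", "TIMEOUT", "app:api"),
    FailedJob("76637859441", "INFRA_FAILURE", "infra:no_logs"),
    FailedJob("76637859436", "TESTS_FAILED", "app:api"),
]
ordered_ids = [j.job_id for j in sort_failed_jobs(jobs)]
print(ordered_ids)
```

Because Python's sort is stable, jobs with the same severity and category keep their input order, so failures with a shared cause stay grouped across models.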

@ipastalTT


Hi @ppetrovicTT. Thanks for your feedback!

  • Regarding your first point, the gh API is used only as a fallback in case the commit SHAs are not supplied to the workflow. I will remove the fallback logic.
  • I see, that makes sense. The reason I refactored it that way was that I thought it might be easier for the user to search for their model. But of course this can be reverted. By the way, we would like this report to be accessible here, so the user can quickly find the latest nightly report for their model. Do we need to upload it somewhere specific or is it already accessible there? Asking because I am not that familiar with the whole codebase, but assumed since it can be accessed through gh api, it could be accessed in that webpage backend as well.
  • On your third point, I think you are correct to mention using only the json file, since I believe the information is already present there. We can then render it the same way, without needing the md artifacts. Do you think we might exceed the 1MB limit if we provide more context? Right now I see between 140 and 200 KB for the generated files. I can understand that more verbosity and adding new models might increase its footprint. This change was made in relation to the previous point: instead of the user navigating to the run URL to see more details, they can see what went wrong for various models from the same markdown document. What do you think about that? Should this be reverted?

@ppetrovicTT

  • I see, that makes sense. The reason I refactored it that way was that I thought it might be easier for the user to search for their model. But of course this can be reverted. By the way, we would like this report to be accessible here, so the user can quickly find the latest nightly report for their model. Do we need to upload it somewhere specific or is it already accessible there? Asking because I am not that familiar with the whole codebase, but assumed since it can be accessed through gh api, it could be accessed in that webpage backend as well.

Keeping the summaries in S3 should not be this tool's job.
tt-shield has a separate workflow that pushes them to S3.
@acvejicTT can help with that.

  • On your third point, I think you are correct to mention using only the json file, since I believe the information is already present there. We can then render it the same way, without needing the md artifacts. Do you think we might exceed the 1MB limit if we provide more context? Right now I see between 140 and 200 KB for the generated files. I can understand that more verbosity and adding new models might increase its footprint. This change was made in relation to the previous point: instead of the user navigating to the run URL to see more details, they can see what went wrong for various models from the same markdown document. What do you think about that? Should this be reverted?

Yes, let's please avoid this, since we hope for the number of models to grow. In tt-shield nightly, currently only the models with an inference server are supported. We need to add the summary for media-server as well. Since we already keep the summaries as separate files, let's just refer to them? In the TT Models dashboard, I believe we can re-link to the proper jobs. We have the job IDs in the name, so they should be easy to find.

A few more things:

  1. Please keep in mind that this same action is now used in both tt-shield and tt-metal.
    The use cases are quite different (one repo vs cross-repo, different log structures, folders, etc.)
  2. It's very easy to expand the action API and add stuff. While developing the early version, I already had 2 passes of cleaning it! If we can be smart about fetching the commit SHAs, but not going through the API, that would be great.

Thanks!
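
The size concern in this exchange (reports of 140 to 200 KB today against a 1 MB limit) is easy to check before publishing. The sketch below is an assumption-laden illustration: `summary_budget` is a hypothetical helper, and the limit is read as 10**6 bytes because the thread only says "1MB".

```python
SUMMARY_LIMIT_BYTES = 1_000_000  # assumption: "1MB" read as 10**6 bytes


def summary_budget(markdown: str, limit: int = SUMMARY_LIMIT_BYTES) -> tuple[int, float]:
    """Return the UTF-8 size of a report and the fraction of the limit it uses."""
    size = len(markdown.encode("utf-8"))
    return size, size / limit


# Non-ASCII glyphs cost more than one byte each: a 20-cell bar is 60 bytes.
bar_size, _ = summary_budget("█" * 20)

# A report at the upper end of the sizes quoted in the thread (200 KB).
size, used = summary_budget("x" * 200_000)
print(f"{size} bytes, {used:.0%} of the limit")
```

Measuring encoded bytes rather than characters matters here, since the report relies heavily on emoji and block glyphs.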

ipastalTT and others added 4 commits May 20, 2026 11:44
…ranch with new commit SHA inputs
- Pass tt-metal, vllm, and inference-server SHAs to the action
- Restore the Failed Job Details table removed in e5d7d24
- Remove _job_expandable_block helper and Model Details <details> block
- Remove all_summaries parameter from format_run_report
- Delete commits.py and its tests (gh CLI auto-fetch of SHAs)
- Drop --run-id arg and RunCommits/fetch_run_commits from cli.py
- Pass SHA args directly from CLI args to format_run_report
- Restore test_format.py to origin/main baseline + add SHA-header test

Co-authored-by: Cursor <cursoragent@cursor.com>
@ipastalTT ipastalTT force-pushed the ipastalTT/update-ai-run-summary branch from c042547 to 39f4fb6 Compare May 20, 2026 09:44
Copilot AI review requested due to automatic review settings May 20, 2026 09:52
- Drop markdown field from ParsedJobSummary (no consumer left)
- Remove .md sidecar file reading from parse_json_summary
- Remove sidecar tests from test_parse.py
- Fix long line in test_format.py SHA-header test
@ipastalTT ipastalTT force-pushed the ipastalTT/update-ai-run-summary branch from dcbdffa to 1d74e04 Compare May 20, 2026 10:00

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- Add successful_jobs field to RunStats (populated in aggregate.py)
- Render collapsed Successful Models block in format.py sorted alphabetically with job links
- Add tests in test_aggregate.py and test_format.py
@ipastalTT ipastalTT requested a review from ppetrovicTT May 20, 2026 10:34
@ipastalTT

ipastalTT commented May 20, 2026


Hi @ppetrovicTT , I reverted most of the changes.
Now the commit SHAs are optional, they won't be included if the args are not provided.
I added successful models as well in a different tab, I don't know if you agree with that.
Do you think we should try to find which models do not have ai job summaries and mark them as missing or something? Might help the user.

The commit SHAs can be provided like this in the tt-shield nightly.yaml:

  ai-run-summary:
    name: "AI Run Summary"
    needs: [resolve-shas, call-dynamic-workflow]
.
.
.
          tt-metal-commit: ${{ needs.resolve-shas.outputs.tt-metal-sha }}
          vllm-commit: ${{ needs.resolve-shas.outputs.vllm-sha }}
          inference-server-commit: ${{ needs.resolve-shas.outputs.inference-server-sha }}

Thanks!

)
md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n"
for job in sorted_success:
label = _extract_run_label(job) or "\u2014"


can we please have the same table here as for the failed jobs above?
without the category and root cause of course.

@ppetrovicTT


Hi @ppetrovicTT , I reverted most of the changes. Now the commit SHAs are optional, they won't be included if the args are not provided. I added successful models as well in a different tab, I don't know if you agree with that.

Cool for the SHAs.
I agree with the list of successful jobs. They're visible in GitHub, but not if we look at reports elsewhere (dashboard).

Do you think we should try to find which models do not have ai job summaries and mark them as missing or something? Might help the user.

ai-run-summary already supports the expected-jobs input (link). Sometimes we need some gymnastics to gather all the jobs that were intended to run (in tt-metal there are often filters, nested jobs, etc.), but Claude often helps there. It's not implemented in Models CI though; it should be added in the workflow call (not in the action).
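
The idea of flagging jobs without a summary reduces to comparing the expected job list with the jobs that produced one. The format of the expected-jobs input is not shown in this thread, so the sketch below simply assumes two lists of job labels; `find_missing_jobs` is a hypothetical helper, not part of the action.

```python
def find_missing_jobs(expected: list[str], summarized: list[str]) -> list[str]:
    """Return expected job names that have no summary, preserving input order."""
    seen = set(summarized)
    return [name for name in expected if name not in seen]


# Illustrative labels in the "model (hardware)" style used by the report.
expected = ["Qwen3-32B (t3k)", "bge-m3 (t3k)", "FLUX.1-dev (t3k)"]
summarized = ["FLUX.1-dev (t3k)", "Qwen3-32B (t3k)"]
missing = find_missing_jobs(expected, summarized)
print(missing)
```

As the comment above notes, the hard part is usually building the expected list in the calling workflow, not the comparison itself.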

@ipastalTT ipastalTT requested review from Copilot and ppetrovicTT May 22, 2026 12:03

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

stats.status_counts[job.status] = stats.status_counts.get(job.status, 0) + 1

stats.failed_jobs = [j for j in summaries if j.status not in NON_FAILURE_STATUSES]
stats.successful_jobs = [j for j in summaries if j.status in NON_FAILURE_STATUSES]


addressed in new commit

Comment on lines +223 to +234
versions = []
if tt_metal_commit:
    short = tt_metal_commit[:7]
    url = f"https://github.com/tenstorrent/tt-metal/commit/{tt_metal_commit}"
    versions.append(f"**TT-Metal**: [`{short}`]({url})")
if inference_server_commit:
    short = inference_server_commit[:7]
    url = f"https://github.com/tenstorrent/tt-inference-server/commit/{inference_server_commit}"
    versions.append(f"**tt-inference-server**: [`{short}`]({url})")
if vllm_commit:
    short = vllm_commit[:7]
    url = f"https://github.com/tenstorrent/vllm/commit/{vllm_commit}"


addressed in new commit

Comment on lines +355 to +364
sorted_success = sorted(
    stats.successful_jobs,
    key=lambda j: (_extract_run_label(j) or "").lower(),
)
md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n"
md += "| Job | Run | Status |\n"
md += "|-----|-----|--------|\n"
for job in sorted_success:
    job_cell = _job_id_cell(job, run_url)
    model = _extract_run_label(job) or "\u2014"


addressed in new commit

stats = compute_stats(jobs)
assert len(stats.successful_jobs) == 2
assert all(j.status == "SUCCESS" for j in stats.successful_jobs)

@ppetrovicTT

ppetrovicTT commented Jun 3, 2026


Just getting back to this, sorry.

@ipastalTT , do you think we can make this interface a bit more generic?
for example, provide a json {"repo", "commit"}, and auto-fill in the fields?
the field can be called just "commits".

so if it's only tt-metal, we print that.. if there are more we print them all.. but don't hardcode to tt-metal, vllm, tt-inference-server.

otherwise, good to go!
thanks!

@ipastalTT

ipastalTT commented Jun 3, 2026


Hi @ppetrovicTT , great advice!

Would a json structure such as {"repo": "tenstorrent/tt-metal", "commit": "aaabbb.."} be ok?

and then we get the repo url for every repo/commit as:

url = f"https://github.com/{repo}/commit/{sha}"

for example for nightly it would be a list of dicts such as:

commits: |
  [
    {"repo": "tenstorrent/tt-metal", "commit": "${{ needs.resolve-shas.outputs.tt-metal-sha }}"},
    {"repo": "tenstorrent/tt-inference-server", "commit": "${{ needs.resolve-shas.outputs.inference-server-sha }}"},
    {"repo": "tenstorrent/vllm", "commit": "${{ needs.resolve-shas.outputs.vllm-sha }}"}
  ]

If I have misunderstood, please let me know.

Edit:
I just saw the recent changes on tt-shield.
So, I guess what you would like would be to have the full repository url in the "repo" field? And if there is no url (e.g. ubuntu-version), include this as well as a static field?
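
A minimal sketch of how the proposed `commits` input could be read on the Python side, assuming it arrives as a JSON string holding a list of `{"repo", "commit"}` objects as in the example above. `parse_commits_input` is a hypothetical helper name; the real CLI wiring is not shown in this thread, and entries without a GitHub repository (such as the ubuntu-version case raised in the edit) are not handled here.

```python
import json
from typing import Optional


def parse_commits_input(raw: Optional[str]) -> list[dict]:
    """Parse a JSON array of {"repo", "commit"} objects; return [] on bad input."""
    if not raw or not raw.strip():
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only well-formed entries so one bad item does not drop the rest.
    return [e for e in data if isinstance(e, dict) and e.get("repo") and e.get("commit")]


raw = (
    '[{"repo": "tenstorrent/tt-metal", "commit": "7463e58"},'
    ' {"repo": "tenstorrent/vllm"}, "oops"]'
)
commits = parse_commits_input(raw)
urls = [f"https://github.com/{c['repo']}/commit/{c['commit']}" for c in commits]
print(urls)
```

Returning an empty list on malformed input keeps the commit line optional, in line with the earlier decision that SHAs are omitted when not provided.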

Copilot AI review requested due to automatic review settings June 3, 2026 11:34

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +163 to +190
def _normalized_commit_sha(value: str | None) -> str | None:
    """Strip whitespace and validate SHA format (hex, 7–40 chars). Returns None if invalid."""
    if not value:
        return None
    normalized = value.strip()
    if re.fullmatch(r"[0-9a-fA-F]{7,40}", normalized):
        return normalized
    return None


def _commit_version_line(commits: list[dict]) -> str:
    """Render a ' · '-joined line of repo commit links from a list of {repo, commit} dicts.

    Each entry must have 'repo' (e.g. 'tenstorrent/tt-metal') and 'commit' (SHA).
    The display label is the part of the repo name after '/'.
    Invalid or missing SHAs are silently skipped.
    Returns empty string when nothing is renderable.
    """
    parts = []
    for entry in commits:
        repo = (entry.get("repo") or "").strip()
        sha = _normalized_commit_sha(entry.get("commit"))
        if not repo or not sha:
            continue
        label = repo.split("/")[-1] if "/" in repo else repo
        short = sha[:7]
        url = f"https://github.com/{repo}/commit/{sha}"
        parts.append(f"**{label}**: [`{short}`]({url})")


parts = []
for entry in commits:
    if not isinstance(entry, dict):
        continue
    repo = str(entry.get("repo") or "").strip()
    sha = _normalized_commit_sha(str(entry.get("commit") or ""))
    if not repo or not sha:
        continue
    label = repo.split("/")[-1] if "/" in repo else repo
    short = sha[:7]
    url = f"https://github.com/{repo}/commit/{sha}"
    parts.append(f"**{label}**: [`{short}`]({url})")
return " \u00b7 ".join(parts)

Comment on lines +371 to +383
if stats.successful_jobs:
    labeled_success = sorted(
        ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS"
    )
    if labeled_success:
        md += f"<details>\n<summary>Successful Models ({len(labeled_success)})</summary>\n\n"
        md += "| Job | Run | Status |\n"
        md += "|-----|-----|--------|\n"
        for model, job in labeled_success:
            job_cell = _job_id_cell(job, run_url)
            emoji = STATUS_EMOJI.get(job.status, "")
            md += f"| {job_cell} | {model} | {emoji} {job.status} |\n"
        md += "\n</details>\n\n"
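
The collapsible section built in the snippet above can be sketched as a standalone function. Everything outside the markdown strings is an assumption: `Job`, `extract_run_label`, and `job_id_cell` are hypothetical stand-ins for the module's real types and helpers, whose bodies are not shown in this thread, and the job link format is assumed as well.

```python
from dataclasses import dataclass

STATUS_EMOJI = {"SUCCESS": "🟢"}  # assumed mapping; the emoji matches the report


@dataclass
class Job:  # hypothetical stand-in for the module's job record
    job_id: int
    name: str
    status: str


def extract_run_label(job: Job):
    """Hypothetical: the real label extraction logic is not shown."""
    return job.name or None


def job_id_cell(job: Job, run_url: str) -> str:
    """Hypothetical link format for the Job column."""
    return f"[{job.job_id}]({run_url}/job/{job.job_id})"


def successful_jobs_section(jobs, run_url: str) -> str:
    """Render successful jobs as a collapsed markdown table, sorted by label."""
    labeled = sorted(
        ((extract_run_label(j) or "\u2014", j) for j in jobs if j.status == "SUCCESS"),
        key=lambda pair: pair[0],  # compare labels only, never the job objects
    )
    if not labeled:
        return ""
    md = f"<details>\n<summary>Successful Models ({len(labeled)})</summary>\n\n"
    md += "| Job | Run | Status |\n"
    md += "|-----|-----|--------|\n"
    for label, job in labeled:
        emoji = STATUS_EMOJI.get(job.status, "")
        md += f"| {job_id_cell(job, run_url)} | {label} | {emoji} {job.status} |\n"
    return md + "\n</details>\n\n"


# Made-up jobs; two share a label on purpose.
jobs = [
    Job(2, "model-b", "SUCCESS"),
    Job(1, "model-a", "SUCCESS"),
    Job(3, "model-a", "SUCCESS"),
    Job(4, "model-c", "CRASHED"),
]
section = successful_jobs_section(jobs, "https://github.com/example/repo/actions/runs/1")
print(section)
```

The sketch sorts with an explicit `key` so that two jobs with the same label are never compared with each other. Sorting bare `(label, job)` tuples falls back to comparing the job objects on a tie, which raises `TypeError` when the job type defines no ordering; whether the module's job type is orderable is not visible here.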
Comment on lines +338 to +340
stats = _stats(jobs=jobs)
stats.successful_jobs = [j for j in jobs if j.status == "SUCCESS"]
report = format_run_report(stats)
Comment on lines 262 to +263
# -----------------------------------------------------------------------
# 4. Job Status Overview -- sorted by severity, with visual bar
# 5. Job Status Overview -- sorted by severity, with visual bar
ppetrovicTT previously approved these changes Jun 3, 2026

@ppetrovicTT ppetrovicTT left a comment

Contributor

Awesome! 🙌

  1. Can you please add the commits input to run/README.md?
  2. Can you please run a workflow to make sure it works? I'd love to see how it's integrated.

Thanks!

Copilot AI review requested due to automatic review settings June 3, 2026 13:31
ipastalTT commented Jun 3, 2026

Contributor Author

Hi @ppetrovicTT. I've attached 2 truncated reports below: one with all 3 JSON arrays passed and one with only tt-metal. I also added "commits" to the README.md file.

You can find the runs here as well:

For this to take effect, for example on nightly, we will have to pass this input from tt-shield. I currently run it from my own branch for quick testing. I can create 2 issues on tt-shield and assign them to myself: one for passing the commits and one for uploading the report to S3.

Truncated output with all 3 commit SHAs

AI Run Summary

Run: 26882137988 · Date: 2026-06-03

tt-metal: 7463e58 · tt-inference-server: 754ab5c · vllm: 3334377

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across nearly all hardware configurations.

Job Status Overview

| Status | Count | Distribution |
|--------|-------|--------------|
| 🟣 INFRA_FAILURE | 8 | ██░░░░░░░░░░░░░░░░░░ 10% |
| 🔴 CRASHED | 34 | █████████░░░░░░░░░░░ 44% |
| 🔴 TIMEOUT | 1 | ░░░░░░░░░░░░░░░░░░░░ 1% |
| 🟠 TESTS_FAILED | 11 | ███░░░░░░░░░░░░░░░░░ 14% |
| 🟡 EVALS_BELOW_TARGET | 3 | █░░░░░░░░░░░░░░░░░░░ 4% |
| 🟢 SUCCESS | 20 | █████░░░░░░░░░░░░░░░ 26% |

Failure Category Distribution

| Category | Jobs | Distribution | Subcategories |
|----------|------|--------------|---------------|
| tt-metal | 21 | ████░░░░░░░░ 37% | fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2 |
| vllm | 9 | ██░░░░░░░░░░ 16% | engine 7 · config 2 |
| infra | 8 | ██░░░░░░░░░░ 14% | no_logs 8 |
| app | 8 | ██░░░░░░░░░░ 14% | api 6 · server 2 |
| model | 6 | █░░░░░░░░░░░ 11% | accuracy 5 · load 1 |
| hw | 3 | █░░░░░░░░░░░ 5% | fabric 3 |
| runtime | 2 | ░░░░░░░░░░░░ 4% | exception 2 |

Dominant Failure Pattern

TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing _processor_factory attribute, empty model name assertions) are causing cascading server unavailability and downstream eval/benchmark timeouts.

Truncated output with tt-metal only

AI Run Summary

Run: 26887478767 · Date: 2026-06-03
tt-metal: f4796b7

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across multiple hardware configurations.

Job Status Overview

| Status | Count | Distribution |
|--------|-------|--------------|
| 🟣 INFRA_FAILURE | 8 | ██░░░░░░░░░░░░░░░░░░ 10% |
| 🔴 CRASHED | 34 | █████████░░░░░░░░░░░ 44% |
| 🔴 TIMEOUT | 1 | ░░░░░░░░░░░░░░░░░░░░ 1% |
| 🟠 TESTS_FAILED | 11 | ███░░░░░░░░░░░░░░░░░ 14% |
| 🟡 EVALS_BELOW_TARGET | 3 | █░░░░░░░░░░░░░░░░░░░ 4% |
| 🟢 SUCCESS | 20 | █████░░░░░░░░░░░░░░░ 26% |

Failure Category Distribution

| Category | Jobs | Distribution | Subcategories |
|----------|------|--------------|---------------|
| tt-metal | 21 | ████░░░░░░░░ 37% | fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2 |
| vllm | 9 | ██░░░░░░░░░░ 16% | engine 7 · config 2 |
| infra | 8 | ██░░░░░░░░░░ 14% | no_logs 8 |
| app | 8 | ██░░░░░░░░░░ 14% | api 6 · server 2 |
| model | 6 | █░░░░░░░░░░░ 11% | accuracy 5 · load 1 |
| hw | 3 | █░░░░░░░░░░░ 5% | fabric 3 |
| runtime | 2 | ░░░░░░░░░░░░ 4% | exception 2 |

Dominant Failure Pattern

TT-Metal infrastructure instability — fabric topology mapping failures, trace buffer size mismatches, and DRAM exhaustion — combined with a vLLM multimodal registry bug (missing _processor_factory) are causing the majority of crashes across diverse models and hardware targets.
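
The distribution bars in the reports above are consistent with mapping each share onto a fixed-width bar by plain rounding. The following is a minimal sketch under that assumption: `distribution_bar` is a hypothetical name, and the module's actual rounding rule and bar widths are not shown in this thread. The widths of 20 and 12 cells and the totals of 77 jobs and 57 failed jobs are read off the report rows.

```python
FILLED = "\u2588"  # full block
EMPTY = "\u2591"   # light shade


def distribution_bar(count: int, total: int, width: int = 20) -> str:
    """Render count/total as a fixed-width block bar followed by a rounded percentage."""
    if total <= 0:
        return EMPTY * width + " 0%"
    share = count / total
    filled = round(share * width)  # assumed rule: nearest whole cell
    return FILLED * filled + EMPTY * (width - filled) + f" {round(share * 100)}%"


# Status rows use a 20-cell bar over all 77 jobs.
print(distribution_bar(34, 77))
# Category rows use a 12-cell bar over the 57 failed jobs.
print(distribution_bar(21, 57, width=12))
```

Under this rule a small share such as 1 of 77 jobs rounds down to zero filled cells while still showing 1%, which is what the TIMEOUT row displays.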

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +373 to +376
if stats.successful_jobs:
    labeled_success = sorted(
        ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS"
    )
Comment on lines +185 to +192
repo = str(entry.get("repo") or "").strip()
sha = _normalized_commit_sha(str(entry.get("commit") or ""))
if not repo or not sha:
    continue
label = repo.split("/")[-1] if "/" in repo else repo
short = sha[:7]
url = f"https://github.com/{repo}/commit/{sha}"
parts.append(f"**{label}**: [`{short}`]({url})")
return None


def _commit_version_line(commits: list[dict]) -> str:
@ppetrovicTT ppetrovicTT merged commit 666ba11 into tenstorrent:main Jun 17, 2026
2 checks passed
@ppetrovicTT

Contributor

@ipastalTT I merged this. You can change the caller in tt-shield!

@ipastalTT

Contributor Author

Thanks @ppetrovicTT !

4 participants