update: ai-run-summary by ipastalTT · Pull Request #123 · tenstorrent/tt-github-actions

ipastalTT · 2026-05-06T08:45:23Z

Scope

update ai-run-summary module

What Changed

Included optional commit SHAs for tt-metal, tt-inference-server and vLLM.
Added successful jobs tab.

Testing

Ran the tt-shield nightly workflow from this branch with a specified run ID (this skips building images, downloads the artifacts from that run, re-uploads them, and runs ai-run-summary).

New Look

AI Run Summary

Run: 26156616304 · Date: 2026-05-20

TT-Metal: c5beb19 · tt-inference-server: 18ec154 · vLLM: bfce692

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across nearly all hardware configurations.

Job Status Overview

Status	Count	Distribution
🟣 INFRA_FAILURE	8	`██░░░░░░░░░░░░░░░░░░` 10%
🔴 CRASHED	34	`█████████░░░░░░░░░░░` 44%
🔴 TIMEOUT	1	`░░░░░░░░░░░░░░░░░░░░` 1%
🟠 TESTS_FAILED	11	`███░░░░░░░░░░░░░░░░░` 14%
🟡 EVALS_BELOW_TARGET	3	`█░░░░░░░░░░░░░░░░░░░` 4%
🟢 SUCCESS	20	`█████░░░░░░░░░░░░░░░` 26%

Failure Category Distribution

Category	Jobs	Distribution	Subcategories
`tt-metal`	21	`████░░░░░░░░` 37%	fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2
`vllm`	9	`██░░░░░░░░░░` 16%	engine 7 · config 2
`infra`	8	`██░░░░░░░░░░` 14%	no_logs 8
`app`	8	`██░░░░░░░░░░` 14%	api 6 · server 2
`model`	6	`█░░░░░░░░░░░` 11%	accuracy 5 · load 1
`hw`	3	`█░░░░░░░░░░░` 5%	fabric 3
`runtime`	2	`░░░░░░░░░░░░` 4%	exception 2
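The bars in the two tables above are consistent with simple rounding of `count / total` onto a fixed-width bar (20 cells for job status, 12 for failure categories). The following is a minimal sketch of that idea, not the tool's actual code: the function name `distribution_bar` and the rounding rule are assumptions inferred from the sample numbers.

```python
FILLED = "\u2588"  # full block
EMPTY = "\u2591"   # light shade


def distribution_bar(count: int, total: int, width: int = 20) -> str:
    """Render a fixed-width bar followed by a whole-number percentage.

    Hypothetical helper: the rounding behaviour is inferred from the
    sample report, not taken from the ai-run-summary source.
    """
    if total <= 0:
        return f"`{EMPTY * width}` 0%"
    fraction = count / total
    filled = min(width, round(fraction * width))
    percent = round(fraction * 100)
    return f"`{FILLED * filled}{EMPTY * (width - filled)}` {percent}%"


if __name__ == "__main__":
    print(distribution_bar(34, 77))      # CRASHED row, 20-cell bar
    print(distribution_bar(21, 57, 12))  # tt-metal row, 12-cell bar
```

Note that the status table uses all 77 jobs as the denominator, while the category table uses only the 57 failed jobs.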

Dominant Failure Pattern

TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing _processor_factory attribute, empty model name assertions) are causing cascading server unavailability and downstream eval/benchmark timeouts.

Failed Job Details (57)

Job	Run	Status	Category	Root Cause
76637859441	Wan2.2-I2V-A14B-Diffusers (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859444	Motif-Image-6B-Preview (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859522	Wan2.2-T2V-A14B-Diffusers (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859526	bge-large-en-v1.5 (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859552	bge-m3 (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859558	whisper-large-v3 (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859586	stable-diffusion-xl-base-1.0 (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859591	stable-diffusion-3.5-large (galaxy)	🟣 INFRA_FAILURE	`infra:no_logs`	Missing log dirs: ['tt-inference-server/workflow_logs/run_logs', 'tt-inference-server/work…
76637859567	mochi-1-preview (t3k)	🔴 TIMEOUT	`app:api`	The /tt-liveness endpoint on the inference server is returning HTTP 405 (Method Not Allowe…
76637859587	whisper-large-v3 (galaxy)	🔴 CRASHED	`app:server`	The inference server's /tt-liveness endpoint is returning HTTP 405 Method Not Allowed in r…
76636001501	Llama-3.1-8B-Instruct (t3k)	🔴 CRASHED	`hw:fabric`	During benchmark run 8/17 (isl-2048, osl-128, max-concurrency=32, 128 prompts), the Engine…
76636001557	Llama-3.2-3B-Instruct (t3k)	🔴 CRASHED	`hw:fabric`	During benchmark run 12/17 (ISL=8192, max-concurrency=15), the EngineCore_DP0 process cras…
76636001596	Qwen3-32B (t3k)	🔴 CRASHED	`hw:fabric`	The EngineCore (EngineCore_DP0) crashed with a fatal RuntimeError 'Timeout waiting for Eth…
76637859432	Motif-Image-6B-Preview (p300x2)	🔴 CRASHED	`model:load`	The Motif pipeline creation failed with a KeyError because the mesh device shape (2, 2) is…
76636001502	Llama-3.1-8B-Instruct (galaxy)	🔴 CRASHED	`runtime:exception`	EngineCore_DP0 crashed with an IndexError in `sampling_module.seed_manager.apply_slot_rema…
76637859576	stable-diffusion-xl-base-1.0 (galaxy)	🔴 CRASHED	`runtime:exception`	The SDXL model warmup failed with AssertionError because TT_METAL_CORE_GRID_OVERRIDE_TODEP…
76637859450	Wan2.2-I2V-A14B-Diffusers (galaxy)	🔴 CRASHED	`tt-metal:device`	The inference server crashed during device initialization because the configured MeshShape…
76637859521	Wan2.2-T2V-A14B-Diffusers (galaxy)	🔴 CRASHED	`tt-metal:device`	The inference server crashed during MeshDevice initialization because the configured mesh …
76636001472	Llama-3.1-8B-Instruct (p100)	🔴 CRASHED	`tt-metal:dispatch`	The TT-Metal device timed out during dispatch command queue initialization while the vLLM …
76636001575	gpt-oss-20b (t3k)	🔴 CRASHED	`tt-metal:dispatch`	The TT-Metal dispatch layer timed out during model warmup/prefill in the EngineCore_DP0 pr…
76636001618	gpt-oss-120b (t3k)	🔴 CRASHED	`tt-metal:dispatch`	During vLLM EngineCore (EngineCore_DP0) warmup/prefill of gpt-oss-120b on T3K, a dispatch …
76636001453	DeepSeek-R1-0528 (quad_galaxy)	🔴 CRASHED	`tt-metal:fabric`	Multi-host fabric topology mapping failed during MeshDevice initialization: ranks 1, 2, an…
76636001485	Llama-3.1-8B-Instruct (p150x4)	🔴 CRASHED	`tt-metal:fabric`	The TT-Metal fabric topology mapper failed to map the logical mesh graph to the discovered…
76636001572	Llama-3.3-70B-Instruct (p150x4)	🔴 CRASHED	`tt-metal:fabric`	The vLLM EngineCore (EngineCore_DP0) crashed during MeshDevice initialization because the …
76636001587	Llama-3.3-70B-Instruct (galaxy)	🔴 CRASHED	`tt-metal:fabric`	The TT-Metal fabric topology mapper failed to map the logical mesh graph to the physical t…
76636001630	Qwen3-32B (galaxy)	🔴 CRASHED	`tt-metal:fabric`	The TT-Metal fabric topology mapper failed to map the logical mesh graph to the physical t…
76637859448	FLUX.1-dev (p150x4)	🔴 CRASHED	`tt-metal:fabric`	TT_FATAL in fabric topology mapper: the logical mesh graph (MGD) with 1 node could not be …
76637859453	FLUX.1-schnell (p150x4)	🔴 CRASHED	`tt-metal:fabric`	TT_FATAL in fabric topology mapper: the logical mesh graph (MGD) could not be mapped to th…
76637859559	mochi-1-preview (galaxy)	🔴 CRASHED	`tt-metal:fabric`	TT_FATAL in tt-metal fabric topology mapper: the logical mesh graph (MGD) could not be map…
76637859561	mochi-1-preview (p150x4)	🔴 CRASHED	`tt-metal:fabric`	TT_FATAL in tt-metal fabric topology mapper: the logical mesh graph (MGD) could not be map…
76636001522	Qwen2.5-72B-Instruct (galaxy)	🔴 CRASHED	`tt-metal:memory`	During vLLM engine core initialization (EngineCore_DP0), a TT_FATAL out-of-memory error oc…
76636001537	Qwen2.5-72B-Instruct (t3k)	🔴 CRASHED	`tt-metal:memory`	During vLLM EngineCore startup, a DRAM buffer allocation of 134217728 B (128 MB) failed in…
76636001589	Llama-3.2-3B-Instruct (n150)	🔴 CRASHED	`tt-metal:memory`	During a prefill operation with 32 concurrent requests, the TT-Metal DRAM allocator ran ou…
76636001552	Llama-3.3-70B-Instruct (t3k)	🔴 CRASHED	`tt-metal:trace`	The trace region allocated for MeshDevice 0 is only 30000000B (30MB) but the prefill trace…
76636001554	Llama-3.2-3B-Instruct (n300)	🔴 CRASHED	`tt-metal:trace`	The trace buffer required for prefill capture (53100544B ≈ 50.6MB) exceeds the allocated t…
76636001559	QwQ-32B (galaxy)	🔴 CRASHED	`tt-metal:trace`	The vLLM engine (EngineCore_DP0) crashed during model warmup because the trace buffer requ…
76636001594	Qwen3-8B (t3k)	🔴 CRASHED	`tt-metal:trace`	The trace buffer required for prefill capture (51101696B) exceeds the allocated trace regi…
76636001538	Mistral-Small-3.1-24B-Instruct-2503 (t3k)	🔴 CRASHED	`vllm:config`	The vLLM server failed to start because 'mistralai/Mistral-Small-3.1-24B-Instruct-2503' ha…
76636001459	Llama-3.2-11B-Vision-Instruct (t3k)	🔴 CRASHED	`vllm:engine`	The vLLM server crashed during startup because the Llama-3.2-11B-Vision-Instruct model cla…
76636001483	Llama-3.2-11B-Vision-Instruct (n300)	🔴 CRASHED	`vllm:engine`	The vLLM server failed to start because the Llama-3.2-11B-Vision-Instruct model class is m…
76636001525	Mistral-7B-Instruct-v0.3 (n150)	🔴 CRASHED	`vllm:engine`	The vLLM EngineCore failed to start because the model name specified in the vLLM configura…
76636001541	Mistral-7B-Instruct-v0.3 (n300)	🔴 CRASHED	`vllm:engine`	The vLLM EngineCore failed to start because an assertion in generator_vllm.py detected a m…
76636001566	Llama-3.2-90B-Vision-Instruct (t3k)	🔴 CRASHED	`vllm:engine`	The vLLM server crashed during startup because the Llama-3.2-90B-Vision-Instruct model cla…
76637859436	Motif-Image-6B-Preview (t3k)	🟠 TESTS_FAILED	`app:api`	The /tt-liveness endpoint on the inference server is returning HTTP 405 Method Not Allowed…
76637859451	Motif-Image-6B-Preview (p150x8)	🟠 TESTS_FAILED	`app:api`	The /tt-liveness endpoint on the inference server is returning HTTP 405 Method Not Allowed…
76637859463	FLUX.1-schnell (t3k)	🟠 TESTS_FAILED	`app:api`	The inference server is running and reachable on port 8000, but the `/tt-liveness` endpoin…
76637859532	Wan2.2-T2V-A14B-Diffusers (p150x8)	🟠 TESTS_FAILED	`app:api`	The inference server's `/tt-liveness` endpoint is returning HTTP 405 (Method Not Allowed) …
76637859581	stable-diffusion-3.5-large (t3k)	🟠 TESTS_FAILED	`app:api`	The inference server's /tt-liveness endpoint is returning HTTP 405 (Method Not Allowed) fo…
76637859506	bge-m3 (t3k)	🟠 TESTS_FAILED	`app:server`	The `BGEM3Runner` class is missing the `_run_async` method, causing all embedding requests…
76636001492	Llama-3.1-8B-Instruct (p300x2)	🟠 TESTS_FAILED	`model:accuracy`	The vLLM server running Llama-3.1-8B-Instruct on p300x2 is not honoring the seed parameter…
76636001619	Qwen3-32B (p300x2)	🟠 TESTS_FAILED	`model:accuracy`	The model Qwen3-32B produces non-deterministic outputs when top_k=1 is set, with two ident…
76636001636	Llama-3.3-70B-Instruct (p150x8)	🟠 TESTS_FAILED	`model:accuracy`	The presence_penalty parameter failed to increase token diversity in the 'repeat_trap' tes…
76636001672	Llama-3.3-70B-Instruct (p300x2)	🟠 TESTS_FAILED	`model:accuracy`	The presence_penalty parameter (value 1.2) had no measurable effect on output length for t…
76636001623	Qwen2.5-VL-7B-Instruct (n300)	🟠 TESTS_FAILED	`vllm:config`	The vLLM server for Qwen2.5-VL-7B-Instruct is configured to accept at most 1 image per pro…
76636001547	Llama-3.2-1B-Instruct (t3k)	🟡 EVALS_BELOW_TARGET	`model:accuracy`	Llama-3.2-1B-Instruct on t3k failed acceptance criteria due to eval accuracy scores signif…
76636001634	Qwen2.5-VL-3B-Instruct (n300)	🟡 EVALS_BELOW_TARGET	`vllm:engine`	The vLLM server enforces a limit of at most 1 image per prompt for Qwen2.5-VL-3B-Instruct,…
76636001638	Qwen2.5-VL-3B-Instruct (n150)	🟡 EVALS_BELOW_TARGET	`vllm:engine`	The vLLM server enforces a limit of at most 1 image per prompt for Qwen2.5-VL-3B-Instruct,…

Successful Models (20)

Job	Run	Status
76637859434	FLUX.1-dev (p150x8)	🟢 SUCCESS
76637859437	FLUX.1-dev (p300x2)	🟢 SUCCESS
76637859455	FLUX.1-dev (t3k)	🟢 SUCCESS
76637859461	FLUX.1-schnell (p150x8)	🟢 SUCCESS
76637859433	FLUX.1-schnell (p300x2)	🟢 SUCCESS
76637859512	mochi-1-preview (galaxy)	🟢 SUCCESS
76637859569	mochi-1-preview (p150x8)	🟢 SUCCESS
76637859583	mochi-1-preview (p300x2)	🟢 SUCCESS
76636001579	Qwen2.5-VL-32B-Instruct (t3k)	🟢 SUCCESS
76636001610	Qwen2.5-VL-72B-Instruct (t3k)	🟢 SUCCESS
76636001613	Qwen3-VL-32B-Instruct (t3k)	🟢 SUCCESS
76637859566	stable-diffusion-xl-base-1.0 (p150x8)	🟢 SUCCESS
76637859575	stable-diffusion-xl-base-1.0 (p300x2)	🟢 SUCCESS
76637859509	Wan2.2-I2V-A14B-Diffusers (p150x4)	🟢 SUCCESS
76637859503	Wan2.2-I2V-A14B-Diffusers (p150x8)	🟢 SUCCESS
76637859596	Wan2.2-I2V-A14B-Diffusers (p300x2)	🟢 SUCCESS
76637859467	Wan2.2-I2V-A14B-Diffusers (t3k)	🟢 SUCCESS
76637859553	Wan2.2-T2V-A14B-Diffusers (p150x4)	🟢 SUCCESS
76637859536	Wan2.2-T2V-A14B-Diffusers (p300x2)	🟢 SUCCESS
76637859488	Wan2.2-T2V-A14B-Diffusers (t3k)	🟢 SUCCESS

Run Summary Stats

Metric	Value
Total jobs	77
Failed	57
Passed	20
LLM model	`anthropic/claude-sonnet-4-6`
Tokens	3676 + 198
LLM time	6807ms

Copilot

Pull request overview

Updates the ai-run-summary tool to enrich run reports with component commit SHAs and per-model expandable details by ingesting markdown sidecar artifacts.

Changes:

Add support for loading a *.md “sidecar” summary alongside each ai_job_summary_*.json artifact and rendering it in the run report.
Add commit SHA reporting for TT-Metal / tt-inference-server / vLLM (explicit CLI/action inputs with optional auto-fetch via gh).
Replace the previous failed-job details table with an alphabetical, expandable “Model Details” section for all jobs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

File	Description
.github/actions/ai_summary/tool/ai_run_summary/tests/test_parse.py	Adds coverage for loading/omitting the markdown sidecar summary.
.github/actions/ai_summary/tool/ai_run_summary/tests/test_format.py	Updates formatting expectations for commit SHAs + new “Model Details” output.
.github/actions/ai_summary/tool/ai_run_summary/tests/test_commits.py	Adds tests for resolving commit SHAs from `gh` job logs.
.github/actions/ai_summary/tool/ai_run_summary/parse.py	Loads `*.md` sidecar content into parsed summaries.
.github/actions/ai_summary/tool/ai_run_summary/models.py	Extends `ParsedJobSummary` with a `markdown` field.
.github/actions/ai_summary/tool/ai_run_summary/format.py	Adds commit SHA header rendering and new expandable per-model details blocks.
.github/actions/ai_summary/tool/ai_run_summary/commits.py	Implements `gh`-based resolve-shas job discovery + SHA extraction.
.github/actions/ai_summary/tool/ai_run_summary/cli.py	Wires commit SHA inputs/auto-fetch and passes all summaries into the formatter.
.github/actions/ai_summary/run/action.yml	Adds action inputs/env wiring for explicit component commit SHAs.

+    md_path = file_path.with_suffix(".md")
+    try:
+        markdown = md_path.read_text() if md_path.exists() else ""
+    except OSError:
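The hunk above is cut off at the `except` clause, so the fallback behaviour is not visible here. As a rough sketch only, an optional sidecar read could look like the following; the function name `read_sidecar_markdown` and the choice to return an empty string on any read error are assumptions, not the PR's code.

```python
from pathlib import Path


def read_sidecar_markdown(json_path: Path) -> str:
    """Return the text of the .md file next to json_path, or "" if unavailable.

    Hypothetical helper. Catching OSError also covers the missing-file case
    (FileNotFoundError is a subclass), so no separate exists() check is needed.
    """
    md_path = json_path.with_suffix(".md")
    try:
        return md_path.read_text(encoding="utf-8")
    except OSError:
        # Assumption: a missing or unreadable sidecar is treated as "no markdown".
        return ""
```

Reading directly and catching the error avoids the small window between `exists()` and `read_text()` in which the file could disappear.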


+    emoji = STATUS_EMOJI.get(job.status, "")
+    model = _extract_run_label(job) or "\u2014"
+    url = _job_url(job, run_url)
+    label = job.job_id or job.source_file.stem.removeprefix("ai_job_summary_")
+    job_link = f'<a href="{url}">{label}</a>' if url else label
+
+    parts = [f"<strong>{model}</strong>", f"{emoji} {job.status}", job_link]
+
+    if not compact:
+        category = job.category or "UNKNOWN"
+        parts.append(f"<code>{category}</code>")
+
+        if show_root_cause:
+            root_cause = job.root_cause or ""
+            if len(root_cause) > _ROOT_CAUSE_COL_MAX:
+                root_cause = root_cause[:_ROOT_CAUSE_COL_MAX] + "\u2026"
+            parts.append(root_cause)
+
+    summary_line = " &nbsp;|&nbsp; ".join(parts)


 import markdown

-from .models import ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI
+from .models import NON_FAILURE_STATUSES, ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI


+        vllm_commit: Optional vLLM commit SHA.
+        inference_server_commit: Optional tt-inference-server commit SHA.
+        all_summaries: All parsed job summaries (success + failure). When provided,
+            a Model Details section is appended after the failed job details.


+        if not count_output.strip() or int(count_output.strip() or "0") < 100:
+            break


+    # Extract "Full sha: <40-char-hex>" lines in order of appearance.
+    # The resolve-shas workflow resolves: tt-metal, inference-server, vllm — in that order.
+    shas: list[str] = []
+    for line in logs.splitlines():
+        if "Full sha:" in line:
+            parts = line.split("Full sha:")
+            if len(parts) == 2:
+                sha = parts[1].strip()
+                if sha and all(c in "0123456789abcdefABCDEF" for c in sha):
+                    shas.append(sha)
+
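The comment in the hunk above mentions 40-character SHAs, but the loop accepts hex strings of any length. A regex-based sketch that enforces the length is shown below; the name `extract_full_shas` is hypothetical and the log lines in the usage example are made up for illustration.

```python
import re

# "Full sha:" followed by exactly 40 hex characters at the end of the line.
_FULL_SHA_RE = re.compile(r"Full sha:\s*([0-9a-fA-F]{40})\s*$")


def extract_full_shas(logs: str) -> list[str]:
    """Collect 40-char hex SHAs from 'Full sha: ...' lines, in order of appearance."""
    shas: list[str] = []
    for line in logs.splitlines():
        match = _FULL_SHA_RE.search(line)
        if match:
            shas.append(match.group(1))
    return shas


if __name__ == "__main__":
    sample = "step 1\nFull sha: " + "a" * 40 + "\nFull sha: abc123\n"
    print(extract_full_shas(sample))  # only the 40-character entry is kept
```

Relying on the order of log lines to tell the repositories apart stays fragile either way, which is in line with the later review feedback to pass the SHAs in explicitly.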


+        print(f"Fetching commit SHAs for run {run_id_for_commits}...", file=sys.stderr)
+        repo = os.environ.get("GITHUB_REPOSITORY", "tenstorrent/tt-shield")
+        commits = fetch_run_commits(int(run_id_for_commits), repo=repo)


      env:
        TT_CHAT_API_KEY: ${{ inputs.api-key }}
        TT_CHAT_URL: ${{ inputs.api-url }}
        CONFIG: ${{ inputs.config }}
        EXPECTED_JOBS: ${{ inputs.expected-jobs }}
        RUN_RESULT: ${{ inputs.run-result }}
+        TT_METAL_COMMIT: ${{ inputs.tt-metal-commit }}
+        VLLM_COMMIT: ${{ inputs.vllm-commit }}
+        INFERENCE_SERVER_COMMIT: ${{ inputs.inference-server-commit }}
      run: |


ppetrovicTT

Hi @ipastalTT ! I'm glad you like the tool and want to improve it!

Several points:

Let's not invoke the gh CLI on the currently ongoing run. It is prone to race conditions, can be hard to debug, and can introduce bugs. The workflow should provide all the data to the action.
The failed jobs were sorted in two passes: first by severity (infra, crash, failed tests, failed evals), then by category. That way it is easier to spot issues that span different models. Since those are more impactful, they should be easier to spot and act upon.
I like your idea of having collapsible descriptions within this summary. But we have a 1MB size limit, so I decided to keep them separate, with only the root cause for easier comparison. Either way, ai-run-summary uses the .json file created by ai-job-summary. If you need only a certain field, let's make it a separate field in the .json so that it's easier to extract, rather than stripping the .md files.
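The two-pass ordering described in the second point can be expressed as a single sort with a composite key. This is a sketch under assumptions: the `Job` dataclass is a stand-in for the real summary model, the severity ranks are inferred from the sample report (where TIMEOUT and CRASHED appear to share a rank), and the job ID tiebreak is a guess.

```python
from dataclasses import dataclass

# Assumed ranks, inferred from the sample report ordering.
SEVERITY_RANK = {
    "INFRA_FAILURE": 0,
    "CRASHED": 1,
    "TIMEOUT": 1,
    "TESTS_FAILED": 2,
    "EVALS_BELOW_TARGET": 3,
}


@dataclass
class Job:  # hypothetical stand-in for ParsedJobSummary
    job_id: str
    status: str
    category: str


def sort_failed_jobs(jobs: list[Job]) -> list[Job]:
    """Order by severity first, then category, then job ID as a tiebreak."""
    unknown = max(SEVERITY_RANK.values()) + 1  # unknown statuses sort last
    return sorted(
        jobs,
        key=lambda j: (SEVERITY_RANK.get(j.status, unknown), j.category, j.job_id),
    )
```

With this key, all `tt-metal:fabric` crashes end up adjacent regardless of model, which is the cross-model visibility the comment asks for.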

Thanks!

ipastalTT · 2026-05-18T12:32:25Z

Hi @ppetrovicTT. Thanks for your feedback!

Regarding your first point, the gh API is used as a fallback in case the commit SHAs are not supplied to the workflow. I will remove the fallback logic.
I see, that makes sense. The reason I refactored it that way was that I thought it might be easier for the user to search their model. But of course this can be reverted. By the way we would like this report to be accessible here, so the user can quickly find the latest nightly report for their model. Do we need to upload it somewhere specific or is it already accessible there? Asking because I am not that familiar with the whole codebase, but assumed since it can be accessed through gh api, it could be accessed in that webpage backend as well.
On your third point, I think you are correct to mention using only the json file, since I believe the information is already present there. We can then render it the same way, without needing the md artifacts. Do you think we might exceed the 1MB limit if we provide more context? Right now I see between 140 and 200KB for the generated files. I can understand that more verbosity and new models might increase its footprint. This change was made in relation to the previous point: instead of navigating to the run URL to see more details, the user can see what went wrong for various models in the same markdown document. What do you think, should this be reverted?

ppetrovicTT · 2026-05-18T18:00:31Z

> I see, that makes sense. The reason I refactored it that way was that I thought it might be easier for the user to search their model. But of course this can be reverted. By the way we would like this report to be accessible here, so the user can quickly find the latest nightly report for their model. Do we need to upload it somewhere specific or is it already accessible there? Asking because I am not that familiar with the whole codebase, but assumed since it can be accessed through gh api, it could be accessed in that webpage backend as well.

Keeping the summaries in S3 should not be this tool's job.
tt-shield has a separate workflow that pushes them to S3.
@acvejicTT can help with that.

> On your third point, I think you are correct to mention using only the json file, since I believe the information is already present there. We can then render it the same way, without needing the md artifacts. Do you think we might exceed the 1MB limit if we provide more context? Right now I see between 140 and 200KB for the generated files. I can understand that more verbosity and new models might increase its footprint. This change was made in relation to the previous point: instead of navigating to the run URL to see more details, the user can see what went wrong for various models in the same markdown document. What do you think, should this be reverted?

Yes, let's please avoid this: we hope for the number of models to grow. In tt-shield nightly, currently only the models with inference server are supported. We need to add the summary for media-server as well. Since we already keep the summaries as separate files, let's just refer to them? In the TT Models dashboard, I believe we can re-link to the proper jobs. We have the job IDs in the name, so they should be easy to find.

A few more things:

Please keep in mind that this same action is now used in both tt-shield and tt-metal.
The use cases are quite different (one repo vs cross-repo, different log structures, folders, etc.)
It's very easy to expand the action API and add stuff. While developing the early version, I already had 2 passes of cleaning it! If we can be smart about fetching the commit SHAs, but not going through the API, that would be great.

Thanks!

…ranch with new commit SHA inputs\n - Pass tt-metal, vllm, and inference-server SHAs to the action

- Restore the Failed Job Details table removed in e5d7d24
- Remove _job_expandable_block helper and Model Details <details> block
- Remove all_summaries parameter from format_run_report
- Delete commits.py and its tests (gh CLI auto-fetch of SHAs)
- Drop --run-id arg and RunCommits/fetch_run_commits from cli.py
- Pass SHA args directly from CLI args to format_run_report
- Restore test_format.py to origin/main baseline + add SHA-header test

Co-authored-by: Cursor <cursoragent@cursor.com>

- Drop markdown field from ParsedJobSummary (no consumer left)
- Remove .md sidecar file reading from parse_json_summary
- Remove sidecar tests from test_parse.py
- Fix long line in test_format.py SHA-header test

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- Add successful_jobs field to RunStats (populated in aggregate.py)
- Render collapsed Successful Models block in format.py sorted alphabetically with job links
- Add tests in test_aggregate.py and test_format.py

ipastalTT · 2026-05-20T10:36:56Z

Hi @ppetrovicTT , I reverted most of the changes.
Now the commit SHAs are optional, they won't be included if the args are not provided.
I added successful models as well in a different tab, I don't know if you agree with that.
Do you think we should try to find which models do not have ai job summaries and mark them as missing or something? Might help the user.

The commit SHAs can be provided like this in tt-shield's nightly.yaml:

  ai-run-summary:
    name: "AI Run Summary"
    needs: [resolve-shas, call-dynamic-workflow]
.
.
.
          tt-metal-commit: ${{ needs.resolve-shas.outputs.tt-metal-sha }}
          vllm-commit: ${{ needs.resolve-shas.outputs.vllm-sha }}
          inference-server-commit: ${{ needs.resolve-shas.outputs.inference-server-sha }}

Thanks!

ppetrovicTT · 2026-05-20T16:08:51Z

+        )
+        md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n"
+        for job in sorted_success:
+            label = _extract_run_label(job) or "\u2014"


can we please have the same table here as for the failed jobs above?
without the category and root cause of course.

ppetrovicTT · 2026-05-20T16:19:04Z

> Hi @ppetrovicTT , I reverted most of the changes. Now the commit SHAs are optional, they won't be included if the args are not provided. I added successful models as well in a different tab, I don't know if you agree with that.

Cool for the SHAs.
I agree with the list of successful jobs - they're visible in Github, but not if we look at reports elsewhere (dashboard).

> Do you think we should try to find which models do not have ai job summaries and mark them as missing or something? Might help the user.

ai-run-summary already supports an expected-jobs input. Sometimes we need some gymnastics to gather all the jobs that were intended to run (in tt-metal there are often filters, nested jobs, etc.), but Claude often helps there. It's not implemented in Models CI though; it should be added in the workflow call (not in the action).
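Conceptually, flagging jobs without a summary is a comparison between what was expected and what was reported. The sketch below assumes both sides are plain lists of job labels; the actual format of the expected-jobs input is not shown in this thread, and `find_missing_jobs` is a hypothetical name.

```python
def find_missing_jobs(expected: list[str], reported: list[str]) -> list[str]:
    """Return expected job labels that have no reported summary, keeping input order."""
    seen = set(reported)
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    expected = ["Qwen3-32B (t3k)", "Qwen3-32B (galaxy)", "bge-m3 (t3k)"]
    reported = ["Qwen3-32B (t3k)", "bge-m3 (t3k)"]
    print(find_missing_jobs(expected, reported))
```

The hard part, as noted above, is building the expected list in the workflow, not the comparison itself.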

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

ipastalTT · 2026-05-22T12:28:52Z

        stats.status_counts[job.status] = stats.status_counts.get(job.status, 0) + 1

    stats.failed_jobs = [j for j in summaries if j.status not in NON_FAILURE_STATUSES]
+    stats.successful_jobs = [j for j in summaries if j.status in NON_FAILURE_STATUSES]


addressed in new commit

ipastalTT · 2026-05-22T12:29:27Z

+    versions = []
+    if tt_metal_commit:
+        short = tt_metal_commit[:7]
+        url = f"https://github.com/tenstorrent/tt-metal/commit/{tt_metal_commit}"
+        versions.append(f"**TT-Metal**: [`{short}`]({url})")
+    if inference_server_commit:
+        short = inference_server_commit[:7]
+        url = f"https://github.com/tenstorrent/tt-inference-server/commit/{inference_server_commit}"
+        versions.append(f"**tt-inference-server**: [`{short}`]({url})")
+    if vllm_commit:
+        short = vllm_commit[:7]
+        url = f"https://github.com/tenstorrent/vllm/commit/{vllm_commit}"


addressed in new commit

ipastalTT · 2026-05-22T12:29:07Z

+        sorted_success = sorted(
+            stats.successful_jobs,
+            key=lambda j: (_extract_run_label(j) or "").lower(),
+        )
+        md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n"
+        md += "| Job | Run | Status |\n"
+        md += "|-----|-----|--------|\n"
+        for job in sorted_success:
+            job_cell = _job_id_cell(job, run_url)
+            model = _extract_run_label(job) or "\u2014"


addressed in new commit

+        stats = compute_stats(jobs)
+        assert len(stats.successful_jobs) == 2
+        assert all(j.status == "SUCCESS" for j in stats.successful_jobs)
+


ppetrovicTT · 2026-06-03T09:12:21Z

Just getting back to this, sorry.

@ipastalTT , do you think we can make this interface a bit more generic?
for example, provide a json {"repo", "commit"}, and auto-fill in the fields?
the field can be called just "commits".

so if it's only tt-metal, we print that.. if there are more we print them all.. but don't hardcode to tt-metal, vllm, tt-inference-server.

otherwise, good to go!
thanks!

ipastalTT · 2026-06-03T09:34:25Z

Hi @ppetrovicTT , great advice!

Would a json structure such as {"repo": "tenstorrent/tt-metal", "commit": "aaabbb.."} be ok?

and then we get the repo url for every repo/commit as:

url = f"https://github.com/{repo}/commit/{sha}"

for example for nightly it would be a list of dicts such as:

commits: |
  [
    {"repo": "tenstorrent/tt-metal", "commit": "${{ needs.resolve-shas.outputs.tt-metal-sha }}"},
    {"repo": "tenstorrent/tt-inference-server", "commit": "${{ needs.resolve-shas.outputs.inference-server-sha }}"},
    {"repo": "tenstorrent/vllm", "commit": "${{ needs.resolve-shas.outputs.vllm-sha }}"}
  ]

If I have misunderstood, please let me know.
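The `commits` value proposed above arrives in the action as a JSON string, so it has to be decoded and filtered before rendering. A minimal sketch of that step follows, assuming malformed input should be ignored and not fail the run; `parse_commits_input` is a hypothetical name, not a function from this PR.

```python
from __future__ import annotations

import json


def parse_commits_input(raw: str | None) -> list[dict]:
    """Decode the 'commits' input into a list of {"repo", "commit"} dicts.

    Assumptions: invalid JSON yields an empty list, a single object is
    accepted as a one-element list, and entries with an empty repo or
    commit (e.g. an unresolved workflow output) are dropped.
    """
    if not raw or not raw.strip():
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return []
    return [
        entry
        for entry in data
        if isinstance(entry, dict) and entry.get("repo") and entry.get("commit")
    ]
```

Dropping entries with an empty `commit` matters in practice because a `${{ needs.resolve-shas.outputs... }}` expression that resolves to nothing still leaves a syntactically valid entry with an empty string.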

Edit:
I just saw the recent changes on tt-shield.
So, I guess what you would like would be to have the full repository url in the "repo" field? And if there is no url (e.g. ubuntu-version), include this as well as a static field?

…ach repository

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

ipastalTT · 2026-06-03T11:44:31Z

+def _normalized_commit_sha(value: str | None) -> str | None:
+    """Strip whitespace and validate SHA format (hex, 7–40 chars). Returns None if invalid."""
+    if not value:
+        return None
+    normalized = value.strip()
+    if re.fullmatch(r"[0-9a-fA-F]{7,40}", normalized):
+        return normalized
+    return None
+
+
+def _commit_version_line(commits: list[dict]) -> str:
+    """Render a ' · '-joined line of repo commit links from a list of {repo, commit} dicts.
+
+    Each entry must have 'repo' (e.g. 'tenstorrent/tt-metal') and 'commit' (SHA).
+    The display label is the part of the repo name after '/'.
+    Invalid or missing SHAs are silently skipped.
+    Returns empty string when nothing is renderable.
+    """
+    parts = []
+    for entry in commits:
+        repo = (entry.get("repo") or "").strip()
+        sha = _normalized_commit_sha(entry.get("commit"))
+        if not repo or not sha:
+            continue
+        label = repo.split("/")[-1] if "/" in repo else repo
+        short = sha[:7]
+        url = f"https://github.com/{repo}/commit/{sha}"
+        parts.append(f"**{label}**: [`{short}`]({url})")


    parts = []
    for entry in commits:
        if not isinstance(entry, dict):
            continue
        repo = str(entry.get("repo") or "").strip()
        sha = _normalized_commit_sha(str(entry.get("commit") or ""))
        if not repo or not sha:
            continue
        label = repo.split("/")[-1] if "/" in repo else repo
        short = sha[:7]
        url = f"https://github.com/{repo}/commit/{sha}"
        parts.append(f"**{label}**: [`{short}`]({url})")
    return " \u00b7 ".join(parts)

+    if stats.successful_jobs:
+        labeled_success = sorted(
+            ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS"
+        )
+        if labeled_success:
+            md += f"<details>\n<summary>Successful Models ({len(labeled_success)})</summary>\n\n"
+            md += "| Job | Run | Status |\n"
+            md += "|-----|-----|--------|\n"
+            for model, job in labeled_success:
+                job_cell = _job_id_cell(job, run_url)
+                emoji = STATUS_EMOJI.get(job.status, "")
+                md += f"| {job_cell} | {model} | {emoji} {job.status} |\n"
+            md += "\n</details>\n\n"
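One thing to watch in the hunk above: sorting `(label, job)` tuples falls back to comparing the job objects whenever two labels are equal. If the job type does not define ordering, that comparison raises `TypeError`. The sketch below shows a key-based alternative; the `Job` dataclass is a hypothetical stand-in, and whether `ParsedJobSummary` supports ordering is not visible in this thread.

```python
from dataclasses import dataclass


@dataclass
class Job:  # hypothetical stand-in; plain dataclasses do not support "<"
    job_id: str
    label: str


def sort_by_label(jobs: list[Job]) -> list[Job]:
    """Sort case-insensitively by label, breaking ties on job ID.

    Only strings appear in the key, so equal labels never trigger a
    comparison between Job objects.
    """
    return sorted(jobs, key=lambda j: (j.label.lower(), j.job_id))
```

Duplicate labels are plausible in this report, since the same model and hardware pair can appear in more than one job.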


+        stats = _stats(jobs=jobs)
+        stats.successful_jobs = [j for j in jobs if j.status == "SUCCESS"]
+        report = format_run_report(stats)


    # -----------------------------------------------------------------------
-    # 4. Job Status Overview -- sorted by severity, with visual bar
+    # 5. Job Status Overview -- sorted by severity, with visual bar


ppetrovicTT

Awesome! 🙌

Can you please add commits input to run/README.md?
Please run a workflow to make sure it works? I'd love to see how it's integrated.

Thanks!

ipastalTT · 2026-06-03T13:31:18Z

Hi @ppetrovicTT. I attach 2 reports (truncated) below, one with all 3 commit entries passed and one with only tt-metal. I also documented the "commits" input in the README.md file.

You can find the runs here as well:

For this to take effect, for example on nightly, we will have to pass this input from tt-shield. I currently run it from my own branch for quick testing. I can create 2 issues on tt-shield and assign them to myself: one for this and one for uploading the report to S3.

Truncated Output with all 3 commit shas

AI Run Summary

Run: 26882137988 · Date: 2026-06-03

tt-metal: 7463e58 · tt-inference-server: 754ab5c · vllm: 3334377

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across nearly all hardware configurations.

Job Status Overview

Status	Count	Distribution
🟣 INFRA_FAILURE	8	`██░░░░░░░░░░░░░░░░░░` 10%
🔴 CRASHED	34	`█████████░░░░░░░░░░░` 44%
🔴 TIMEOUT	1	`░░░░░░░░░░░░░░░░░░░░` 1%
🟠 TESTS_FAILED	11	`███░░░░░░░░░░░░░░░░░` 14%
🟡 EVALS_BELOW_TARGET	3	`█░░░░░░░░░░░░░░░░░░░` 4%
🟢 SUCCESS	20	`█████░░░░░░░░░░░░░░░` 26%

Failure Category Distribution

Category	Jobs	Distribution	Subcategories
`tt-metal`	21	`████░░░░░░░░` 37%	fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2
`vllm`	9	`██░░░░░░░░░░` 16%	engine 7 · config 2
`infra`	8	`██░░░░░░░░░░` 14%	no_logs 8
`app`	8	`██░░░░░░░░░░` 14%	api 6 · server 2
`model`	6	`█░░░░░░░░░░░` 11%	accuracy 5 · load 1
`hw`	3	`█░░░░░░░░░░░` 5%	fabric 3
`runtime`	2	`░░░░░░░░░░░░` 4%	exception 2

Dominant Failure Pattern

TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing _processor_factory attribute, empty model name assertions) are causing cascading server unavailability and downstream eval/benchmark timeouts.

Truncated output with tt-metal only

AI Run Summary

Run: 26887478767 · Date: 2026-06-03
tt-metal: f4796b7

Critically unhealthy run with 74% failure rate (57/77 jobs) spanning crashes, test failures, and infrastructure issues across multiple hardware configurations.

Job Status Overview

Status	Count	Distribution
🟣 INFRA_FAILURE	8	`██░░░░░░░░░░░░░░░░░░` 10%
🔴 CRASHED	34	`█████████░░░░░░░░░░░` 44%
🔴 TIMEOUT	1	`░░░░░░░░░░░░░░░░░░░░` 1%
🟠 TESTS_FAILED	11	`███░░░░░░░░░░░░░░░░░` 14%
🟡 EVALS_BELOW_TARGET	3	`█░░░░░░░░░░░░░░░░░░░` 4%
🟢 SUCCESS	20	`█████░░░░░░░░░░░░░░░` 26%

Failure Category Distribution

Category	Jobs	Distribution	Subcategories
`tt-metal`	21	`████░░░░░░░░` 37%	fabric 9 · trace 4 · dispatch 3 · memory 3 · device 2
`vllm`	9	`██░░░░░░░░░░` 16%	engine 7 · config 2
`infra`	8	`██░░░░░░░░░░` 14%	no_logs 8
`app`	8	`██░░░░░░░░░░` 14%	api 6 · server 2
`model`	6	`█░░░░░░░░░░░` 11%	accuracy 5 · load 1
`hw`	3	`█░░░░░░░░░░░` 5%	fabric 3
`runtime`	2	`░░░░░░░░░░░░` 4%	exception 2

Dominant Failure Pattern

TT-Metal infrastructure instability — fabric topology mapping failures, trace buffer size mismatches, and DRAM exhaustion — combined with a vLLM multimodal registry bug (missing _processor_factory) are causing the majority of crashes across diverse models and hardware targets.
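
The distribution bars in the two reports above are consistent with a simple rule: fill `round(fraction * width)` cells and print the rounded percentage. The sketch below reproduces the reported values under that assumption. The function name `distribution_bar` and the rounding rule are inferred from the rendered output, not taken from the module's source, which may differ.

```python
def distribution_bar(count: int, total: int, width: int = 20) -> str:
    """Render a fixed-width block bar followed by a rounded percentage.

    Hypothetical helper: the rounding behaviour is inferred from the
    report tables, not copied from the ai-run-summary code.
    """
    if total <= 0:
        # Avoid division by zero; show an empty bar.
        return "`" + "░" * width + "` 0%"
    fraction = count / total
    # Clamp so a count larger than total cannot overflow the bar.
    filled = min(width, round(fraction * width))
    bar = "█" * filled + "░" * (width - filled)
    return f"`{bar}` {round(fraction * 100)}%"


# Job Status Overview uses 20 cells over all 77 jobs.
print(distribution_bar(8, 77))       # INFRA_FAILURE row
# Failure Category Distribution uses 12 cells over the 57 failed jobs.
print(distribution_bar(21, 57, 12))  # tt-metal row
```

Note that the two tables use different denominators: the status table divides by all 77 jobs, while the category table divides by the 57 failed jobs.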

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

+    if stats.successful_jobs:
+        labeled_success = sorted(
+            ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS"
+        )
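
One thing worth checking in the snippet above: `sorted` over `(label, job)` tuples falls back to comparing the job objects when two labels are equal, which raises `TypeError` if the job type defines no ordering. A minimal sketch of a variant that sorts on the label only is shown below. The `Job` dataclass and the `successful_rows` function are hypothetical stand-ins for illustration, not the module's real types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical stand-in for the project's job type (no ordering defined)."""
    job_id: int
    label: Optional[str]
    status: str


def successful_rows(jobs: List[Job]) -> List[str]:
    """Return markdown table rows for successful jobs, sorted by label."""
    labeled = sorted(
        # Fall back to an em dash (U+2014) when a job has no label.
        ((job.label or "\u2014", job) for job in jobs if job.status == "SUCCESS"),
        # Sort on the label only, so ties never compare Job objects.
        key=lambda pair: pair[0],
    )
    return [f"| {job.job_id} | {label} | {job.status} |" for label, job in labeled]


jobs = [
    Job(2, "model-b", "SUCCESS"),
    Job(1, "model-a", "SUCCESS"),
    Job(3, "model-a", "SUCCESS"),  # same label as job 1: a tie
    Job(4, "model-c", "CRASHED"),  # filtered out
]
for row in successful_rows(jobs):
    print(row)
```

Because Python's sort is stable, jobs that share a label keep their input order.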


+        repo = str(entry.get("repo") or "").strip()
+        sha = _normalized_commit_sha(str(entry.get("commit") or ""))
+        if not repo or not sha:
+            continue
+        label = repo.split("/")[-1] if "/" in repo else repo
+        short = sha[:7]
+        url = f"https://github.com/{repo}/commit/{sha}"
+        parts.append(f"**{label}**: [`{short}`]({url})")


+    return None
+
+
+def _commit_version_line(commits: list[dict]) -> str:


ppetrovicTT · 2026-06-17T13:07:29Z

@ipastalTT I merged this. You can change the caller in tt-shield!

ipastalTT · 2026-06-17T13:08:57Z

Thanks @ppetrovicTT !

Copilot AI review requested due to automatic review settings May 6, 2026 08:45

ipastalTT requested a review from a team as a code owner May 6, 2026 08:45

Copilot started reviewing on behalf of ipastalTT May 6, 2026 08:46

Copilot AI reviewed May 6, 2026

ppetrovicTT self-requested a review May 12, 2026 10:44

ppetrovicTT requested changes May 12, 2026

ipastalTT and others added 4 commits May 20, 2026 11:44

update: ai-run-summary

- Point on-nightly at tt-github-actions branch with new commit SHA inputs
- Pass tt-metal, vllm, and inference-server SHAs to the action

b984ff4

update: remove failed jobs details

de373b3

fix: formatting

109d1f7

ipastalTT force-pushed the ipastalTT/update-ai-run-summary branch from c042547 to 39f4fb6 Compare May 20, 2026 09:44

Copilot AI review requested due to automatic review settings May 20, 2026 09:52

Remove markdown sidecar loading from parse/models

1d74e04

- Drop markdown field from ParsedJobSummary (no consumer left)
- Remove .md sidecar file reading from parse_json_summary
- Remove sidecar tests from test_parse.py
- Fix long line in test_format.py SHA-header test

ipastalTT force-pushed the ipastalTT/update-ai-run-summary branch from dcbdffa to 1d74e04 Compare May 20, 2026 10:00

Copilot AI reviewed May 20, 2026

feat: add Successful Models section to run report

c5c3410

- Add successful_jobs field to RunStats (populated in aggregate.py)
- Render collapsed Successful Models block in format.py sorted alphabetically with job links
- Add tests in test_aggregate.py and test_format.py

ipastalTT requested a review from ppetrovicTT May 20, 2026 10:34

ppetrovicTT reviewed May 20, 2026

fix: Succesful jobs table format to follow Failed jobs table format

1f2eabd

ipastalTT requested review from Copilot and ppetrovicTT May 22, 2026 12:03

Copilot AI reviewed May 22, 2026

fix: succesful jobs gathered by status, SHAs validation

ecce77f

update: Remove implicit repo shas. Pass them in a list of dicts for each repository

cefa2ff

Copilot AI review requested due to automatic review settings June 3, 2026 11:34

Copilot AI reviewed Jun 3, 2026

fix: address copilot PR comments

ae91f7e

ppetrovicTT previously approved these changes Jun 3, 2026

update: ai_summary/run/README.md to reflect commits argument

0b7ee16

Copilot AI review requested due to automatic review settings June 3, 2026 13:31

ipastalTT dismissed ppetrovicTT’s stale review via 0b7ee16 June 3, 2026 13:31

Copilot AI reviewed Jun 3, 2026

vmilosevic approved these changes Jun 15, 2026

Merge branch 'main' into ipastalTT/update-ai-run-summary

dce40b5

ppetrovicTT merged commit 666ba11 into tenstorrent:main Jun 17, 2026
2 checks passed

        if not count_output.strip() or int(count_output.strip() or "0") < 100:
            break
