update: ai-run-summary#123
Pull request overview
Updates the ai-run-summary tool to enrich run reports with component commit SHAs and per-model expandable details by ingesting markdown sidecar artifacts.
Changes:
- Add support for loading a `*.md` "sidecar" summary alongside each `ai_job_summary_*.json` artifact and rendering it in the run report.
- Add commit SHA reporting for TT-Metal / tt-inference-server / vLLM (explicit CLI/action inputs with optional auto-fetch via `gh`).
- Replace the previous failed-job details table with an alphabetical, expandable "Model Details" section for all jobs.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_parse.py | Adds coverage for loading/omitting the markdown sidecar summary. |
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_format.py | Updates formatting expectations for commit SHAs + new “Model Details” output. |
| .github/actions/ai_summary/tool/ai_run_summary/tests/test_commits.py | Adds tests for resolving commit SHAs from gh job logs. |
| .github/actions/ai_summary/tool/ai_run_summary/parse.py | Loads *.md sidecar content into parsed summaries. |
| .github/actions/ai_summary/tool/ai_run_summary/models.py | Extends ParsedJobSummary with a markdown field. |
| .github/actions/ai_summary/tool/ai_run_summary/format.py | Adds commit SHA header rendering and new expandable per-model details blocks. |
| .github/actions/ai_summary/tool/ai_run_summary/commits.py | Implements gh-based resolve-shas job discovery + SHA extraction. |
| .github/actions/ai_summary/tool/ai_run_summary/cli.py | Wires commit SHA inputs/auto-fetch and passes all summaries into the formatter. |
| .github/actions/ai_summary/run/action.yml | Adds action inputs/env wiring for explicit component commit SHAs. |
| md_path = file_path.with_suffix(".md") | ||
| try: | ||
| markdown = md_path.read_text() if md_path.exists() else "" | ||
| except OSError: |
| emoji = STATUS_EMOJI.get(job.status, "") | ||
| model = _extract_run_label(job) or "\u2014" | ||
| url = _job_url(job, run_url) | ||
| label = job.job_id or job.source_file.stem.removeprefix("ai_job_summary_") | ||
| job_link = f'<a href="{url}">{label}</a>' if url else label | ||
|
|
||
| parts = [f"<strong>{model}</strong>", f"{emoji} {job.status}", job_link] | ||
|
|
||
| if not compact: | ||
| category = job.category or "UNKNOWN" | ||
| parts.append(f"<code>{category}</code>") | ||
|
|
||
| if show_root_cause: | ||
| root_cause = job.root_cause or "" | ||
| if len(root_cause) > _ROOT_CAUSE_COL_MAX: | ||
| root_cause = root_cause[:_ROOT_CAUSE_COL_MAX] + "\u2026" | ||
| parts.append(root_cause) | ||
|
|
||
| summary_line = " | ".join(parts) |
| import markdown | ||
|
|
||
| from .models import ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI | ||
| from .models import NON_FAILURE_STATUSES, ParsedJobSummary, RunNarrative, RunStats, STATUS_EMOJI |
| vllm_commit: Optional vLLM commit SHA. | ||
| inference_server_commit: Optional tt-inference-server commit SHA. | ||
| all_summaries: All parsed job summaries (success + failure). When provided, | ||
| a Model Details section is appended after the failed job details. |
| if not count_output.strip() or int(count_output.strip() or "0") < 100: | ||
| break |
| # Extract "Full sha: <40-char-hex>" lines in order of appearance. | ||
| # The resolve-shas workflow resolves: tt-metal, inference-server, vllm — in that order. | ||
| shas: list[str] = [] | ||
| for line in logs.splitlines(): | ||
| if "Full sha:" in line: | ||
| parts = line.split("Full sha:") | ||
| if len(parts) == 2: | ||
| sha = parts[1].strip() | ||
| if sha and all(c in "0123456789abcdefABCDEF" for c in sha): | ||
| shas.append(sha) | ||
|
|
| print(f"Fetching commit SHAs for run {run_id_for_commits}...", file=sys.stderr) | ||
| repo = os.environ.get("GITHUB_REPOSITORY", "tenstorrent/tt-shield") | ||
| commits = fetch_run_commits(int(run_id_for_commits), repo=repo) |
| env: | ||
| TT_CHAT_API_KEY: ${{ inputs.api-key }} | ||
| TT_CHAT_URL: ${{ inputs.api-url }} | ||
| CONFIG: ${{ inputs.config }} | ||
| EXPECTED_JOBS: ${{ inputs.expected-jobs }} | ||
| RUN_RESULT: ${{ inputs.run-result }} | ||
| TT_METAL_COMMIT: ${{ inputs.tt-metal-commit }} | ||
| VLLM_COMMIT: ${{ inputs.vllm-commit }} | ||
| INFERENCE_SERVER_COMMIT: ${{ inputs.inference-server-commit }} | ||
| run: | |
ppetrovicTT
left a comment
Hi @ipastalTT ! I'm glad you like the tool and want to improve it!
Several points:
- Let's not invoke the gh CLI on the currently ongoing run. It is prone to race conditions, can be hard to debug, and can introduce bugs. The workflow should provide all the data to the action.
- The failed jobs were sorted in two passes: first by severity (infra, crash, failed tests, failed evals), and then by category. That way it is easier to spot issues that span different models. As these are more impactful, they should be easier to spot and act upon.
- I like your idea of having collapsible descriptions within this summary. But we have a 1MB size limit, so I decided to keep them separate, with only the root cause for easier comparison. Either way, ai-run-summary uses the .json file created by ai-job-summary. If you need only a certain field, let's make it a separate field in the .json so that it's easier to extract, rather than stripping the .md files.
Thanks!
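The two-pass ordering described in this comment can be sketched with a single tuple sort key. This is only an illustration: the `Job` class, the status names in `SEVERITY_ORDER`, and `sort_failed_jobs` are hypothetical stand-ins, not the tool's actual models or statuses.

```python
from dataclasses import dataclass

# Hypothetical severity ranking, most impactful first. The real tool's
# status names and ordering may differ.
SEVERITY_ORDER = ["INFRA", "CRASH", "FAILED_TESTS", "FAILED_EVALS"]
SEVERITY_RANK = {status: rank for rank, status in enumerate(SEVERITY_ORDER)}


@dataclass
class Job:
    """Illustrative stand-in for a parsed job summary."""
    model: str
    status: str
    category: str


def sort_failed_jobs(jobs: list[Job]) -> list[Job]:
    """Sort by severity first, then by category within each severity."""
    return sorted(
        jobs,
        # Unknown statuses get the lowest priority and land at the end.
        key=lambda j: (SEVERITY_RANK.get(j.status, len(SEVERITY_ORDER)), j.category),
    )


jobs = [
    Job("model-a", "FAILED_EVALS", "model:accuracy"),
    Job("model-b", "INFRA", "infra:no_logs"),
    Job("model-c", "CRASH", "tt-metal:fabric"),
    Job("model-d", "CRASH", "hw:fabric"),
]
ordered = sort_failed_jobs(jobs)
```

Because the category is the secondary key, jobs that fail with the same category end up adjacent within a severity tier, which is what makes cross-model issues easy to spot.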
|
Hi @ppetrovicTT. Thanks for your feedback!
|
Keeping the summaries in S3 should not be this tool's job.
Yes, let's please avoid this - we hope for the number of models to grow. In tt-shield nightly, currently only the models with an inference server are supported. We need to add the summary for media-server as well. Since we already keep the summaries as separate files, let's just refer to them? In the TT Models dashboard, I believe we can re-link to the proper jobs. We have the job IDs in the name, so they should be easy to find. A few more things:
Thanks! |
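To make the size concern concrete, here is a minimal sketch of guarding a report against the 1MB limit mentioned above and linking to per-job summaries instead of embedding them. It assumes the limit is 1 MiB measured on the UTF-8 encoded text and that job pages live at `<run_url>/job/<job_id>`; all three function names are hypothetical.

```python
# Assumption: the limit is 1 MiB, counted in UTF-8 bytes rather than characters.
MAX_REPORT_BYTES = 1024 * 1024


def report_size_bytes(report: str) -> int:
    """Size of the rendered report once encoded as UTF-8."""
    return len(report.encode("utf-8"))


def fits_size_limit(report: str, limit: int = MAX_REPORT_BYTES) -> bool:
    """True when the encoded report stays within the limit."""
    return report_size_bytes(report) <= limit


def job_reference_line(job_id: str, run_url: str) -> str:
    """Markdown bullet that links to a job rather than inlining its summary.

    The '<run_url>/job/<job_id>' layout is an assumption for illustration.
    """
    return f"- [{job_id}]({run_url}/job/{job_id})"
```

Counting bytes matters here because the report uses non-ASCII characters (bar glyphs, emoji, `·`), each of which takes more than one byte in UTF-8.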
…ranch with new commit SHA inputs\n - Pass tt-metal, vllm, and inference-server SHAs to the action
- Restore the Failed Job Details table removed in e5d7d24
- Remove _job_expandable_block helper and Model Details <details> block
- Remove all_summaries parameter from format_run_report
- Delete commits.py and its tests (gh CLI auto-fetch of SHAs)
- Drop --run-id arg and RunCommits/fetch_run_commits from cli.py
- Pass SHA args directly from CLI args to format_run_report
- Restore test_format.py to origin/main baseline + add SHA-header test

Co-authored-by: Cursor <cursoragent@cursor.com>
c042547 to
39f4fb6
Compare
- Drop markdown field from ParsedJobSummary (no consumer left)
- Remove .md sidecar file reading from parse_json_summary
- Remove sidecar tests from test_parse.py
- Fix long line in test_format.py SHA-header test
dcbdffa to
1d74e04
Compare
- Add successful_jobs field to RunStats (populated in aggregate.py)
- Render collapsed Successful Models block in format.py sorted alphabetically with job links
- Add tests in test_aggregate.py and test_format.py
|
Hi @ppetrovicTT, I reverted most of the changes. The commit SHAs can be provided like this in tt-shield nightly.yaml:
ai-run-summary:
name: "AI Run Summary"
needs: [resolve-shas, call-dynamic-workflow]
.
.
.
tt-metal-commit: ${{ needs.resolve-shas.outputs.tt-metal-sha }}
vllm-commit: ${{ needs.resolve-shas.outputs.vllm-sha }}
inference-server-commit: ${{ needs.resolve-shas.outputs.inference-server-sha }}
Thanks! |
| ) | ||
| md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n" | ||
| for job in sorted_success: | ||
| label = _extract_run_label(job) or "\u2014" |
can we please have the same table here as for the failed jobs above?
without the category and root cause of course.
Cool for the SHAs.
ai-run-summary already supports |
| stats.status_counts[job.status] = stats.status_counts.get(job.status, 0) + 1 | ||
|
|
||
| stats.failed_jobs = [j for j in summaries if j.status not in NON_FAILURE_STATUSES] | ||
| stats.successful_jobs = [j for j in summaries if j.status in NON_FAILURE_STATUSES] |
addressed in new commit
| versions = [] | ||
| if tt_metal_commit: | ||
| short = tt_metal_commit[:7] | ||
| url = f"https://github.com/tenstorrent/tt-metal/commit/{tt_metal_commit}" | ||
| versions.append(f"**TT-Metal**: [`{short}`]({url})") | ||
| if inference_server_commit: | ||
| short = inference_server_commit[:7] | ||
| url = f"https://github.com/tenstorrent/tt-inference-server/commit/{inference_server_commit}" | ||
| versions.append(f"**tt-inference-server**: [`{short}`]({url})") | ||
| if vllm_commit: | ||
| short = vllm_commit[:7] | ||
| url = f"https://github.com/tenstorrent/vllm/commit/{vllm_commit}" |
addressed in new commit
| sorted_success = sorted( | ||
| stats.successful_jobs, | ||
| key=lambda j: (_extract_run_label(j) or "").lower(), | ||
| ) | ||
| md += f"<details>\n<summary>Successful Models ({len(sorted_success)})</summary>\n\n" | ||
| md += "| Job | Run | Status |\n" | ||
| md += "|-----|-----|--------|\n" | ||
| for job in sorted_success: | ||
| job_cell = _job_id_cell(job, run_url) | ||
| model = _extract_run_label(job) or "\u2014" |
addressed in new commit
| stats = compute_stats(jobs) | ||
| assert len(stats.successful_jobs) == 2 | ||
| assert all(j.status == "SUCCESS" for j in stats.successful_jobs) | ||
|
|
|
Just getting back to this, sorry. @ipastalTT, do you think we can make this interface a bit more generic? If it's only tt-metal, we print that; if there are more, we print them all. But let's not hardcode it to tt-metal, vllm, and tt-inference-server. Otherwise, good to go!
|
Hi @ppetrovicTT, great advice! Would a JSON structure such as {"repo": "tenstorrent/tt-metal", "commit": "aaabbb.."} be ok? Then we get the repo URL for every repo/commit as: url = f"https://github.com/{repo}/commit/{sha}"
For example, for nightly it would be a list of dicts such as:
commits: |
  [
    {"repo": "tenstorrent/tt-metal", "commit": "${{ needs.resolve-shas.outputs.tt-metal-sha }}"},
    {"repo": "tenstorrent/tt-inference-server", "commit": "${{ needs.resolve-shas.outputs.inference-server-sha }}"},
    {"repo": "tenstorrent/vllm", "commit": "${{ needs.resolve-shas.outputs.vllm-sha }}"}
  ]
If I have misunderstood, please let me know. Edit:
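As a rough illustration of the proposal, the `commits` input could be parsed defensively before rendering. This is a sketch under assumptions: `parse_commits_input` and `commit_url` are hypothetical names, and treating malformed input as "no commits" is one possible design choice, not necessarily what the merged code does.

```python
import json


def parse_commits_input(raw: str) -> list[dict]:
    """Parse a JSON array of {"repo", "commit"} objects.

    Returns an empty list for empty, malformed, or non-array input so a bad
    value never breaks report generation (an assumed design choice).
    """
    if not raw or not raw.strip():
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only entries that carry both string fields.
    return [
        entry
        for entry in data
        if isinstance(entry, dict)
        and isinstance(entry.get("repo"), str)
        and isinstance(entry.get("commit"), str)
    ]


def commit_url(repo: str, sha: str) -> str:
    """Build the commit URL using the pattern proposed in the comment."""
    return f"https://github.com/{repo}/commit/{sha}"


commits = parse_commits_input('[{"repo": "tenstorrent/tt-metal", "commit": "abc1234"}]')
urls = [commit_url(c["repo"], c["commit"]) for c in commits]
```

Passing a single-entry array prints only that repository, and adding more entries requires no code change, which is the generic behaviour requested in the review.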
| def _normalized_commit_sha(value: str | None) -> str | None: | ||
| """Strip whitespace and validate SHA format (hex, 7–40 chars). Returns None if invalid.""" | ||
| if not value: | ||
| return None | ||
| normalized = value.strip() | ||
| if re.fullmatch(r"[0-9a-fA-F]{7,40}", normalized): | ||
| return normalized | ||
| return None | ||
|
|
||
|
|
||
| def _commit_version_line(commits: list[dict]) -> str: | ||
| """Render a ' · '-joined line of repo commit links from a list of {repo, commit} dicts. | ||
|
|
||
| Each entry must have 'repo' (e.g. 'tenstorrent/tt-metal') and 'commit' (SHA). | ||
| The display label is the part of the repo name after '/'. | ||
| Invalid or missing SHAs are silently skipped. | ||
| Returns empty string when nothing is renderable. | ||
| """ | ||
| parts = [] | ||
| for entry in commits: | ||
| repo = (entry.get("repo") or "").strip() | ||
| sha = _normalized_commit_sha(entry.get("commit")) | ||
| if not repo or not sha: | ||
| continue | ||
| label = repo.split("/")[-1] if "/" in repo else repo | ||
| short = sha[:7] | ||
| url = f"https://github.com/{repo}/commit/{sha}" | ||
| parts.append(f"**{label}**: [`{short}`]({url})") |
    parts = []
    for entry in commits:
        if not isinstance(entry, dict):
            continue
        repo = str(entry.get("repo") or "").strip()
        sha = _normalized_commit_sha(str(entry.get("commit") or ""))
        if not repo or not sha:
            continue
        label = repo.split("/")[-1] if "/" in repo else repo
        short = sha[:7]
        url = f"https://github.com/{repo}/commit/{sha}"
        parts.append(f"**{label}**: [`{short}`]({url})")
    return " \u00b7 ".join(parts)
| if stats.successful_jobs: | ||
| labeled_success = sorted( | ||
| ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS" | ||
| ) | ||
| if labeled_success: | ||
| md += f"<details>\n<summary>Successful Models ({len(labeled_success)})</summary>\n\n" | ||
| md += "| Job | Run | Status |\n" | ||
| md += "|-----|-----|--------|\n" | ||
| for model, job in labeled_success: | ||
| job_cell = _job_id_cell(job, run_url) | ||
| emoji = STATUS_EMOJI.get(job.status, "") | ||
| md += f"| {job_cell} | {model} | {emoji} {job.status} |\n" | ||
| md += "\n</details>\n\n" |
| stats = _stats(jobs=jobs) | ||
| stats.successful_jobs = [j for j in jobs if j.status == "SUCCESS"] | ||
| report = format_run_report(stats) |
| # ----------------------------------------------------------------------- | ||
| # 4. Job Status Overview -- sorted by severity, with visual bar | ||
| # 5. Job Status Overview -- sorted by severity, with visual bar |
ppetrovicTT
left a comment
Awesome! 🙌
- Can you please add the `commits` input to `run/README.md`?
- Please run a workflow to make sure it works? I'd love to see how it's integrated.
Thanks!
|
Hi @ppetrovicTT. I attach 2 reports (truncated) below, one with all 3 JSON arrays passed and one with only tt-metal. I also included "commits" in the README.md file. You can find the runs here as well: So for this to take effect, for example on nightly, we will have to pass this on tt-shield; I currently run it from my own branch for quick testing. I can create 2 issues on tt-shield and assign them to myself, one for that and one for uploading the report to S3.

Truncated output with all 3 commit SHAs
AI Run Summary
Run: 26882137988 · Date: 2026-06-03
tt-metal:
Job Status Overview
Failure Category Distribution
Dominant Failure Pattern
TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing

Truncated output with tt-metal only
AI Run Summary
Run: 26887478767 · Date: 2026-06-03
Job Status Overview
Failure Category Distribution
Dominant Failure Pattern
TT-Metal infrastructure instability - fabric topology mapping failures, trace buffer size mismatches, and DRAM exhaustion - combined with a vLLM multimodal registry bug (missing |
| if stats.successful_jobs: | ||
| labeled_success = sorted( | ||
| ((_extract_run_label(j) or "\u2014"), j) for j in stats.successful_jobs if j.status == "SUCCESS" | ||
| ) |
| repo = str(entry.get("repo") or "").strip() | ||
| sha = _normalized_commit_sha(str(entry.get("commit") or "")) | ||
| if not repo or not sha: | ||
| continue | ||
| label = repo.split("/")[-1] if "/" in repo else repo | ||
| short = sha[:7] | ||
| url = f"https://github.com/{repo}/commit/{sha}" | ||
| parts.append(f"**{label}**: [`{short}`]({url})") |
| return None | ||
|
|
||
|
|
||
| def _commit_version_line(commits: list[dict]) -> str: |
|
@ipastalTT i merged this. you can change the caller in tt-shield! |
|
Thanks @ppetrovicTT ! |
Scope
update ai-run-summary module
What Changed
Testing
Run on nightly at tt-shield from this branch with a specified run id (skips building images, downloads artifacts from that run, re-uploads and runs ai-run-summary)
New Look
AI Run Summary
Run: 26156616304 · Date: 2026-05-20
TT-Metal: `c5beb19` · tt-inference-server: `18ec154` · vLLM: `bfce692`
Job Status Overview
[Status bar chart; status labels not recoverable. Shares in order: 10%, 44%, 1%, 14%, 4%, 26%]
Failure Category Distribution
| Category | Share |
|---|---|
| tt-metal | 37% |
| vllm | 16% |
| infra | 14% |
| app | 14% |
| model | 11% |
| hw | 5% |
| runtime | 4% |
Dominant Failure Pattern
TT-Metal subsystem failures (fabric topology mapping, trace buffer sizing, DRAM exhaustion, and dispatch timeouts) combined with vLLM engine startup failures (missing `_processor_factory` attribute, empty model name assertions) are causing cascading server unavailability and downstream eval/benchmark timeouts.
Failed Job Details (57)
[Table flattened in extraction; only the category column and a few root-cause fragments survive, in original row order:]
infra:no_logs (×8), app:api, app:server, hw:fabric (×3), model:load, runtime:exception (×2), tt-metal:device (×2), tt-metal:dispatch (×3), tt-metal:fabric (×9), tt-metal:memory (×3), tt-metal:trace (×4), vllm:config, vllm:engine (×5), app:api, app:api, app:api (`/tt-liveness` endpoin…), app:api (`/tt-liveness` endpoint is returning HTTP 405 (Method Not Allowed) …), app:api, app:server (`BGEM3Runner` class is missing the `_run_async` method, causing all embedding requests…), model:accuracy (×4), vllm:config, model:accuracy, vllm:engine (×2)
Successful Models (20)
Run Summary Stats
anthropic/claude-sonnet-4-6